The Rosetta Project’s goal is to create a publicly accessible digital library of human languages and to ensure the very long-term preservation of these digital materials. Rosetta’s collections of texts, sound recordings, and video recordings include data on over 2,500 languages. The collection is housed at the Internet Archive, but many of Rosetta’s materials can be located and accessed through Langscape.
SeedLing is a publicly available machine-readable cross-linguistic corpus for computational linguistic research. SeedLing began as a student project at the University of the Saarland, and currently includes data from: