Language Identification Tool

You can use this tool to try to identify the language of a piece of text. The tool works by comparing your text to a database of language samples.
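This page does not say how the comparison against the sample database is carried out. As an illustration only, the sketch below uses one common approach: it builds character trigram counts for the input and for each sample, then ranks languages by cosine similarity. The `SAMPLES` dictionary and the names `trigram_profile`, `cosine`, and `rank_languages` are hypothetical and are not part of the tool.

```python
from collections import Counter
import math

# Hypothetical miniature "database" of language samples. The real tool's
# samples and scoring method are not described on this page.
SAMPLES = {
    "English": "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun",
    "Spanish": "el rápido zorro marrón salta sobre el perro perezoso y luego el perro duerme al sol",
    "German": "der schnelle braune fuchs springt über den faulen hund und dann schläft der hund in der sonne",
}


def trigram_profile(text):
    """Count overlapping 3-character sequences after normalising case and whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_languages(text, samples=SAMPLES):
    """Return (language, score) pairs, highest score first."""
    profile = trigram_profile(text)
    scores = [(lang, cosine(profile, trigram_profile(sample))) for lang, sample in samples.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for lang, score in rank_languages("the dog jumps over the fox and sleeps"):
        print(f"{lang}: {score:.3f}")
```

With samples this small the scores are only indicative. Longer input gives a more stable trigram profile, which is consistent with the length recommendation below.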

Type or paste your text in the left box below. We recommend a minimum length of 100 characters, but the algorithm works best if you provide 200 characters or more.

Press the SUBMIT button. The results will be displayed below the text, with the top scoring language listed first. For comparison, a sample of that language will be displayed in the box to the right of your text. If you click on a lower-scoring language, a sample of that language will be displayed instead.

The difference in score between languages is a guide to the certainty of the match. For example, if the top score is twice as high as the second score, the first language identified is probably correct. If the scores for the suggested languages are similar, the languages may be closely related, but more commonly this suggests that the algorithm did not find a good match.
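The rule of thumb above can be written as a small check. This is a sketch only: it assumes scores are positive numbers where higher is better, takes the factor of two from the example above as a default threshold, and uses a hypothetical function name and labels that do not come from the tool.

```python
def match_confidence(scores, ratio_threshold=2.0):
    """Classify a result list by the ratio of the top score to the runner-up.

    `scores` is a list of numeric scores, one per suggested language.
    The threshold of 2.0 mirrors the "twice as high" example; it is an
    assumed cut-off, not a documented one.
    """
    ranked = sorted(scores, reverse=True)
    if not ranked:
        return "no result"
    if len(ranked) == 1:
        return "single candidate"
    top, second = ranked[0], ranked[1]
    if second <= 0:
        # Avoid dividing by zero; a positive top score with no real runner-up.
        return "probably correct" if top > 0 else "no good match"
    return "probably correct" if top / second >= ratio_threshold else "uncertain"


if __name__ == "__main__":
    print(match_confidence([0.8, 0.3, 0.1]))   # top is more than twice the second
    print(match_confidence([0.5, 0.45, 0.4]))  # scores are close together
```

An "uncertain" result does not distinguish between closely related languages and a poor match, so it is still worth comparing the displayed samples by eye.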

Currently, the algorithm can recognize over 900 languages, but recognition depends in part on how the text is encoded. Some (but not all) languages that are typically written with non-Roman alphabets may be recognized only when transliterated. We hope to continue to increase the number of languages that the algorithm can correctly identify.

