Coptic is the latest stage of the Egyptian language, a northern Afroasiatic language spoken in Egypt until at least the 17th century. Coptic flourished as a literary language from the second to thirteenth centuries, and its Bohairic dialect continues to be the liturgical language of the Coptic Orthodox Church of Alexandria. (Source: Wikipedia)
Use CorpusImporter() or browse the CLTK GitHub organization (anything beginning with coptic_) to discover available Coptic corpora.
In [1]: from cltk.corpus.utils.importer import CorpusImporter
In [2]: c = CorpusImporter('coptic')
In [3]: c.list_corpora
Out[3]: ['coptic_text_scriptorium']
The corpus module has a class for generating a Swadesh list for Coptic.
In [1]: from cltk.corpus.swadesh import Swadesh
In [2]: swadesh = Swadesh('cop')
In [3]: swadesh.words()[:10]
Out[3]: ['ⲁⲛⲟⲕ', 'ⲛⲧⲟⲕ, ⲛⲧⲟ', 'ⲛⲧⲟϥ, ⲛⲧⲟⲥ', 'ⲁⲛⲟⲛ', 'ⲛⲧⲟⲧⲛ', 'ⲛⲧⲟⲩ', '-ⲉⲓ', 'ⲡⲓ-, ϯ-, ⲛⲓ-', 'ⲡⲉⲓⲙⲁ', 'ⲙⲙⲁⲩ']
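Some entries in the Swadesh output pack several variants into one string (for example 'ⲛⲧⲟⲕ, ⲛⲧⲟ'), and some are bound forms marked with a hyphen. The following is a minimal sketch of one way to split such entries for further processing. It assumes the comma-separated, hyphen-marked convention seen in the first ten entries holds for the rest of the list. The helpers split_variants and is_affix are hypothetical names used for illustration and are not part of CLTK. The list is hard-coded here so the example runs without CLTK installed.

```python
# First ten entries, copied from the Swadesh output shown above.
swadesh_head = ['ⲁⲛⲟⲕ', 'ⲛⲧⲟⲕ, ⲛⲧⲟ', 'ⲛⲧⲟϥ, ⲛⲧⲟⲥ', 'ⲁⲛⲟⲛ', 'ⲛⲧⲟⲧⲛ',
                'ⲛⲧⲟⲩ', '-ⲉⲓ', 'ⲡⲓ-, ϯ-, ⲛⲓ-', 'ⲡⲉⲓⲙⲁ', 'ⲙⲙⲁⲩ']


def split_variants(entry):
    """Hypothetical helper (not a CLTK function).

    Split one Swadesh entry into its comma-separated variants,
    assuming commas only ever separate alternative forms.
    """
    return [form.strip() for form in entry.split(',') if form.strip()]


def is_affix(form):
    """Hypothetical helper: treat a leading or trailing hyphen as
    marking a bound form (prefix or suffix) rather than a free word."""
    return form.startswith('-') or form.endswith('-')


# One list of variants per Swadesh concept.
variants = [split_variants(entry) for entry in swadesh_head]

# All individual forms, then only the free-standing words.
flat_forms = [form for group in variants for form in group]
free_forms = [form for form in flat_forms if not is_affix(form)]
```

With CLTK installed, swadesh_head could instead be taken from swadesh.words() as in the session above.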