Articles
An Introduction to Noor Diacritized Corpus
Authors
Akbar Dastani, Behrouz Minaei-Bidgoli, Mohammad Reza Vafaei, Hossein Juzi
Abstract
This article introduces the Noor Diacritized Corpus, which comprises 28 million words extracted from about 360 hadith books. Despite many attempts to diacritize the Holy Quran, little effort has gone into diacritizing hadith texts, so this corpus is of great significance. The article presents various statistical aspects of the corpus and, in addition to its general specifications, describes the challenges of diacritization in the Arabic language.
Keywords
Noor Diacritized Corpus, diacritization, Arabic corpora