Semantically Clustering of Persian Words

Authors
Alireza Arasteh, Mohammad Hossein Elahimanesh, Ahmad Sharif, Behrouz Minaei-Bidgoli
Abstract
Clustering is a data mining task that divides a set of objects into groups so that similar objects fall into the same group and objects with different features are placed in separate groups. This paper presents a technique for semantic word clustering, an application of data mining techniques to natural language processing. Word clustering is used in various fields of text mining, such as word disambiguation, information retrieval, language modelling, and text classification. This paper proposes a graph-based method for clustering Persian words. The proposed method is a type of pattern-based clustering. It consists of two parts. In the first part, a word co-occurrence graph is obtained using statistical similarity measures such as chi-square, pointwise mutual information (PMI), and cosine. In the second part, the graph is divided into appropriate clusters by Newman's graph clustering algorithm. Our research shows that chi-square is the best of these measures for clustering Persian words.
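The two-part pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several things here are assumptions: co-occurrence is counted per sentence, edges are kept only for positively associated pairs above an arbitrary threshold, and "Newman's graph clustering algorithm" is taken to mean greedy modularity maximization (the abstract does not say which of Newman's algorithms is used). The function names and the transliterated toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sentences):
    """Count in how many sentences each word and each unordered word pair occurs."""
    word_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        words = sorted(set(sentence))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))
    return word_counts, pair_counts, len(sentences)

def pmi(n_xy, n_x, n_y, n):
    """Pointwise mutual information: log2(P(x, y) / (P(x) * P(y)))."""
    return math.log2(n_xy * n / (n_x * n_y))

def chi_square(n_xy, n_x, n_y, n):
    """Chi-square statistic of the 2x2 contingency table for words x and y."""
    a, b, c = n_xy, n_x - n_xy, n_y - n_xy   # both, x only, y only
    d = n - n_x - n_y + n_xy                 # neither
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def build_graph(sentences, measure, threshold):
    """Part 1: link word pairs whose association score reaches the threshold."""
    word_counts, pair_counts, n = cooccurrence_counts(sentences)
    graph = {w: set() for w in word_counts}
    for (x, y), n_xy in pair_counts.items():
        n_x, n_y = word_counts[x], word_counts[y]
        # Assumption: keep only pairs that co-occur more often than chance.
        if n_xy * n > n_x * n_y and measure(n_xy, n_x, n_y, n) >= threshold:
            graph[x].add(y)
            graph[y].add(x)
    return graph

def modularity(graph, communities):
    """Newman's modularity Q of a partition of an undirected, unweighted graph."""
    m = sum(len(nbrs) for nbrs in graph.values()) / 2
    if m == 0:
        return 0.0
    q = 0.0
    for community in communities:
        internal = sum(1 for u in community for v in graph[u] if v in community) / 2
        degree = sum(len(graph[u]) for u in community)
        q += internal / m - (degree / (2 * m)) ** 2
    return q

def greedy_modularity(graph):
    """Part 2: repeatedly merge the two communities that raise Q the most."""
    communities = [{w} for w in graph]
    best_q = modularity(graph, communities)
    while len(communities) > 1:
        best = None
        for i, j in combinations(range(len(communities)), 2):
            merged = [c for k, c in enumerate(communities) if k not in (i, j)]
            merged.append(communities[i] | communities[j])
            q = modularity(graph, merged)
            if best is None or q > best[0]:
                best = (q, merged)
        if best[0] <= best_q:   # no merge improves Q any further
            break
        best_q, communities = best
    return communities

# Hypothetical toy corpus of tokenized sentences (transliterated placeholders).
corpus = [
    ["ketab", "daftar", "ghalam"], ["ketab", "daftar"],
    ["daftar", "ghalam"], ["ketab", "ghalam"],
    ["sib", "anar", "porteghal"], ["sib", "anar"],
    ["anar", "porteghal"], ["sib", "porteghal"],
]
graph = build_graph(corpus, measure=chi_square, threshold=1.0)  # arbitrary threshold
for cluster in greedy_modularity(graph):
    print(sorted(cluster))
```

Passing `measure=pmi` with a suitable threshold swaps the association measure without changing the rest of the pipeline. The pairwise merge search recomputes modularity from scratch and is only practical for small vocabularies.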
Keywords
Word Clustering; Text Mining; Persian NLP; Graph-based Clustering