
High dimensional autonomous computing on Arabic language classification

Research Authors
George Samy Rady a, Sara Salah Mohamed b, Mamdouh Farouk Mohamed c, Khaled F. Hussain d
Research Journal
Computers and Electrical Engineering
Research Abstract

Hypervectors are holographic and pseudo-random, with independent and identically distributed components. A hypervector holds all of its information combined and spread across all of its components in a fully distributed representation, so no single component is more responsible than any other for storing a given piece of information. Hypervectors are combined with operations akin to addition, forming an algebraic structure for numerical computing over vector spaces, and they are compared for similarity using a distance metric over that space. These operations on hypervectors can be composed into interesting computational behaviors with unique features that make them robust and efficient. This paper focuses on applying hyperdimensional computing to identifying the language of text samples by encoding sequences of letters into hypervectors. Recognizing the language of a given text is the first step in all kinds of language processing, for example text analysis, classification, and translation. High-dimensional vector models are popular in Natural Language Processing and are used to capture word meaning from word statistics. The first task in this work is high dimensional computing classification on Arabic datasets, namely the SANAD (Single-label Arabic News Articles Dataset) corpus comprising the Arabiya, Khaleej, and Akhbarona datasets, using N-gram encoding. With 12-gram encoding on the SANAD datasets, high dimensional computing achieves an accuracy of 0.9665; with 6-gram encoding on the RTA dataset it achieves 0.6648; and with 12-gram encoding on the ANT dataset it achieves 0.9248. The second task applies high dimensional computing to Arabic dialect recognition for Levantine dialects using three datasets. The first, the Shami Dialects Corpus (SDC), covers the Jordanian, Lebanese, Palestinian, and Syrian dialects and yields an accuracy of 0.8234 with 7-gram encoding. The second, PADIC (Parallel Arabic Dialect Corpus), contains Syrian and Palestinian Arabic dialects and yields an accuracy of 0.7458 with 5-gram encoding. The third, MADAR (Multi-Arabic Dialect Applications and Resources), yields an accuracy of 0.7800 with 6-gram encoding.
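
To make the N-gram encoding concrete, below is a minimal sketch of a hyperdimensional text classifier in Python with NumPy. It is not the paper's implementation: the dimensionality D, the n-gram size N, the permutation-plus-multiplication binding, and all function and variable names are illustrative assumptions consistent with common hyperdimensional computing practice.

```python
# Minimal sketch of hyperdimensional n-gram text classification.
# Assumptions (not from the paper): bipolar hypervectors, np.roll as the
# position permutation, elementwise product as binding, cosine similarity.
import numpy as np

D = 10_000   # hypervector dimensionality (illustrative)
N = 3        # n-gram size (the paper reports 5- to 12-gram encodings)
rng = np.random.default_rng(0)

# Random i.i.d. bipolar hypervector per character: the holographic atoms.
item_memory = {}
def letter_hv(ch):
    if ch not in item_memory:
        item_memory[ch] = rng.choice([-1, 1], size=D)
    return item_memory[ch]

def ngram_hv(gram):
    """Bind the letters of an n-gram: rolling by the letter's position
    encodes order, elementwise multiplication combines the letters."""
    hv = np.ones(D, dtype=int)
    for i, ch in enumerate(gram):
        hv *= np.roll(letter_hv(ch), i)
    return hv

def text_hv(text):
    """Bundle (sum) all n-gram hypervectors of a text into one profile."""
    acc = np.zeros(D)
    for i in range(len(text) - N + 1):
        acc += ngram_hv(text[i:i + N])
    return acc

def train(samples):
    """samples: dict mapping a class label to a list of training texts.
    Each class hypervector is the bundle of its training profiles."""
    return {label: sum(text_hv(t) for t in texts)
            for label, texts in samples.items()}

def classify(text, class_hvs):
    """Return the label whose class hypervector is most similar (cosine)
    to the query text's profile."""
    q = text_hv(text)
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    return max(class_hvs, key=lambda label: cos(q, class_hvs[label]))
```

Because a class hypervector is just the sum of many n-gram hypervectors, the profile stays holographic: every component carries a trace of every n-gram, so similarity to a query degrades gracefully rather than depending on any single component, which is the robustness property the abstract highlights.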