A MACHINE LEARNING APPROACH TO MULTI SCALE SENTIMENT ANALYSIS OF TIGRIGNA ONLINE POSTS

Samuel, Hagazi; Assabie, (Ph.D.) Yaregal; Gebremariam, (MSc) Akubazgi

dc.contributor.author	Samuel, Hagazi
dc.contributor.author	Assabie, (Ph.D.) Yaregal
dc.contributor.author	Gebremariam, (MSc) Akubazgi
dc.date.accessioned	2022-02-16T06:38:04Z
dc.date.available	2022-02-16T06:38:04Z
dc.date.issued	2021-09
dc.identifier.uri	http://ir.haramaya.edu.et//hru/handle/123456789/4761
dc.description	114p.	en_US
dc.description.abstract	With the rapid growth of web technologies, individuals and organizations are increasingly using blogs, forums, review sites, social networks, etc. to express their views and opinions. These reviews are very useful for service providers, manufacturers and organizations in making informed decisions and improving their services. However, the volume of reviews on social media is growing so rapidly that it is becoming increasingly difficult for users to analyze them and extract relevant information. Therefore, automated sentiment analysis is needed. In this research, we present a multiscale sentence-level sentiment analysis for Tigrigna online posts using a supervised machine learning approach. The multiscale Tigrigna sentiment analysis model classifies a given sentence into five predefined classes: very positive (2), positive (1), neutral (0), negative (-1) and very negative (-2). We used three supervised machine learning algorithms, Naïve Bayes (NB), Maximum Entropy (MaxEnt) and Support Vector Machine (SVM), with unigram, bigram, trigram and hybrid unigram-plus-bigram variants of the N-gram model as features. The proposed model contains several components: preprocessing (tokenization, normalization, stop word removal), morphological analysis (lemmatization), feature extraction, training of the machine learning algorithms, classification, and evaluation of the results using evaluation metrics. For the experiments, 1500 Tigrigna sentences were collected from different sources. Due to the morphological complexity of the language, preprocessing techniques were applied to clean noisy data and reduce the sparseness and dimensionality of the dataset. After preprocessing, the dataset was lemmatized before being passed to the training phase of the experiment. The experimental results show that the SVM algorithm with the unigram model outperforms the other algorithms, with 71% accuracy.
In conclusion, despite the language's morphological complexity and the lack of effective morphological analysis tools, the experimental results are promising. However, we are convinced that the results could improve further with a larger, pre-annotated and cleaned corpus.	en_US
dc.description.sponsorship	Haramaya University	en_US
dc.language.iso	en	en_US
dc.publisher	Haramaya University	en_US
dc.subject	Tigrigna Language; N-gram model; Multi-scale Sentiment Analysis; Maximum Entropy; Support Vector Machine; Naive Bayes	en_US
dc.title	A MACHINE LEARNING APPROACH TO MULTI SCALE SENTIMENT ANALYSIS OF TIGRIGNA ONLINE POSTS	en_US
dc.type	Thesis	en_US
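
The N-gram feature variants named in the abstract (unigram, bigram, trigram, and the unigram-plus-bigram hybrid) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names `ngrams` and `extract_features`, the `LABELS` mapping, and the placeholder tokens `w1`..`w3` are assumptions made for illustration, and the input is assumed to be a sentence that has already been tokenized, normalized, stripped of stop words, and lemmatized.

```python
from collections import Counter

# Five sentiment classes as listed in the abstract.
LABELS = {
    2: "very positive",
    1: "positive",
    0: "neutral",
    -1: "negative",
    -2: "very negative",
}


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) found in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(tokens, variant):
    """Count n-gram features for one of the four variants in the abstract."""
    # Hypothetical variant names; "hybrid" combines unigrams and bigrams.
    orders = {
        "unigram": (1,),
        "bigram": (2,),
        "trigram": (3,),
        "hybrid": (1, 2),
    }[variant]
    feats = Counter()
    for n in orders:
        feats.update(ngrams(tokens, n))
    return feats


# Placeholder tokens standing in for one preprocessed, lemmatized sentence.
tokens = ["w1", "w2", "w3", "w1"]
unigram_feats = extract_features(tokens, "unigram")
hybrid_feats = extract_features(tokens, "hybrid")
```

Count vectors of this kind would then be passed, together with a label from `LABELS`, to a classifier such as NB, MaxEnt, or SVM. The hybrid variant keeps single-word evidence while adding some local word-order context, at the cost of a larger and sparser feature space.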