Abstract:
The proliferation of hate speech on social media poses significant challenges to social cohesion
and stability, particularly in Ethiopia. This research investigates approaches to detecting and
classifying Amharic hate speech on Facebook using deep learning techniques. To address these
challenges, this study developed a multi-class hate speech detection system focusing on three
critical categories: ethnic, political, and religious hate speech. The study compiled a dataset of 4,067 Facebook posts from pages with over 50,000 followers, manually labeling each post as ethnic (1,497 posts), political (1,320 posts),
and religious (1,250 posts). To classify the posts, the study employed two deep learning models: a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory (Bi-LSTM) network. The dataset underwent careful preprocessing, including tokenization, text cleaning, and normalization, to ensure data quality. The Bi-LSTM model outperformed the CNN, achieving a weighted average precision, recall, and overall accuracy of 0.83, compared to the CNN's 0.80 across all metrics. While both models demonstrated strong
performance, Bi-LSTM showed superior capability in capturing contextual information and
maintaining consistent classification accuracy across categories. Moreover, the study
highlighted potential challenges in the practical implementation of deep learning-based hate
speech detection systems, such as managing code-switching, adapting to evolving language
patterns, and ensuring fairness and transparency. The study therefore recommends collaboration with stakeholders, including social media platforms, government agencies, and civil society organizations, to integrate the models into content moderation pipelines and policy enforcement frameworks and to support continuous improvement of the system.
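To illustrate the preprocessing pipeline summarized in the abstract (text cleaning, normalization, and tokenization), a minimal Python sketch follows. The homophone-normalization map and the cleaning regular expressions are illustrative assumptions: Amharic has several letters with identical pronunciation that are often collapsed during normalization, but the abstract does not specify the study's exact rules.

```python
import re

# Hypothetical subset of an Amharic homophone-normalization map; the
# study's actual normalization rules are not given in the abstract.
HOMOPHONE_MAP = str.maketrans({
    "ሐ": "ሀ", "ኀ": "ሀ",   # variants pronounced "ha"
    "ሠ": "ሰ",              # variant pronounced "se"
    "ዐ": "አ",              # variant pronounced "a"
    "ፀ": "ጸ",              # variant pronounced "tse"
})

def preprocess(post: str) -> list[str]:
    """Clean, normalize, and tokenize one Facebook post."""
    text = re.sub(r"https?://\S+", " ", post)       # strip URLs
    text = re.sub(r"[A-Za-z0-9_@#]+", " ", text)    # drop Latin noise, mentions, hashtags
    text = text.translate(HOMOPHONE_MAP)            # collapse homophone letters
    # Split on whitespace and on Ethiopic punctuation (። ፣ ፤ ፥ ፦)
    tokens = re.split(r"[\s።፣፤፥፦!?.,]+", text)
    return [t for t in tokens if t]

print(preprocess("ሰላም ሠላም https://example.com ዐለም።"))
```

In practice, the resulting token lists would be mapped to integer indices and padded before being fed to the CNN or Bi-LSTM classifier.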