Evaluating Textual Features and Oversampling for Automatic Stance Detection

by Jongwon Lee
#NLP #Linguistics #TermPaper
Term Paper for Computation & Linguistic Analysis LING-L545, Indiana University

We describe a series of experiments on basic textual features and their effectiveness for automatic stance detection. Specifically, we evaluate the impact of bag-of-words (BoW) features, sentiment lexicon features, and syntactic features on the performance of a Support Vector Machine (SVM). Our analysis shows that the words in a tweet offer the most insight into its stance and that adding features from sentiment lexicons can improve performance. For one target, adding syntactic dependency features also increased performance. Finally, we identify challenges related to class imbalance, small data volume, and data quality.

Keywords: Stance Detection, Sentiment Analysis, Social Media, Support Vector Machine, Subjectivity and Arguing Lexicon, Synthetic Minority Oversampling, Term Frequency-Inverse Document Frequency
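
To make two of the terms above concrete, here is a minimal sketch of TF-IDF-weighted bag-of-words vectors and of SMOTE-style oversampling, written in plain Python. It is an illustration under stated assumptions, not the setup used in the paper: the whitespace tokenizer, the smoothed IDF formula, the single-nearest-neighbor interpolation, and the function names are all choices made for this example.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; real tweets need a better tokenizer.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of strings."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)  # raw term counts (the bag of words)
        vectors.append([counts[t] * idf[t] for t in vocab])
    return vocab, vectors


def nearest_neighbor(idx, points):
    # Index of the closest other point by squared Euclidean distance.
    best, best_dist = None, float("inf")
    for j, p in enumerate(points):
        if j == idx:
            continue
        dist = sum((x - y) ** 2 for x, y in zip(points[idx], p))
        if dist < best_dist:
            best, best_dist = j, dist
    return best


def oversample(minority, n_new, seed=0):
    """Create n_new synthetic minority-class vectors (needs >= 2 points).

    Simplification: interpolates toward the single nearest neighbor,
    whereas SMOTE picks among the k nearest minority neighbors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = nearest_neighbor(i, minority)
        gap = rng.random()  # position along the segment between the two points
        synthetic.append(
            [x + gap * (y - x) for x, y in zip(minority[i], minority[j])]
        )
    return synthetic
```

Calling `tfidf_vectors` on a few tweets gives one vector per tweet over a shared vocabulary, and passing the vectors of an under-represented stance class to `oversample` yields additional synthetic training vectors. The resulting vectors could then be fed to an SVM implementation of one's choice.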