Application of Sentiment Analysis to Language Learning.
Emotion vocabulary has been studied in various disciplines, such as psychology, linguistics,
and computational linguistics. More recently, it has come to play an essential role in sentiment analysis, or opinion mining.
However, emotion vocabulary has not received comparable attention in second or foreign language
learning. Insufficient pedagogical materials and inefficient tool support offer little help
to learners trying to master emotion words.
for learners to master emotion words. The current study considers the application of sentiment analysis
to language learning. To achieve this goal, we developed RESOLVE, a context-aware emotion synonym
suggestion system, for educational purposes. Utilizing machine-learning techniques, the system is capable of
suggesting synonymous emotion words appropriate to learners' contexts. Importantly, the usage information
of each emotion word, including scenario descriptions, definitions, and example sentences, is provided in
order to help develop language learners' vocabulary knowledge as well as help facilitate their word use.
A pedagogical evaluation of the system's effectiveness was conducted using a writing task and a survey
questionnaire. The results indicate that the participants achieved substantial progress on emotion word
use with the help of the proposed system. In particular, less proficient participants demonstrated greater
improvements. Meanwhile, participants showed positive attitudes toward the tool support, reporting that it
helped them gain a better command of emotion words in their writing.
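The core idea of context-aware synonym suggestion can be illustrated with a toy sketch. The lexicon, scenario keywords, and scoring rule below are all hypothetical stand-ins: RESOLVE's actual machine-learning scorer and usage database are not described at this level of detail. The sketch ranks candidate synonyms of an emotion word by how well their scenario keywords overlap with the learner's context.

```python
# Hypothetical usage lexicon: synonyms of "happy" mapped to scenario keywords.
# In the real system, a trained model would score candidates instead.
LEXICON = {
    "delighted": {"gift", "surprise", "news"},
    "content":   {"calm", "life", "satisfied"},
    "ecstatic":  {"won", "prize", "celebration"},
}

def suggest(context_words, lexicon=LEXICON):
    """Rank candidate synonyms by overlap between the learner's context
    and each word's scenario keywords (a stand-in for the ML scorer)."""
    scores = {w: len(set(context_words) & kws) for w, kws in lexicon.items()}
    return sorted(scores, key=scores.get, reverse=True)

print(suggest({"she", "won", "the", "prize"}))  # "ecstatic" ranked first
```

In the full system, each suggested word would also carry its scenario description, definition, and example sentences, so the learner sees why one synonym fits the context better than another.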
Convolutional Recurrent Deep Learning Model for Sentence Classification.
As the amount of unstructured text data that humanity produces, both overall and on the internet, grows, so does the need to process it intelligently and extract different types of knowledge from it. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have been applied to Natural Language Processing (NLP) systems with remarkable results. A CNN is a novel approach for extracting higher-level features that are invariant to local translation. However, it requires stacking multiple convolutional layers to capture long-term dependencies, owing to the locality of the convolutional and pooling layers. In this article, we describe a joint CNN and RNN framework that overcomes this problem. Briefly, we use an unsupervised neural language model to train initial word embeddings that are further tuned by our deep learning network; the pre-trained parameters of the network are then used to initialize the model. At the final stage, the proposed framework combines this prior information with a set of feature maps learned by a convolutional layer and with long-term dependencies learned via Long Short-Term Memory (LSTM). Empirically, we show that our approach, with slight hyperparameter tuning and static vectors, achieves outstanding results on multiple sentiment analysis benchmarks. Our approach outperforms several existing approaches in terms of accuracy; our results are also competitive with the state of the art on the Stanford Large Movie Review (IMDB) dataset, with 93.3% accuracy, and on the Stanford Sentiment Treebank (SSTb) dataset, with 48.8% fine-grained and 89.2% binary accuracy, respectively. A key element of our approach is replacing the pooling layer with a recurrent layer that follows the convolutional layer, which substantially reduces the number of parameters.
Our results show that we were able to reduce the loss of detailed, local information and capture long-term dependencies with an efficient framework that has fewer parameters and a high level of performance.
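The convolution-then-recurrence idea can be sketched in miniature. Here a tiny hand-rolled 1-D convolution produces a sequence of local feature maps, and a simplified single-gate recurrent unit (a stand-in for the LSTM; the weights and toy embeddings are invented, not the paper's) consumes that sequence in order, in place of a pooling layer, so long-range order information is preserved.

```python
import math

def conv1d(embeddings, kernel, window=2):
    """Slide a window over word embeddings; each step yields one local feature."""
    feats = []
    for i in range(len(embeddings) - window + 1):
        patch = [x for vec in embeddings[i:i + window] for x in vec]
        feats.append(math.tanh(sum(p * k for p, k in zip(patch, kernel))))
    return feats

def recurrent(feats, w_in=0.8, w_rec=0.5):
    """Minimal recurrent pass over the feature sequence (LSTM stand-in):
    the hidden state carries information across the whole sequence."""
    h = 0.0
    for f in feats:
        h = math.tanh(w_in * f + w_rec * h)
    return h

sentence = [[0.1, 0.3], [0.4, -0.2], [0.0, 0.5]]   # toy 2-d word embeddings
kernel = [0.5, -0.1, 0.2, 0.3]                      # one conv filter, window 2
features = conv1d(sentence, kernel)
print(len(features), recurrent(features))
```

The design point is the substitution itself: max-pooling would keep only the strongest feature and discard word order, whereas the recurrent pass sees every feature in sequence with far fewer parameters than stacking more convolutional layers.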
Rich Short Text Conversation Using Semantic Key Controlled Sequence Generation.
With recent advances in the sequence-to-sequence
framework, generation approaches for short text conversation
(STC) have become attractive. Traditional sequence-to-sequence approaches
for short text conversation often suffer from poor
diversity, producing generic replies without substance. It is also hard
to control the topic or semantics of the selected reply from
multiple generated candidates. In this paper, a novel external
memory driven sequence-to-sequence learning approach is proposed
to address these problems. A tensor of external memory is
constructed to represent interpretable topics or semantics. During
generation, a controllable memory trigger is extracted given the
input sequence, and a reply is then generated using the memory
trigger as well as the sequence-to-sequence model. Experiments
show that the proposed approach generates replies with much richer
diversity than traditional sequence-to-sequence training with
attention. Meanwhile, it achieves a better quality score in human
evaluation. It is also observed that by manually manipulating the
memory trigger, it is possible to interpretably guide the topics
or semantics of the reply.
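The memory-trigger mechanism can be illustrated with a deliberately tiny sketch. The topic slots, keywords, and reply templates below are all invented: the actual system uses a learned external-memory tensor and a neural sequence-to-sequence decoder, not a lookup table. What the sketch preserves is the control flow: extract an interpretable trigger from the input, then condition generation on it.

```python
MEMORY = {                      # interpretable topic slots in external memory
    "food":  {"dinner", "restaurant", "eat", "hungry"},
    "sport": {"game", "match", "team", "play"},
}
TEMPLATES = {                   # template stand-in for the neural decoder
    "food":  "Sounds tasty, where did you eat?",
    "sport": "Nice, which team do you support?",
}

def extract_trigger(tokens):
    """Pick the memory slot whose keywords best match the input sequence."""
    return max(MEMORY, key=lambda t: len(set(tokens) & MEMORY[t]))

def generate(tokens, trigger=None):
    """Condition the reply on the extracted (or manually set) trigger."""
    return TEMPLATES[trigger or extract_trigger(tokens)]

print(generate(["we", "lost", "the", "match", "today"]))
```

Because the trigger is an explicit, interpretable value, overriding it by hand (the `trigger` argument above) steers the reply's topic, which mirrors the manual-manipulation observation in the abstract.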
Context-Aware Answer Sentence Selection With Hierarchical Gated Recurrent Neural Networks.
In this paper, we study the task of reading comprehension
style answer sentence selection that aims to select the best
sentence from a given passage to answer a question. Unlike most
previous works that match the question and each candidate sentence
separately, we observe that the context information among
sentences in the same passage plays a vital role in this task. We
propose modeling context information with hierarchical gated recurrent
neural networks. Specifically, we first apply a word-level
recurrent neural network to model the context-independent matching
between the question and each candidate sentence. We then
employ a sentence-level recurrent neural network to incorporate
the context information among all candidate sentences. Moreover,
we introduce a gating mechanism to select matching information
before feeding it into the recurrent neural networks at both the
word and sentence levels. Experiments on the WikiQA and SQuAD datasets
show that our model outperforms state-of-the-art methods.
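The two-level structure can be sketched with toy stand-ins. A word-level score matches the question against each candidate sentence independently; a sentence-level pass then lets each score absorb context from its neighbours in the passage. The word/sentence-level recurrent networks and learned gates are replaced here by a word-overlap ratio and a fixed gated running average, all hypothetical.

```python
def word_level(question, sentence):
    """Context-independent match between the question and one sentence."""
    return len(set(question) & set(sentence)) / max(len(set(question)), 1)

def sentence_level(scores, gate=0.7):
    """Blend each score with the running passage context (GRU stand-in);
    the gate balances the sentence's own score against its neighbours."""
    ctx, out = 0.0, []
    for s in scores:
        ctx = gate * s + (1 - gate) * ctx
        out.append(ctx)
    return out

question = ["when", "was", "the", "bridge", "built"]
passage = [
    ["the", "bridge", "spans", "the", "bay"],
    ["it", "was", "built", "in", "1937"],
    ["tourists", "visit", "it", "daily"],
]
raw = [word_level(question, s) for s in passage]
final = sentence_level(raw)
best = max(range(len(passage)), key=final.__getitem__)
print(best)  # index of the best answer sentence
```

In this toy passage the first two sentences tie on raw word overlap; the context pass breaks the tie in favour of the second, which is the kind of disambiguation the abstract argues passage context provides.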
A Sentiment Analysis System to Improve Teaching and Learning.
Sentiment analysis (SA) is the process of identifying
and classifying users’ opinions from
a piece of text into different sentiments—for
example, positive, negative, or neutral—or
emotions such as happy, sad, angry, or disgusted to
determine the user’s attitude toward a particular subject
or entity. SA plays an important role in many fields
including education, where student feedback is essential
to assess the effectiveness of learning technologies.
Many universities obtain such feedback via a student
response system (SRS) during or at the end of a course to
analyze the teacher’s performance.1 Student feedback
about teacher performance and the learning experience is not only valuable for university
administrators and instructors but also plays a key role
in influencing student decisions on which universities to attend or
courses to take.
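A minimal form of the sentiment classification described above can be sketched with a lexicon-based rule. The word lists and comments are illustrative only; a deployed SA system for student feedback would use a trained model rather than fixed lists.

```python
# Hypothetical sentiment lexicons for course-feedback text.
POSITIVE = {"clear", "helpful", "engaging", "great"}
NEGATIVE = {"boring", "confusing", "slow", "unhelpful"}

def classify(feedback):
    """Label one free-text comment as positive, negative, or neutral
    by counting lexicon hits in either direction."""
    words = set(feedback.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

comments = ["The lectures were clear and engaging",
            "Tutorials felt slow and confusing"]
print([classify(c) for c in comments])  # ['positive', 'negative']
```

Aggregating such labels over all responses from a student response system gives instructors a quick signal of how a course is being received.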
Using Natural Language Processing to Automatically Detect Self-Admitted Technical Debt.
The metaphor of technical debt was introduced to express the trade-off between productivity and quality that arises when
developers take shortcuts or perform quick hacks. More recently, our work has shown that it is possible to detect technical debt using
source code comments (i.e., self-admitted technical debt), and that the most common types of self-admitted technical debt are design
and requirement debt. However, all approaches thus far heavily depend on the manual classification of source code comments. In this
paper, we present an approach to automatically identify design and requirement self-admitted technical debt using Natural Language
Processing (NLP). We study 10 open source projects: Ant, ArgoUML, Columba, EMF, Hibernate, JEdit, JFreeChart, JMeter, JRuby and
SQuirrel SQL and find that 1) we are able to accurately identify self-admitted technical debt, significantly outperforming the current
state-of-the-art based on fixed keywords and phrases; 2) words related to sloppy code or mediocre source code quality are the best
indicators of design debt, whereas words related to the need to complete a partially implemented requirement in the future are the best
indicators of requirement debt; and 3) we can achieve 90% of the best classification performance, using as little as 23% of the
comments for both design and requirement self-admitted technical debt, and 80% of the best performance, using as little as 9% and
5% of the comments for design and requirement self-admitted technical debt, respectively. The last finding shows that the proposed
approach can achieve good accuracy even with a relatively small training dataset.
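The spirit of the NLP approach, learning word weights from labeled comments rather than matching a fixed keyword list, can be sketched with a tiny naive-Bayes-style classifier. The training comments below are invented; the paper's real training data comes from manually classified comments in the ten studied projects, and its actual model is not specified here.

```python
import math
from collections import Counter

TRAIN = [  # invented examples of self-admitted technical debt comments
    ("hack this is ugly sloppy code fix later", "design"),
    ("workaround messy implementation needs cleanup", "design"),
    ("todo implement remaining validation later", "requirement"),
    ("feature not complete implement missing case", "requirement"),
]

def fit(train):
    """Collect per-class word counts for a naive-Bayes-style scorer."""
    counts, totals = {}, Counter()
    for text, label in train:
        counts.setdefault(label, Counter()).update(text.split())
        totals[label] += len(text.split())
    return counts, totals

def predict(text, counts, totals):
    """Score each class by summed add-one-smoothed log word likelihoods."""
    vocab = {w for c in counts.values() for w in c}
    best, best_score = None, float("-inf")
    for label, c in counts.items():
        score = sum(math.log((c[w] + 1) / (totals[label] + len(vocab)))
                    for w in text.split())
        if score > best_score:
            best, best_score = label, score
    return best

counts, totals = fit(TRAIN)
print(predict("sloppy hack cleanup later", counts, totals))  # design
```

Words like "sloppy" and "hack" end up weighted toward design debt and words like "implement" and "complete" toward requirement debt, echoing the paper's finding about which vocabulary best indicates each debt type.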