An argumentation mining system

With the advance of Web 2.0, we have recognized the
value of the opinionated content scattered across social
media as an opportunity to study how points of view and
disputed conclusions converge. An argumentation mining
system has the potential to offer large-scale qualitative
analysis of such sources.
Argumentation mining is the process of identifying structured
arguments in raw natural-language text; it therefore involves
natural language processing, argumentation theory and
information retrieval. Argumentation theory is a
cognitive model based on the construction and evaluation
of pieces of information called arguments. An argument is, in
essence, an aggregation of a claim (also referred to as a
conclusion) and a set of premises supporting it. These components
can be seen as the elementary units of argumentation.
Given that claims can themselves be used as premises for
deriving further claims, an argument can also be seen
as a nested structure (a tree or a graph) with qualitative relations
between its sub-components.
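The nested claim-premise structure described above can be sketched as a small recursive data type. This is only an illustration: the class name `Argument`, its methods, and the example statements are assumptions made here, not part of any cited model or corpus, and all relations are taken to be support relations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    """A claim together with the premises that support it."""
    claim: str
    # Each premise is itself an Argument, so a premise is either a bare
    # statement (no premises of its own) or the claim of a sub-argument.
    premises: List["Argument"] = field(default_factory=list)

    def depth(self) -> int:
        """Number of claim levels in the nested structure."""
        if not self.premises:
            return 1
        return 1 + max(p.depth() for p in self.premises)

    def statements(self) -> List[str]:
        """All statements in the tree: the claim first, then premises depth-first."""
        out = [self.claim]
        for p in self.premises:
            out.extend(p.statements())
        return out


# Hypothetical example, not taken from any dataset.
festival = Argument(
    claim="The festival is worth attending",
    premises=[
        Argument("The line-up is strong",
                 premises=[Argument("Three headliners are confirmed")]),
        Argument("Tickets are cheap"),
    ],
)
```

Here "The line-up is strong" acts both as a premise of the top-level claim and as a claim in its own right, which is what makes the structure a tree rather than a flat list.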
Argumentation is a multidisciplinary research field that
deals with debate and reasoning processes and has ties with
areas such as logic, law, language, rhetoric, philosophy,
psychology and computer science (Lippi and Torroni (2016)).
Argumentation has become progressively more central as a
core study within artificial intelligence (Bench-Capon and
Dunne (2007)), owing to its ability to combine representational
needs with user-related representation models and
computational models for automated reasoning. The study
of argumentation in AI has given rise to computational
argumentation.
Here, different argument representation models have been
developed, which Bentahar et al. (2010) group into three
categories: rhetorical, dialogical, and monological
models.
The first two categories treat argumentation as a
dynamic process. Rhetorical models deal with arguments
that are based on the audience's perception and with evaluative
judgments rather than with establishing the truth of
a conclusion, whereas dialogical models describe the ways
arguments are connected in dialogical structures, sometimes
treating arguments as abstract entities and ignoring their
internal structure. Several dialogical models have been
proposed in the literature, for example (Bentahar et al.
(2004); Bentahar et al. (2003); Dung (1995)).
While dialogical models and rhetorical models of argumentation
highlight the process of argumentation in a dialogue
structure, monological models emphasize the internal
structure of the argument itself. What is important in
these models is not the link that can exist between arguments,
but the link between the different components of a
given argument.
We are interested in monological models, because we want
to extract structured arguments from the opinionated content
that spreads across social media such as Twitter, and
hence help to define a formal argued opinion. Several
monological models addressing the internal structure of
arguments have been developed, for example (Toulmin
(1958); Freeman (1993); Reed and Walton (2003)). These
models stress the link between the different components of
an argument and how a conclusion is related to a set of
premises. Toulmin's model (Toulmin (1958)) distinguishes
the roles of different types of premise, i.e., data, warrant,
and backing, in the argument, whereas premise-conclusion
models (Freeman (2011)) do not differentiate between types of
premise. Instead, they capture the macro-structure of arguments,
which specifies the different ways in which premises and
conclusions combine to form larger complexes. They identify
four main macro-structures of arguments: linked,
serial, convergent, and divergent, which represent whether different
premises contribute together, in sequence, or independently
to one or multiple conclusions.
Preprint submitted to Journal of LATEX Templates December 9, 2018
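One way to make the four macro-structures concrete is to encode an argument as a list of inference steps and to check which patterns occur. The encoding below (a step is a set of jointly used premises plus one conclusion) and the function name `macro_structures` are assumptions made for this sketch; they are a simplified reading of the definitions above, not Freeman's formalization.

```python
from typing import FrozenSet, List, Set, Tuple

# One inference step: a set of premises that jointly support a conclusion.
Step = Tuple[FrozenSet[str], str]


def macro_structures(steps: List[Step]) -> Set[str]:
    """Return the names of the macro-structures present in a list of steps."""
    found: Set[str] = set()
    for i, (premises, conclusion) in enumerate(steps):
        # Linked: several premises contribute together in a single step.
        if len(premises) > 1:
            found.add("linked")
        for j, (other_premises, other_conclusion) in enumerate(steps):
            if i == j:
                continue
            # Convergent: separate steps independently reach the same conclusion.
            if conclusion == other_conclusion:
                found.add("convergent")
            # Divergent: a shared premise supports different conclusions.
            if premises & other_premises and conclusion != other_conclusion:
                found.add("divergent")
            # Serial: the conclusion of one step is a premise of another.
            if conclusion in other_premises:
                found.add("serial")
    return found
```

For instance, the two steps `({"a"}, "b")` and `({"b"}, "c")` form a serial structure, because the intermediate conclusion `b` is reused as a premise.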
The identification of argumentation structures includes several
subtasks, such as separating argumentative from non-argumentative
text units (Moens et al. (2007); Goudas et al.
(2014); Palau and Moens (2009)), classifying argument
components into claims and premises (Mochales and Moens
(2011); Goudas et al. (2014); Stab and Gurevych (2014)),
and identifying argumentative relations (Palau and Moens
(2009); Peldszus (2014); Stab and Gurevych (2014)).
However, an approach that covers all subtasks is still
missing. Furthermore, most approaches operate locally
and do not optimize the global argumentation structure,
especially for content from social media such as Twitter.
Recently, Peldszus and Stede (2015) proposed an approach
based on minimum spanning trees (MST) that jointly
models argumentation structures for short texts. Bosc
et al. (2016b) addressed the issue of dealing with Twitter
data, that is, textual arguments of at most
140 characters; theirs was the only work to produce a complete
argument structure from Twitter content. In addition, Stab and
Gurevych (2017) introduced a novel corpus of persuasive
essays annotated with argumentation structures.
Our primary motivation for this work is to create a
Twitter argumentation system that uses opinion mining in order
to achieve a better understanding of argumentation
structure in opinionated content. Through our participation
in Clef2017 Microblog Cultural Contextualization
(Ouertatani et al.), we had access to a Twitter corpus
surrounding cultural events such as festivals, music
and movies (Cappellato et al. (2017)). In this work, we aim
to enhance the argument detection component of
an argumentation mining system by exploring the benefits of
incorporating opinion mining and subjectivity detection
features and techniques, using the Clef2017 dataset. The contributions
of this article are the following:
Opinion-topic detection
Argued opinion definition
An end-to-end argumentation structure parser
The remainder of this article is structured as follows: In
section 2, we review related work in computational argumentation.
In section 3…;
2. Related work
The growing excitement in the computational argumentation
area is tangible. The initial studies started to appear
only a few years ago within specific genres such as legal
texts, online reviews and debate (Mochales and Moens
(2011); Cabrio and Villata (2012)). In 2014 alone there
were at least three international events on argumentation
mining: the First ACL Workshop on Argumentation
Mining 2, the SICSA Workshop on Argument Mining: Perspectives
from Information Extraction, Information Retrieval
and Computational Linguistics 3, and the BiCi Workshop
on Frontiers and Connections between Argumentation Theory
and Natural Language Processing 4.
Research on this topic is gaining visibility at major
artificial intelligence and computational linguistics conferences,
and IBM has recently funded a multi-million cognitive
computing project whose core technology is argument
mining: IBM Debating Technologies 5.
Indeed, this is not only a scientifically engaging problem,
but also one with self-evident application potential. The
Web and online social networks offer a real mine of information
through a variety of different sources. Currently,
the techniques used to extract information from
these sources are chiefly based on statistical and network
analysis, as in opinion mining (Pang et al. (2008)) and social
network analysis.
In recent years, there has been growing interest in assigning
meaning to streams of data from microblogging
services such as Twitter, as well as some recent research on
using argument mining for social media. To the best of our
knowledge, Toni and Torroni (2011) were the first to
combine social media and argumentation in a unified approach.
In this novel view, argumentation frameworks are
obtained bottom-up, starting from the users' comments,
opinions and suggested links.
Grosse et al. (2012) and Grosse et al. (2015) also presented
a novel approach that integrates argumentation theory
and microblogging technologies, with a particular focus
on Twitter. They created a framework that allows opinion
mining from incrementally generated Twitter queries.
Saint-Dizier (2012) and Villalba and Saint-Dizier (2012) presented
the TextCoop platform, which constructs
arguments from opinions and supportive elements such as
illustrations and evaluative expressions by using a set of
handcrafted rules that explicitly describe rhetorical structures.
Bosc et al. (2016a) proposed a
methodology to build DART (Dataset of Arguments
and their Relations on Twitter), in order to detect tweet-arguments
in a stream of tweets and to establish the relations between
them.
In the Clef2018 6 edition, the MC2 lab 7 mainly focused on processing
and developing methods and resources to mine the
corpus published in the previous edition, Clef2017. One of
the main tasks was mining opinion argumentation:
it aimed to automatically identify reason-conclusion
structures that can be used to model social web users' positions
about a cultural event, as expressed in Twitter microblogs.
The idea was to perform a search process on a massive
microblog collection that focuses on claims about a given
festival.
2 http://www.uncg.edu/cmp/ArgMining2014/
3 http://www.arg-tech.org/index.php/sicsa-workshop-on-argument-mining-2014/
4 http://wwwsop.inria.fr/members/Serena.Villata/BiCi2014/frontiersARG-NLP.html
5 http://researcher.watson.ibm.com/researcher/viewgroup.php?id=5443
6 http://clef2018.clef-initiative.eu/
7 https://mc2.talne.eu/
The identification of argumentation structures
addresses a variety of different tasks. Most relevant to our
work are approaches to argument mining that focus on
the identification of argumentation structures in natural
language texts and opinionated content. We categorize related
approaches into the following three subtasks:
Component identification focuses on the separation
of argumentative from non-argumentative text
units and the identification of argument component
boundaries.
Component classification addresses the function of
argument components. It aims at classifying argument
components into different types such as claims
and premises.
Structure identification focuses on linking arguments
or argument components. Its objective is to
recognize different types of argumentative relations
such as support or attack relations.
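The three subtasks above can be read as the stages of a pipeline. The sketch below only illustrates how the stages hand data to one another; the cue-word lists, the function names and the "nearest preceding claim" linking rule are placeholder assumptions made here, standing in for the learned models used by the approaches reviewed in this section.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical cue lists, chosen only for this illustration.
PREMISE_CUES = ("because", "since")
CLAIM_CUES = ("should", "must", "i think")


@dataclass
class Component:
    text: str
    kind: str = "unknown"  # set to "claim" or "premise" by stage 2


def identify_components(sentences: List[str]) -> List[Component]:
    """Stage 1: keep only the sentences that look argumentative."""
    cues = PREMISE_CUES + CLAIM_CUES
    return [Component(s) for s in sentences
            if any(cue in s.lower() for cue in cues)]


def classify_components(components: List[Component]) -> List[Component]:
    """Stage 2: label each component as a claim or a premise."""
    for c in components:
        is_premise = any(cue in c.text.lower() for cue in PREMISE_CUES)
        c.kind = "premise" if is_premise else "claim"
    return components


def identify_structure(components: List[Component]) -> List[Tuple[int, int, str]]:
    """Stage 3: link every premise to the nearest preceding claim."""
    relations = []
    last_claim = None
    for i, c in enumerate(components):
        if c.kind == "claim":
            last_claim = i
        elif last_claim is not None:
            relations.append((i, last_claim, "support"))
    return relations


# Hypothetical input sentences.
sentences = [
    "The festival should get more funding.",
    "Because it attracts thousands of visitors.",
    "Doors open at 8pm.",
]
components = classify_components(identify_components(sentences))
relations = identify_structure(components)
```

Each relation is a triple (source index, target index, relation type); only support relations are produced here, although attack relations would fit the same shape.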
2.1. Component identification
The goal of the first stage of an argumentation mining system
is to identify the argument components within the input text,
that is, the text units that can be defined as argumentative. The
problem can be formulated as a classification task, which
can in principle be addressed by a machine learning classifier
that separates argumentative from non-argumentative
sentences, leaving the task of identifying the type of argument
component (e.g., a claim or a premise) to a second
stage. Existing systems have used a wide variety
of classic machine learning algorithms, including Support
Vector Machines (SVM), Logistic Regression, Naive
Bayes classifiers, Maximum Entropy classifiers, Decision
Trees and Random Forests (Mochales and Moens (2011);
Park and Cardie (2014); Goudas et al. (2014); Rinott et al.
(2015)).
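As an illustration of this classification formulation, the following is a minimal sketch of one of the listed algorithms, a multinomial Naive Bayes classifier over bag-of-words counts with add-one smoothing, written with the standard library only. The class name, the label names `arg` and `non-arg`, and the four training sentences are assumptions made for the example; they are far too few to train a usable model.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(sentence: str) -> List[str]:
    return sentence.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, add-one smoothing."""

    def fit(self, examples: List[Tuple[str, str]]) -> "NaiveBayes":
        self.class_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for sentence, label in examples:
            self.class_counts[label] += 1
            for word in tokenize(sentence):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, sentence: str) -> str:
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior of the class
            n_words = sum(self.word_counts[label].values())
            for word in tokenize(sentence):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                p = (self.word_counts[label][word] + 1) / (n_words + len(self.vocab))
                score += math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled examples, for illustration only.
TRAIN = [
    ("we should fund the festival because it helps the town", "arg"),
    ("the concert was great since the sound was perfect", "arg"),
    ("the show starts at nine", "non-arg"),
    ("tickets are on sale now", "non-arg"),
]
model = NaiveBayes().fit(TRAIN)
```

Calling `model.predict(sentence)` returns the label with the highest log-probability; in a two-stage system, sentences labeled `arg` would then be passed on to component classification.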
These classifiers are trained in a supervised setting,
that is, on a collection of labeled examples. For each example,
some representation of the text to be classified is given
(e.g., in the form of a feature vector), together with the
associated class (label). The training phase produces a
model that can then be used to make predictions on
a new test set. Although several works in the literature
have tried to compare some of these approaches, there is
no clear evidence as to which classifier should be preferred.
In almost all existing systems, in fact, most of
the effort has been put into conceiving sophisticated and
highly informative features, which have a clear and immediate
impact on performance, rather than into constructing
appropriate models and algorithms for the
problem at hand. Thus, the approaches proposed
so far typically rely on simple and fast classifiers. Many
works employ classical features for text representation, including
bag-of-words representations of sentences, word bigrams
and trigrams, part-of-speech information obtained
with a statistical parser, information on punctuation,
verb tenses and the use of a predetermined list of
key phrases (Palau and Moens (2009); Stab and Gurevych
(2014)). More sophisticated features include sentiment
analysis indicators, subjectivity scores of sentences, and
dictionaries of keywords or key phrases that are highly informative
of the presence of an argument.
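A few of the classical features listed above can be computed with the standard library alone, as in the sketch below. The feature names, the regular expression used for tokenization and the key-phrase list are assumptions made for illustration; part-of-speech, verb tense, sentiment and subjectivity features are left out because they require an external tagger or lexicon.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical key-phrase list; real systems use larger curated lists.
KEY_PHRASES = ("because", "therefore", "in my opinion")


def ngrams(tokens: List[str], n: int) -> List[str]:
    """All contiguous word n-grams of the token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(sentence: str) -> Dict[str, int]:
    """Map a sentence to a sparse feature vector (feature name -> value)."""
    text = sentence.lower()
    tokens = re.findall(r"[a-z0-9']+", text)
    features: Counter = Counter()
    # Bag-of-words (n = 1), bigram (n = 2) and trigram (n = 3) counts.
    for n in (1, 2, 3):
        for gram in ngrams(tokens, n):
            features[f"{n}gram={gram}"] += 1
    # Punctuation counts.
    features["punct=?"] = text.count("?")
    features["punct=!"] = text.count("!")
    # Binary indicators for the presence of each key phrase.
    for phrase in KEY_PHRASES:
        features[f"key={phrase}"] = int(phrase in text)
    return dict(features)


# Hypothetical example sentence.
example = extract_features("I love it because it rocks!")
```

The resulting dictionary is a sparse feature vector that can be paired with a label and passed to any of the classifiers mentioned earlier.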