Director : Professor André WLODARCZYK - Home page
Preliminary Discussion Forum for the 3rd MIC Sorbonne 2012
The Methods of Interactive Linguistics are aimed at describing a number of semantic fields of natural languages, using data mining techniques elaborated within the framework of the new paradigm of computation known as Knowledge Discovery in Databases (KDD). The motivation is to "dig deeper" in order to find building blocks which could be used in various sophisticated ways. Interactive Linguistics is interdisciplinary, involving the scientific cooperation of linguists, logicians, psycho-neurologists and information engineers. The task of linguists is the interactive (computer-aided) discovery of ontology-based definitions of feature structures using the SEMANA software suite, which was designed especially for building datasets with linguistic knowledge.
The 3rd MIC Sorbonne 2012 Workshop "New Standards for Languages Studies"
It is important to build and logically reconstruct (enhance, integrate and formalize) theories of language in order to conceive meta-theoretical foundations which are necessary for making further progress in describing human languages.
Research in the area of model-based computing has already begun to bridge the gap between Linguistic Science and Formal Methods. It is reasonable to expect that ontological inquiry into the interactively formalized description of natural language structures using KDD techniques will make it possible to conceive more powerful description devices. Only advanced computational tools such as the SEMANA software suite can make the process of discovering linguistic knowledge more systematic (adequate and consistent).
CELTA (Sorbonne - Paris 4) offers the SEMANA platform, which integrates a dynamic database builder with the most powerful functionalities of symbolic data mining. It contains the following:
(1) DB Builder : database construction environment with facilities for the dynamic restructuring of data
• Records Editor : provides fields for samples of signs (expressions) and a space for attributes whose number may vary from record to record; among other tools, the Records Editor includes a synthetic view of descriptions
• Tree Builder Assistant : provides assistance for building and drawing hierarchical (tree-like) structures which represent the “semantic feature structures” of data; all changes made on the tree may be passed on to the database on demand
• Attributes Editor : an autonomous module which enables changes to be made to the set of description attributes and automatically updates the whole database whenever necessary
(2) SEMANA's Editor : This is the monitor of the SEMANA software suite, in which it is possible to open, create and edit files as well as to discover similarities and dependencies in datasets.
a) Symbolical Data Analysers
• Formal Concept Analyser (FCA) : a technique based on Lattice theory (Wille R.); various functions for the analysis and processing of "Formal Concept Contexts" (single-valued tables)
• Rough Set Analyser (RSA) : a technique using approximation membership functions (Pawlak Z.); various functions for the analysis and processing of "information systems" (multi-valued tables)
• Rough Formal Concept Analyser (RFCA) : a combination of FCA with RSA; functions especially useful for searching similarities in formal concept contexts
• Rough Decision Logic Analyser (RDLA) : a combination of (a) Rough Set Analysis (RSA) and (b) Decision Logic Analysis (DLA) – a technique originating in Expert Systems technology; it can be viewed as a rough rule builder
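The first two analysers in the list above can be illustrated with a minimal Python sketch. It is not SEMANA's implementation: the toy data (`CONTEXT`, `TABLE`) and all function names are hypothetical, and the concept enumeration is a brute-force one that only suits very small tables. The first part derives the formal concepts (extent/intent pairs) of a single-valued context; the second computes the lower and upper approximations of a set of records in a multi-valued information system.

```python
from itertools import combinations

# Hypothetical single-valued context: each sign maps to the attributes it has.
CONTEXT = {
    "dog": {"animate", "concrete"},
    "cat": {"animate", "concrete"},
    "stone": {"concrete"},
    "idea": {"abstract"},
}

# Hypothetical multi-valued information system: record -> attribute values.
TABLE = {
    "s1": {"aspect": "perfective", "tense": "past"},
    "s2": {"aspect": "perfective", "tense": "past"},
    "s3": {"aspect": "imperfective", "tense": "past"},
    "s4": {"aspect": "imperfective", "tense": "present"},
}


def common_attributes(objects, context):
    """Attributes shared by all given objects (every attribute if none given)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(attributes, context):
    """Objects that possess every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Brute-force enumeration: close every subset of objects.

    Exponential in the number of objects, so only suitable for toy contexts.
    """
    concepts = set()
    objs = sorted(context)
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


def approximations(table, attributes, target):
    """Lower and upper approximations of `target` (in Pawlak's sense).

    Records are indiscernible when they agree on all chosen attributes.
    """
    classes = {}
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes.setdefault(key, set()).add(obj)
    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper
```

On this toy data, `formal_concepts(CONTEXT)` yields five concepts, one of which pairs the extent {dog, cat} with the intent {animate, concrete}. Calling `approximations(TABLE, ["aspect", "tense"], {"s1", "s3"})` gives the lower approximation {s3} and the upper approximation {s1, s2, s3}, because s1 cannot be told apart from s2 on the chosen attributes.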
b) Statistical Data Analysers STAT 3
The most powerful technique offered by SEMANA is Factor Correspondence Analysis (FCA; not to be confused with the Formal Concept Analyser above, which shares the acronym) coupled with Hierarchical Ascending Classification (HAC), based on programs written by J.-P. Benzécri and co-workers in the 1970s.
• Factor Correspondence Analysis (FCA)
• Hierarchical Ascending Classification (HAC)
• Table conversion : multi-valued tables containing symbolic values are converted into one-valued tables called contingency tables. In turn, one-valued tables may be converted into Burt's tables (tables of co-occurrences). These tables are particularly useful for studying the dependence and clustering of attributes.
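The two conversions just described can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the way SEMANA performs the conversion: the records in `ROWS` and the function names are hypothetical. The one-valued table is built with one 0/1 column per (attribute, value) pair, and the Burt table is obtained by counting, for every pair of columns, the records in which both are set.

```python
# Hypothetical multi-valued records with symbolic values.
ROWS = [
    {"aspect": "perfective", "tense": "past"},
    {"aspect": "perfective", "tense": "past"},
    {"aspect": "imperfective", "tense": "past"},
    {"aspect": "imperfective", "tense": "present"},
]


def to_one_valued(rows):
    """Return (columns, matrix) with one 0/1 column per (attribute, value)."""
    columns = sorted({(attr, val) for row in rows for attr, val in row.items()})
    matrix = [
        [1 if row.get(attr) == val else 0 for attr, val in columns]
        for row in rows
    ]
    return columns, matrix


def burt_table(matrix):
    """Co-occurrence counts between all pairs of columns of a 0/1 matrix."""
    width = len(matrix[0]) if matrix else 0
    return [
        [sum(row[i] * row[j] for row in matrix) for j in range(width)]
        for i in range(width)
    ]


columns, one_valued = to_one_valued(ROWS)
burt = burt_table(one_valued)
```

In the resulting table, the diagonal cell of a column gives how often that attribute value occurs, and an off-diagonal cell gives how often two values occur in the same record. Here, for instance, "perfective" and "past" co-occur in two records, while "perfective" and "present" never co-occur.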
A free licence for SEMANA is available to registered users only. To apply for your free licence, please download the registration form, fill it in and send it by e-mail to: firstname.lastname@example.org
SEMANA Software User's Quick Guide (in English: [download]), (in Polish: [download])
Download your Registration Application (researchers in linguistics only)
Registered users only - Access restricted to authorised users
Please bear in mind, however, that SEMANA is not commercial software and needs to be continuously improved. Consequently, the proposed method requires interdisciplinary cooperation between linguists and computer scientists. Essentially, though, the philosophy here is: "bring computer technology to linguists rather than the other way around".
Semana Software Suite summary
Other interactive linguistics tools:
Interactive linguistics tool detects political 'framing'