LabEL® - Laboratório de Engenharia da Linguagem

Research Activities

LabEL is developing large-scale language resources for Portuguese, the LABEL-LEX® system, which consists of broad-coverage computational lexicons and grammars formalized as finite-state transducers (FSTs). The former contain both simple and multiword lexical units; the latter specify lexical and syntactic restrictions on word combinations. These research activities are motivated by two general assumptions: (i) industrially, NLP is currently booming: with the exponential growth of the Web, more texts are available online, and there is a real need for effective ways of searching, summarizing and translating them; (ii) high-quality language resources are crucial to the effective processing of written texts and are the basis for high-quality industrial applications.

Methods and techniques

Dictionaries and grammars are formalized and applied to text processing using finite-state techniques. Finite-state automata and transducers (FSTs) are widely recognized as well suited to representing many types of data compactly, including linguistic data. In NLP, they reduce the space and time overhead of text-processing operations.
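
To make the idea of a dictionary stored as a finite-state device concrete, here is a minimal Python sketch. It is not the LABEL-LEX® implementation: the class name LexiconFST, the tag strings such as "N:fs", and the three sample entries are hypothetical, chosen only for illustration. The sketch builds a trie-shaped automaton in which word forms sharing a prefix share states, and in which final states emit the analyses of the form just read, so lookup time depends on the length of the word and not on the size of the dictionary.

```python
class LexiconFST:
    """Toy lexicon: a trie-shaped automaton whose final states emit analyses.

    Hypothetical illustration only; not the format used by LABEL-LEX.
    """

    def __init__(self):
        self.transitions = [{}]  # state -> {character: next state}
        self.outputs = [[]]      # state -> analyses emitted if the input ends here

    def add(self, form, analysis):
        """Insert a simple or multiword form together with one analysis."""
        state = 0
        for ch in form:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                # Create a new state only when no existing prefix can be reused.
                nxt = len(self.transitions)
                self.transitions.append({})
                self.outputs.append([])
                self.transitions[state][ch] = nxt
            state = nxt
        self.outputs[state].append(analysis)

    def lookup(self, form):
        """Return every analysis of `form`, or an empty list if it is unknown."""
        state = 0
        for ch in form:
            state = self.transitions[state].get(ch)
            if state is None:
                return []
        return list(self.outputs[state])


# Sample entries; the tag notation is an assumption made for this sketch.
lexicon = LexiconFST()
lexicon.add("casa", "casa.N:fs")       # noun, feminine singular
lexicon.add("casa", "casar.V:P3s")     # verb, present, 3rd person singular
lexicon.add("casas", "casa.N:fp")      # noun, feminine plural
lexicon.add("a fim de", "a fim de.PREP")  # multiword unit

print(lexicon.lookup("casa"))
print(lexicon.lookup("a fim de"))
```

An ambiguous form such as "casa" returns both of its analyses, and a multiword unit is looked up exactly like a simple word. A production system would typically go further and minimize the automaton so that shared suffixes are merged as well, which is where most of the space savings come from.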