Sunday 31 August 2014

Probabilistic methods in computational linguistics

Computational linguistics aims to develop techniques for processing human languages by automatic means. Machine translation was among the earliest topics to attract attention, almost as soon as electronic computers appeared in the late 1940s. The development of computers was itself inspired by the idea of a thinking machine, a machina sapiens, and language was regarded as a uniquely human cognitive ability. Early work on artificial intelligence pitted symbolic reasoning against stochastic systems such as neural nets, but it soon became apparent that a solid probabilistic foundation was needed to deal with uncertainty.

In computational linguistics, the belief that grammatical and logical constraints, supplemented by ad hoc heuristics, would suffice persisted much longer. When the field did acknowledge the importance of probabilistic methods, however, the shift was rapid and thorough. The papers on statistical part-of-speech tagging published by Church and DeRose in 1988 can be taken as the beginning of this awareness.

Stochastic methods for part-of-speech disambiguation had been proposed before Church and DeRose, but they had not gained much prominence in computational linguistics. The 1988 papers, by contrast, had a profound impact and reshaped the field within a decade. At the time, the fragility of manually constructed systems was a major obstacle in natural language processing, and it showed itself in three areas: ambiguity resolution, portability, and robustness. Semantic constraints were often either too loose, admitting numerous viable analyses, or too strict, ruling out the correct one, so automatic methods were needed to soften constraints and resolve ambiguities. Portability required adapting systems to variability across application domains, and robustness demanded handling errorful input and incomplete grammars. All three called for automatic learning methods, which explains why probabilistic methods, and machine learning in particular, penetrated the field so rapidly. Today, computational linguistics is inseparable from machine learning.
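To make the idea of statistical part-of-speech tagging concrete, here is a minimal sketch in the spirit of that work: a bigram hidden Markov model trained on tag counts and decoded with the Viterbi algorithm. The tiny corpus, the tag set, the smoothing constant, and the function names below are illustrative assumptions, not details taken from the Church or DeRose papers.

# A minimal sketch of statistical part-of-speech tagging: a bigram HMM
# with add-alpha smoothing, decoded by Viterbi. Toy data, for illustration only.
from collections import defaultdict
import math

# Hypothetical tagged corpus: lists of (word, tag) pairs.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("old", "ADJ"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
]

# Count tag-bigram transitions and word emissions.
trans = defaultdict(lambda: defaultdict(int))   # trans[prev_tag][tag]
emit = defaultdict(lambda: defaultdict(int))    # emit[tag][word]
for sent in corpus:
    prev = "<s>"
    for word, tag in sent:
        trans[prev][tag] += 1
        emit[tag][word] += 1
        prev = tag

def log_prob(counts, key, alpha=0.1):
    # Add-alpha smoothed log probability of `key` given a count table.
    total = sum(counts.values())
    vocab = len(counts) + 1
    return math.log((counts.get(key, 0) + alpha) / (total + alpha * vocab))

def viterbi(words):
    # Return the most probable tag sequence for `words`.
    tags = list(emit.keys())
    # best[i][t] = (score, backpointer) for tag t at position i
    best = [{t: (log_prob(trans["<s>"], t) + log_prob(emit[t], words[0]), None)
             for t in tags}]
    for i, word in enumerate(words[1:], start=1):
        col = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][p][0] + log_prob(trans[p], t), p) for p in tags
            )
            col[t] = (score + log_prob(emit[t], word), prev)
        best.append(col)
    # Trace back from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))

print(viterbi(["the", "dog", "sleeps"]))  # e.g. ['DET', 'NOUN', 'VERB']

Even this toy model illustrates the point of the 1988 papers: tag ambiguity is resolved not by hand-written rules but by probabilities estimated from a corpus.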
