Definition of 4 specialized terms (Q.3) 3rd theme

26 05 2008

In the following article, we are going to define four specialized terms:

  • Machine translation.
  • Machine-aided translation.
  • Multilingual content management.
  • Translation technology.

The first term is machine translation. Sometimes referred to by the abbreviation MT, it is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its most basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.
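The simple word substitution described above can be illustrated with a short Python sketch. It is only a toy under stated assumptions: the four-entry English-to-Spanish lexicon and the function name translate_word_for_word are invented for illustration and are not taken from any real MT system.

```python
# Toy English-to-Spanish lexicon; the entries are illustrative assumptions.
LEXICON = {
    "the": "el",
    "cat": "gato",
    "drinks": "bebe",
    "milk": "leche",
}


def translate_word_for_word(sentence, lexicon):
    """Replace each word with its lexicon entry; keep unknown words unchanged."""
    words = sentence.lower().split()
    return " ".join(lexicon.get(word, word) for word in words)


if __name__ == "__main__":
    print(translate_word_for_word("The cat drinks milk", LEXICON))
```

A sketch like this has no notion of word order, agreement or idioms, which is the kind of problem that the corpus techniques mentioned above try to handle.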

The second term is machine-aided translation, also known as computer-assisted translation, computer-aided translation, or CAT. It is a form of translation in which a human translator translates texts using computer software designed to support and facilitate the translation process.
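One way such software can support a translator is by suggesting earlier translations of similar sentences. The following Python sketch assumes a tiny, hypothetical translation memory (the segment pairs, the function name suggest_translation and the 0.7 similarity cutoff are all assumptions made for illustration) and uses the standard difflib module to find the closest stored segment. The human translator still decides whether to accept the suggestion.

```python
import difflib

# Hypothetical translation memory: previously translated segment pairs.
TRANSLATION_MEMORY = {
    "Save the file.": "Guarde el archivo.",
    "Open the file.": "Abra el archivo.",
    "Close the window.": "Cierre la ventana.",
}


def suggest_translation(segment, memory, cutoff=0.7):
    """Return (stored source, stored translation) for the closest match, or None."""
    matches = difflib.get_close_matches(segment, list(memory), n=1, cutoff=cutoff)
    if not matches:
        return None
    return matches[0], memory[matches[0]]


if __name__ == "__main__":
    print(suggest_translation("Save the files.", TRANSLATION_MEMORY))
```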

The third term is multilingual content management, which concerns the administration of multilingual websites. According to Danny Stofer, it involves the following issues:

  • Translation
  • Localization
  • Culture
  • Feedback
  • Design
  • Workflow
  • Non-Latin character sets

Finally, the last term is translation technology: the technology whose main activity is the interpretation of the meaning of a text and the subsequent production of an equivalent text, also called a translation, that communicates the same message in another language. The text to be translated is called the source text, and the language it is to be translated into is called the target language; the final product is sometimes called the “target text”.






Characteristics of a translation task (FEMTI report) (Q.3) 1st theme

4 05 2008

According to FEMTI (Framework for the Evaluation of Machine Translation), the three main characteristics of a translation task are assimilation, dissemination and communication. In the following lines I’m going to give a brief explanation of each of them.

  • Assimilation: The ultimate purpose of the assimilation task (of which translation forms a part) is to monitor a (relatively) large volume of texts produced by people outside the organization, in (usually) several languages.
  • Dissemination: The ultimate purpose of dissemination is to deliver to others a translation of documents produced inside the organization.
  • Communication: The ultimate purpose of the communication task is to support multi-turn dialogues between people who speak different languages. The translation quality must be high enough for painless conversation, despite possible syntactically ill-formed input and idiosyncratic word and format usage.

SOURCES:

* FEMTI – a Framework for the Evaluation of Machine Translation in ISLE. Retrieved 16:23, May 2, from http://www.issco.unige.ch:8080/cocoon/femti/st-home.html





Explanation of three Research Topics (Q.2) 2nd theme

22 04 2008

In this article, I am going to explain three research topics that I have chosen from among the others.

The first topic I’m going to talk about is “NECA” or “Net Environment for Embodied Emotional Conversational Agents”, one of the previous projects of the Austrian Research Institute for Artificial Intelligence (ÖFAI).

In the NECA project, the focus is on the design of credible agent-agent interaction patterns to be observed by human users. To achieve a high level of credibility, the agents must be able to express themselves using a combination of verbal and non-verbal output driven by personality and emotion models.

Moreover, the NECA project will develop a new generation of mixed multi-user / multi-agent virtual spaces populated by affective conversational agents. The agents will be able to express themselves through synchronised emotional speech and non-verbal behaviour, generated from an abstract representation which can be the output of an affective reasoner. This is the first time that such expressive capabilities are featured in Internet applications. The agents’ usefulness will be evaluated in two concrete application scenarios. From a technical point of view, the emerging NECA platform will provide a confederation of dedicated components including an affective reasoner, co-ordinated generation, and emotional speech synthesis, thus providing a basis for the development of new Internet applications with emotional agents.

The next research topic I am going to explain is “HUMAINE” or “Human-machine Interaction Network on Emotions”.

HUMAINE aims to lay the foundations for European development of systems that can register, model and/or influence human emotional and emotion-related states and processes – ‘emotion-oriented systems’. Such systems may be central to future interfaces, but their conceptual underpinnings are not sufficiently advanced to be sure of their real potential or the best way to develop them.

One of the reasons is that relevant knowledge is dispersed across many disciplines. HUMAINE brings together leading experts from the key disciplines in a programme designed to achieve intellectual integration. It identifies six thematic areas that cut across traditional groupings and offer a framework for an appropriate division of labour: theory of emotion; signal/sign interfaces; the structure of emotionally coloured interactions; emotion in cognition and action; emotion in communication and persuasion; and usability of emotion-oriented systems. Teams linked to each area will run a workshop in it and carry out joint research to define an exemplar embodying guiding principles for future work in their area.

Finally, the last research topic which I will focus on is Corpus Linguistics:

Corpus Linguistics is the study of linguistic phenomena through large collections of machine-readable texts: corpora. These are used within a number of research areas, ranging from the descriptive study of the syntax of a language to prosody or language learning, to mention but a few. An overview of some of the areas where corpora have been used can be found on the Research areas page.

Furthermore, the use of real examples of texts in the study of language is not a new issue in the history of linguistics. However, Corpus Linguistics has developed considerably in the last decades due to the great possibilities offered by the processing of natural language with computers. The availability of computers and machine-readable text has made it possible to get data quickly and easily and also to have this data presented in a format suitable for analysis.
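To show how machine-readable text makes it possible to get data quickly and in a format suitable for analysis, here is a minimal Python sketch of two typical corpus operations: counting word frequencies and listing a keyword with its context (a simple concordance). The two-sentence corpus and the function names are assumptions made for illustration; a real corpus would be read from files and would be far larger.

```python
import re
from collections import Counter

# A tiny stand-in corpus; a real corpus would be read from text files.
CORPUS = [
    "The translator reads the source text.",
    "The source text is translated into the target language.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(texts):
    """Count how often each word token occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def concordance(texts, keyword, width=2):
    """Return each occurrence of keyword with up to `width` words of context per side."""
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = tokens[max(0, i - width):i]
                right = tokens[i + 1:i + 1 + width]
                lines.append(" ".join(left + [token] + right))
    return lines


if __name__ == "__main__":
    print(word_frequencies(CORPUS).most_common(3))
    for line in concordance(CORPUS, "source"):
        print(line)
```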

REFERENCES:

* “NECA” or “Net Environment for Embodied Emotional Conversational Agents”. Retrieved 18:34, 21st April 2008, from http://www.dfki.de/pas/f2w.cgi?ltc/neca-e

* “NECA” or “Net Environment for Embodied Emotional Conversational Agents”. Retrieved 19:02, 21st April 2008, from http://www.ofai.at/~brigitte.krenn/papers/web3d_krenn_paper.pdf

* “HUMAINE” or “Human-machine Interaction Network on Emotions”. Retrieved 17:14, 18th April 2008, from http://www.dfki.de/pas/f2w.cgi?ltp/humaine-e

* Corpus Linguistics. Retrieved 17:22, 18th April 2008, from http://www.essex.ac.uk/linguistics/clmt/w3c/corpus_ling/content/introduction3.html





Research Topics (Q.2) 1st theme

16 04 2008

In this article I will point out some research topics that are mentioned on the sites of different Human Language Technologies groups.

Firstly, members of The Stanford NLP Group pursue research in a broad variety of topics:

  • Computational Semantics.
  • Parsing & Tagging.
  • Multilingual NLP.
  • Unsupervised Induction of Linguistic Structure.

Secondly, the Edinburgh Language Technology Group in Scotland, UK, conducts research and development in a number of areas. Some of its projects are:

  • Combining Shallow Semantics and Domain Knowledge.
  • Text Mining for Biomedical Content Curation.
  • Cross-lingual Multi-agent Retail Comparison.
  • Smart Qualitative Data: Methods and Community Tools for Data Mark-Up.
  • Machine Learning for Named Entity Recognition.
  • Named entity tagging of historical parliamentary proceedings.
  • Integrated Models and Tools for Fine-Grained Prosody in Discourse.
  • Joint Action Science and Technology.
  • AMI consortium projects that are developing technologies for meeting browsing and to assist people participating in meetings from a remote location.
  • Study of how pairs collaborate when planning a route on a map.

Finally, we can mention the German Language Technology Lab (DFKI), whose themes are elaborated in research, development and commercial projects:

  • Exploiting – and automatically extending – ontologies for content processing.
  • Tighter integration of shallow and deep techniques in processing.
  • Enriching deep processing with statistical methods.
  • Combining language checking with structuring tools in document authoring.
  • Document indexing for German and English.
  • Automatically associating recognized information with related information and thus building up collective knowledge.
  • Automatically structuring and visualizing extracted information.
  • Processing information encoded in multiple languages, among them Chinese and Japanese.






A European research centre for Human Language Technologies (Q1) 3rd theme

2 04 2008

One of the most important European research centres for Human Language Technologies is the Language Technology Lab (DFKI) in Germany. It is well known because it receives the main share of the public budget.

Its mission is the improvement of language technology through novel computational techniques for processing text, speech and knowledge, a deeper understanding of human language and thought, and the study of the true needs of the end user and the demands of the market. Moreover, it develops novel and improved applications in three areas: Information and Knowledge Management, Document Production, and Natural Communication.

These themes are elaborated in research, development and commercial projects:

  • exploiting – and automatically extending – ontologies for content processing
  • tighter integration of shallow and deep techniques in processing
  • enriching deep processing with statistical methods
  • combining language checking with structuring tools in document authoring
  • document indexing for German and English
  • automatically associating recognized information with related information and thus building up collective knowledge
  • automatically structuring and visualizing extracted information
  • processing information encoded in multiple languages, among them Chinese and Japanese


Finally, it is worth adding the two most recent DFKI LT publications:

* Thierry Declerck, Hans-Ulrich Krieger, Marcus Spies, Horacio Saggion
Human Language and Semantic Web Technologies for Business Intelligence Applications

* Hans-Ulrich Krieger, Bernd Kiefer, Thierry Declerck
A Framework for Temporal Representation and Reasoning in Business Intelligence Applications

References:

* Language Technology Lab (2007). DFKI LT: About. Retrieved 17:23, April 1st 2008, from http://www.dfki.de/lt/index.php

* Language Technology Lab (2007). DFKI LT: Projects. Retrieved 17:40, April 1st 2008, from http://www.dfki.de/lt/projects.php

* Language Technology Lab (2007). DFKI LT: Publications. Retrieved 17:47, April 1st 2008, from http://www.dfki.de/lt/publications.php





Some Definitions for Human Language Technology (Q1) 1st theme

1 04 2008

Nowadays, Human Language Technology is spreading more and more all over the world, and we can find definitions on different pages that explain what it is.

Firstly, we have the definition that Hans Uszkoreit gives us on the DFKI page: “Language technology (sometimes also referred to as human language technology) comprises computational methods, computer programs and electronic devices that are specialized for analyzing, producing or modifying texts and speech. These systems must be based on some knowledge of human language. Therefore language technology defines the engineering branch of computational linguistics.”

Secondly, according to Wikipedia, language technology is often called Human Language Technology (HLT) or natural language processing (NLP) and consists of computational linguistics (CL) and speech technology as its core, but also includes many application-oriented aspects of them. Language technology is closely connected to computer science and general linguistics.

Lastly, we can also mention the brief definition that the Meraka Institute gives us:

“Human Language Technology (HLT) makes it easier for people to interact with machines. This can benefit a wide range of people – from illiterate farmers in remote villages who want to obtain relevant medical information over a cellphone, to scientists in state-of-the-art laboratories who want to focus on problem-solving with computers.”

References:

* Language Technology Lab. Hans Uszkoreit (2007). DFKI-LT – What is Language Technology?. Retrieved 12:50, March 1st 2008, from http://www.dfki.de/lt/lt-general.php

* Meraka Institute. African Advanced Institute for Information & Communication Technology (2007). Retrieved 13:20, March 27 2008, from http://www.meraka.org.za/humanLanguage.htm

* Language Technology (2007). In Wikipedia, The Free Encyclopedia. Retrieved 16:50, March 17th 2008, from http://en.wikipedia.org/wiki/Human_language_technology

* Hans Uszkoreit (2007). Retrieved 17:10, March 17th 2008, from http://www.coli.uni-saarland.de/~hansu/

* Natural Language Processing (2007). Retrieved 17:20, March 17th 2008, from http://en.wikipedia.org/wiki/Natural_language_processing