Definition of 4 specialized terms (Q.3) 3rd theme

26 05 2008

In the following article, we are going to define four specialized terms:

  • Machine translation.
  • Machine-aided translation.
  • Multilingual content management.
  • Translation technology.

The first one is machine translation. Sometimes referred to by the abbreviation MT, it is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its most basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition and the translation of idioms, as well as the isolation of anomalies.
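To make the idea of word-for-word substitution concrete, here is a minimal Python sketch. The tiny English-to-Spanish lexicon and the `substitute` function are hypothetical, invented only for illustration; real MT systems are far more complex. The second call shows why substitution alone falls short: adjectives usually precede nouns in English but follow them in Spanish, a typological difference the naive approach ignores.

```python
# A tiny English-to-Spanish lexicon (hypothetical, for illustration only).
LEXICON: dict[str, str] = {
    "the": "el",
    "black": "negro",
    "cat": "gato",
    "eats": "come",
    "fish": "pescado",
}


def substitute(sentence: str, lexicon: dict[str, str]) -> str:
    """Translate word by word; words missing from the lexicon are copied unchanged."""
    words = sentence.lower().split()
    return " ".join(lexicon.get(word, word) for word in words)


print(substitute("The cat eats fish", LEXICON))  # el gato come pescado
print(substitute("The black cat", LEXICON))      # el negro gato (correct Spanish: el gato negro)
```

Fixing the word order in the second sentence requires knowledge that a bilingual dictionary does not contain, which is where the corpus techniques mentioned above come in.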

The second term is computer-assisted translation (also called computer-aided translation, or CAT), a form of translation in which a human translator translates texts using computer software designed to support and facilitate the translation process.

Next, we have multilingual content management, which concerns the administration of multilingual websites. According to Danny Stofer, it involves the following issues:

  • Translation
  • Localization
  • Culture
  • Feedback
  • Design
  • Workflow
  • Non-Latin character sets

Finally, translation technology is the technology in which the main action is the interpretation of the meaning of a text and the subsequent production of an equivalent text, also called a translation, that communicates the same message in another language. The text to be translated is called the source text, the language it is to be translated into is called the target language, and the final product is sometimes called the “target text”.






Characteristics of a translation task (FEMTI report) (Q.3) 1st theme

4 05 2008

According to FEMTI (Framework for the Evaluation of Machine Translation), the three main characteristics of a translation task are assimilation, dissemination and communication. In the following lines I will give a brief explanation of each of them.

  • Assimilation: The ultimate purpose of the assimilation task (of which translation forms a part) is to monitor a (relatively) large volume of texts produced by people outside the organization, in (usually) several languages.
  • Dissemination: The ultimate purpose of dissemination is to deliver to others a translation of documents produced inside the organization.
  • Communication: The ultimate purpose of the communication task is to support multi-turn dialogues between people who speak different languages. The translation quality must be high enough for painless conversation, despite possible syntactically ill-formed input and idiosyncratic word and format usage.

SOURCES:

* FEMTI: a Framework for the Evaluation of Machine Translation in ISLE. Retrieved 16:23, May 2nd 2008, from http://www.issco.unige.ch:8080/cocoon/femti/st-home.html





Explanation of three Research Topics (Q.2) 2nd theme

22 04 2008

In this article, I am going to explain three research topics that I have chosen from among several.

The first topic I’m going to talk about is “NECA” or “Net Environment for Embodied Emotional Conversational Agents”, one of the previous projects of the Austrian Research Institute for Artificial Intelligence (ÖFAI).

In the NECA project, the focus is on the design of credible agent-agent interaction patterns to be observed by human users. To achieve a high level of credibility, the agents must be able to express themselves using a combination of verbal and non-verbal output driven by personality and emotion models.

Moreover, the NECA project will develop a new generation of mixed multi-user / multi-agent virtual spaces populated by affective conversational agents. The agents will be able to express themselves through synchronised emotional speech and non-verbal behaviour, generated from an abstract representation which can be the output of an affective reasoner. This is the first time that such expressive capabilities are featured in Internet applications. The agents’ usefulness will be evaluated in two concrete application scenarios. From a technical point of view, the emerging NECA platform will provide a confederation of dedicated components including an affective reasoner, co-ordinated generation, and emotional speech synthesis, thus providing a basis for the development of new Internet applications with emotional agents.

The next research topic I am going to explain is “HUMAINE” or “Human-machine Interaction Network on Emotions”.

HUMAINE aims to lay the foundations for European development of systems that can register, model and/or influence human emotional and emotion-related states and processes – ‘emotion-oriented systems’. Such systems may be central to future interfaces, but their conceptual underpinnings are not sufficiently advanced to be sure of their real potential or the best way to develop them.

One reason for this is that relevant knowledge is dispersed across many disciplines. HUMAINE brings together leading experts from the key disciplines in a programme designed to achieve intellectual integration. It identifies six thematic areas that cut across traditional groupings and offer a framework for an appropriate division of labour: theory of emotion; signal/sign interfaces; the structure of emotionally coloured interactions; emotion in cognition and action; emotion in communication and persuasion; and usability of emotion-oriented systems. Teams linked to each area will run a workshop in it and carry out joint research to define an exemplar embodying guiding principles for future work in their area.

Finally, the last research topic I will focus on is corpus linguistics:

Corpus linguistics is the study of linguistic phenomena through large collections of machine-readable texts: corpora. These are used within a number of research areas, ranging from the descriptive study of the syntax of a language to prosody or language learning, to mention but a few. An overview of some of the areas where corpora have been used can be found on the “Research areas” page of the University of Essex corpus linguistics site.

Furthermore, the use of real examples of texts in the study of language is not a new issue in the history of linguistics. However, corpus linguistics has developed considerably in recent decades thanks to the possibilities offered by processing natural language with computers. The availability of computers and machine-readable text has made it possible to obtain data quickly and easily, and to have this data presented in a format suitable for analysis.
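As a rough illustration of how a computer can present corpus data in a format suitable for analysis, the following Python sketch builds a word-frequency list and a keyword-in-context (KWIC) concordance, two classic corpus-linguistics views. The toy text and the function names are assumptions made up for this example; real studies use corpora of millions of words and dedicated tools.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens: list[str], n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent tokens with their counts."""
    return Counter(tokens).most_common(n)


def concordance(tokens: list[str], keyword: str, width: int = 3) -> list[str]:
    """Return KWIC lines: up to `width` tokens of left and right context."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}")
    return lines


# A toy "corpus" (hypothetical text, for illustration only).
corpus = ("The corpus is a collection of texts. A corpus can be searched quickly, "
          "and the results of a corpus query are easy to analyse.")
tokens = tokenize(corpus)
print(frequency_list(tokens, 3))
for line in concordance(tokens, "corpus"):
    print(line)
```

With this toy text the concordance for “corpus” has three lines, the first being `the [corpus] is a collection`; placing each occurrence in its immediate context is what lets a linguist compare usages at a glance.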

REFERENCES:

* “NECA” or “Net Environment for Embodied Emotional Conversational Agents”. Retrieved 18:34, April 21st 2008, from http://www.dfki.de/pas/f2w.cgi?ltc/neca-e

* “NECA” or “Net Environment for Embodied Emotional Conversational Agents”. Retrieved 19:02, April 21st 2008, from http://www.ofai.at/~brigitte.krenn/papers/web3d_krenn_paper.pdf

* “HUMAINE” or “Human-machine Interaction Network on Emotions”. Retrieved 17:14, April 18th 2008, from http://www.dfki.de/pas/f2w.cgi?ltp/humaine-e

* Corpus Linguistics. Retrieved 17:22, April 18th 2008, from http://www.essex.ac.uk/linguistics/clmt/w3c/corpus_ling/content/introduction3.html





Research Topics (Q.2) 1st theme

16 04 2008

In this article I will point out some research topics that are mentioned on the websites of different Human Language Technology research groups.

Firstly, members of the Stanford NLP Group pursue research on a broad variety of topics, including:

  • Computational Semantics.
  • Parsing & Tagging.
  • Multilingual NLP.
  • Unsupervised Induction of Linguistic Structure.

Secondly, the Edinburgh Language Technology Group (Scotland, UK) conducts research and development in a number of areas; some of its projects are:

  • Combining Shallow Semantics and Domain Knowledge.
  • Text Mining for Biomedical Content Curation.
  • Cross-lingual Multi-agent Retail Comparison.
  • Smart Qualitative Data: Methods and Community Tools for Data Mark-Up.
  • Machine Learning for Named Entity Recognition.
  • Named entity tagging of historical parliamentary proceedings.
  • Integrated Models and Tools for Fine-Grained Prosody in Discourse.
  • Joint Action Science and Technology.
  • AMI Consortium projects, which develop technologies for browsing meetings and for assisting people who take part in meetings from a remote location.
  • Study of how pairs collaborate when planning a route on a map.

Finally, we can mention the Language Technology Lab of DFKI in Germany, whose themes are elaborated in research, development and commercial projects:

  • Exploiting – and automatically extending – ontologies for content processing.
  • Tighter integration of shallow and deep techniques in processing.
  • Enriching deep processing with statistical methods.
  • Combining language checking with structuring tools in document authoring.
  • Document indexing for German and English.
  • Automatically associating recognized information with related information and thus building up collective knowledge.
  • Automatically structuring and visualizing extracted information.
  • Processing information encoded in multiple languages, among them Chinese and Japanese.






A European research centre for Human Language Technologies (Q1) 3rd theme

2 04 2008

One of the most important European research centres for Human Language Technologies is the Language Technology Lab of DFKI in Germany. It is very well known because it receives a large share of public funding.

Its mission is the improvement of language technology through novel computational techniques for processing text, speech and knowledge, a deeper understanding of human language and thought, and the study of the true needs of end users and the demands of the market. Moreover, the lab develops novel and improved applications in three areas: information and knowledge management, document production, and natural communication.

These themes are elaborated in research, development and commercial projects:

  • exploiting – and automatically extending – ontologies for content processing
  • tighter integration of shallow and deep techniques in processing
  • enriching deep processing with statistical methods
  • combining language checking with structuring tools in document authoring
  • document indexing for German and English
  • automatically associating recognized information with related information and thus building up collective knowledge
  • automatically structuring and visualizing extracted information
  • processing information encoded in multiple languages, among them Chinese and Japanese


Finally, it is worth adding the two latest DFKI LT publications:

* Thierry Declerck, Hans-Ulrich Krieger, Marcus Spies, Horacio Saggion: Human Language and Semantic Web Technologies for Business Intelligence Applications

* Hans-Ulrich Krieger, Bernd Kiefer, Thierry Declerck: A Framework for Temporal Representation and Reasoning in Business Intelligence Applications

References:

* Language Technology Lab (2007). DFKI LT: About. Retrieved 17:23, April 1st 2008, from http://www.dfki.de/lt/index.php

* Language Technology Lab (2007). DFKI LT: Projects. Retrieved 17:40, April 1st 2008, from http://www.dfki.de/lt/projects.php

* Language Technology Lab (2007). DFKI LT: Publications. Retrieved 17:47, April 1st 2008, from http://www.dfki.de/lt/publications.php





Some Definitions for Human Language Technology (Q1) 1st theme

1 04 2008

Nowadays, Human Language Technology is spreading all over the world, and we can find several definitions on different websites that explain what it is.

Firstly, we have the definition that Hans Uszkoreit gives on the DFKI website: “Language technology (sometimes also referred to as human language technology) comprises computational methods, computer programs and electronic devices that are specialized for analyzing, producing or modifying texts and speech. These systems must be based on some knowledge of human language. Therefore language technology defines the engineering branch of computational linguistics.”

Secondly, according to Wikipedia, language technology is often called Human Language Technology (HLT) or natural language processing (NLP) and consists of computational linguistics (or CL) and speech technology as its core, but also includes many application-oriented aspects of them. Language technology is closely connected to computer science and general linguistics.

Lastly, we can also mention the brief definition given by the Meraka Institute:

“Human Language Technology (HLT) makes it easier for people to interact with machines. This can benefit a wide range of people – from illiterate farmers in remote villages who want to obtain relevant medical information over a cellphone, to scientists in state-of-the-art laboratories who want to focus on problem-solving with computers.”

References:

* Language Technology Lab. Hans Uszkoreit (2007). DFKI-LT: What is Language Technology? Retrieved 12:50, March 1st 2008, from http://www.dfki.de/lt/lt-general.php

* Meraka Institute. African Advanced Institute for Information & Communication Technology (2007). Retrieved 13:20, March 27th 2008, from http://www.meraka.org.za/humanLanguage.htm

* Language Technology (2007). In Wikipedia, The Free Encyclopedia. Retrieved 16:50, March 17th 2008, from http://en.wikipedia.org/wiki/Human_language_technology

* Hans Uszkoreit (2007). Retrieved 17:10, March 17th 2008, from http://www.coli.uni-saarland.de/~hansu/

* Natural Language Processing (2007). Retrieved 17:20, March 17th 2008, from http://en.wikipedia.org/wiki/Natural_language_processing





Hans Uszkoreit (Biography) (Q.1) 2nd theme

30 03 2008

According to his curriculum vitae, Hans Uszkoreit is Professor of Computational Linguistics at Saarland University. He also serves as Scientific Director at the German Research Center for Artificial Intelligence (DFKI), where he heads the DFKI Language Technology Lab. By cooptation he is also Professor in the Computer Science Department.

Uszkoreit studied linguistics and computer science at the Technical University of Berlin and the University of Texas at Austin. During his time in Austin he also worked as a research associate on a large machine translation project at the Linguistics Research Center. In 1984 Uszkoreit received his Ph.D. in linguistics from the University of Texas. From 1982 until 1986, he worked as a computer scientist at the Artificial Intelligence Center of SRI International in Menlo Park, CA. During this time he was also affiliated with the Center for the Study of Language and Information at Stanford University.

Finally, his current interests are computer models of natural language understanding and production; advanced applications of language and knowledge technologies, such as semantic information systems; the cognitive foundations of language and knowledge; grammar formalisms and their implementation; the syntax and semantics of natural language; and the grammar of German.

Here are his recent publications:

* Uszkoreit, H. (2007). Methods and Applications for Relation Detection. In: Proceedings of the Third IEEE International Conference on Natural Language Processing and Knowledge Engineering, Beijing, 2007.

* Uszkoreit, H., F. Xu, W. Liu (2007). Challenges and Solutions of Multilingual and Translingual Information Service Systems. To appear in: Proceedings of HCI International 2007, 12th International Conference on Human-Computer Interaction, Beijing, 2007.

* Uszkoreit, H., F. Xu, W. Liu, J. Steffen, I. Aslan, J. Liu, C. Müller, B. Holtkamp, M. Wojciechowski (2007). A Successful Field Test of a Mobile and Multilingual Information Service System COMPASS2008. In: Proceedings of HCI International 2007, 12th International Conference on Human-Computer Interaction, Beijing, 2007.

* Xu, F., H. Uszkoreit, H. Li (2007). A Seed-driven Bottom-up Machine Learning Framework for Extracting Relations of Various Complexity. To appear in: Proceedings of ACL 07, Annual Meeting of the Association for Computational Linguistics, Prague, 2007.

* Frank, A., H.-U. Krieger, F. Xu, H. Uszkoreit, B. Crysmann, B. Jörg, U. Schäfer (2007). Question Answering from Structured Knowledge Resources. In: Journal of Applied Logic, Volume 5, Issue 1, March 2007, Pages 20-48.

References:

* Hans Uszkoreit publications. Retrieved March 27th 2008, from http://www.coli.uni-saarland.de/~hansu/hucv_eng.pdf

* Hans Uszkoreit Curriculum Vitae. Retrieved March 27th 2008, from http://www.coli.uni-saarland.de/~hansu/bio.html






E-book

9 02 2008
An e-book is a text stored in digital form that can be copied and read on a PC or on one of the recent portable eBook devices. These books are read with programs called readers.

There are basically two steps to building an eBook: preparing the content for conversion, and choosing a tool to convert it to Microsoft Reader format. All conversion tools require a clean HTML file marked up according to the standards defined by the Open eBook Foundation; for more details on these standards, visit the Open eBook website. You can create an eBook in a short time (2 or 3 hours) if you have prepared all the necessary parts. The items needed include JPEG images for the Home page and the Library, as well as any other elements the eBook may need. For more information on the elements of an eBook, see the Source Materials Guide and the Conversion Guide, available for download from the same site and included in the Content SDK.

To read eBooks on a PC or Pocket PC, we have to use a reader program. At present there are two readers available: Microsoft Reader and Glassbook Reader. These programs let us adjust the typography, turn pages, use bookmarks, insert notes, highlight text and perform many other functions needed for reading. The books are downloaded from the Internet or can be created by ourselves. Glassbook Reader reads PDF files, while Microsoft Reader reads .LIT (Reader) files that comply with the OEB rules; the latter take up less space than PDF files. Microsoft provides free tools for converting Word files into Reader format, and the company ReaderWorks offers tools for converting HTML into Reader format. These tools make it very easy to turn Word or HTML files into eBooks.






Web 2.0

9 02 2008

 Web 2.0 is the business revolution in the computer industry caused by the move to the Internet as platform, and an attempt to understand the rules for success on that new platform.

The concept of “Web 2.0” began with a conference brainstorming session between O’Reilly and MediaLive International. Dale Dougherty, web pioneer and O’Reilly VP, noted that far from having “crashed”, the web was more important than ever, with exciting new applications and sites popping up with surprising regularity. What’s more, the companies that had survived the collapse seemed to have some things in common. Could it be that the dot-com collapse marked some kind of turning point for the web, such that a call to action such as “Web 2.0” might make sense? We agreed that it did, and so the Web 2.0 Conference was born.

In the year and a half since, the term “Web 2.0” has clearly taken hold, with more than 9.5 million citations in Google. But there’s still a huge amount of disagreement about just what Web 2.0 means, with some people decrying it as a meaningless marketing buzzword, and others accepting it as the new conventional wisdom.

This article is an attempt to clarify just what we mean by Web 2.0.

In our initial brainstorming, we formulated our sense of Web 2.0 by example:

Web 1.0   Web 2.0
DoubleClick –> Google AdSense
Ofoto –> Flickr
Akamai –> BitTorrent
mp3.com –> Napster
Britannica Online –> Wikipedia
personal websites –> blogging
evite –> upcoming.org and EVDB
domain name speculation –> search engine optimization
page views –> cost per click
screen scraping –> web services
publishing –> participation
content management systems –> wikis
directories (taxonomy) –> tagging (“folksonomy”)
stickiness –> syndication

The Web As Platform:

[Figure: Web 2.0 meme map]






HTML

25 01 2008

Timothy “Tim” John Berners-Lee and his team developed what are known by their English acronyms as HTML (HyperText Markup Language), the HTTP protocol (HyperText Transfer Protocol), and the URL (Uniform Resource Locator), the system for locating objects on the web.

As Wikipedia explains, HTML is the English acronym for HyperText Markup Language, which translates into Spanish as Lenguaje de Etiquetas de Hipertexto. It is a markup language designed to structure texts and present them as hypertext, which is the standard format of web pages. Thanks to the Internet and to browsers such as Internet Explorer, Opera, Firefox, Netscape and Safari, HTML has become one of the most popular and easiest-to-learn formats for creating web documents.

HTML is not a programming language, although it does allow code written in programming languages to be embedded in it under certain conditions, extending its capabilities and functionality, even though doing so goes beyond the scope of HTML itself.

On the other hand, most HTML tags are semantic. The tags are interpreted by the web browser. HTML is extensible: additional features, tags and functions can be added to it for web page design, producing an attractive, fast and simple result.

Tags: a markup language is a set of words or characters (tags) that are placed alongside the text of a document to specify a property of it.

Types of tags: physical and semantic. Physical tags have a specific styling function, whereas semantic tags are equivalent to physical ones but are named in plain, descriptive language, such as the strong tag, the semantic counterpart of the physical b tag. Both achieve the same visual result.
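The difference between physical and semantic tags can be seen by reading a small HTML fragment the way a program does. The Python sketch below uses the standard-library `html.parser` module; the two-pair mapping `PHYSICAL_TO_SEMANTIC` (b to strong, i to em) and the `TagClassifier` class are illustrative assumptions, not a complete classification of HTML tags.

```python
from html.parser import HTMLParser

# Illustrative, non-exhaustive mapping (assumption): physical tag -> semantic counterpart.
PHYSICAL_TO_SEMANTIC = {"b": "strong", "i": "em"}
SEMANTIC_TAGS = set(PHYSICAL_TO_SEMANTIC.values())


class TagClassifier(HTMLParser):
    """Records every start tag and labels it physical, semantic or other."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names already converted to lower case.
        if tag in PHYSICAL_TO_SEMANTIC:
            kind = "physical"
        elif tag in SEMANTIC_TAGS:
            kind = "semantic"
        else:
            kind = "other"
        self.found.append((tag, kind))


snippet = "<p>This is <b>bold</b> and this is <strong>important</strong>.</p>"
parser = TagClassifier()
parser.feed(snippet)
print(parser.found)
# [('p', 'other'), ('b', 'physical'), ('strong', 'semantic')]
```

In common browsers both b and strong are rendered as bold text by default; the semantic tag additionally records that the text is important, which is information a program can use.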








