COMMERCIAL PROFILE - SCIENCE FOUNDATION IRELAND: The World Cup brings football fans from around the globe together, and a new technology from an Irish-based research centre will help them communicate
AFICIONADOS OF the long-running movie and TV franchise Star Trek will no doubt be familiar with the universal translator – the piece of fictitious technology which allows the crew of the starship to communicate with one another and a variety of exotic alien species without the need to bother learning any language but their own. While more a dramatic device than a genuine piece of futurology on the part of the scriptwriters, such technology is now coming close to reality thanks to work being carried out by the Science Foundation Ireland-funded Centre for Next Generation Localisation (CNGL).
CNGL is based in Dublin City University with academic partners UCD, UL and TCD, and with industry partners IBM, Microsoft, Symantec, Dai Nippon Printing (Japan), and SDL as well as key Irish SMEs, Alchemy, VistaTEC, SpeechStorm and Traslán. It boasts a unique concentration of research and development expertise in language technologies, machine translation, speech processing, digital content management and localisation.
Ireland already has a substantial global footprint in the localisation industry – the process of adapting digital content to different languages and cultures. It dates back to the late 1980s when Ireland’s print industry was successful in winning major contracts from technology leaders such as IBM and Microsoft for the printing and packaging of their extensive manuals. A natural evolution of this business was to engage in the translation and localisation for distinct markets of this printed material.
CNGL pools that well of expertise and is developing the next generation of language and content management technologies to support and develop the localisation industry.
The centre was established in 2007 as a Science Foundation Ireland (SFI) CSET – Centre for Science, Engineering and Technology – with funding for five years, but its origins go back some years before that, according to CNGL director Prof Josef van Genabith. “We are sort of a baby CSET at the moment,” he says. “We are just halfway through our initial funding period but it took us about four or five years to put the application together in the first place.”
The idea had its genesis in IBM’s Irish operation. “IBM had a lot of people who used to work for Lotus in the localisation space,” says Prof van Genabith. “They had the idea of setting up the centre as a CSET to research areas like machine translation.”
It then moved on to DCU and eventually grew to encompass four universities and nine industry partners. “It is a large partnership but we are lucky to be based in Ireland where we all knew each other. It would have been a lot more difficult to do this in Germany or another large country where the companies involved would not be so open to collaboration. We also got a lot of support from the IDA at the time. They put us in touch with Dai Nippon Printing for example.”
The centre is currently tackling three critical problems for the localisation industry: those of volume, access and personalisation.
The amount of content that needs to be localised into ever more languages is growing steadily and massively outstrips current translation and localisation capacities. As a consequence, only a fraction of the content that needs to be localised is localised and usually only into a limited set of languages. “Ten years ago companies only translated documents into a limited number of major languages, now everyone quite correctly expects material to be translated into their own language and this is creating increased demand for translation and localisation services.”
Access is a more interesting challenge. Traditionally, localisation assumes print or full screen and keyboard-based access to content. More recently, however, new and evolving generations of small devices such as smart phones and iPads support on-the-move and instant access to digital content. Novel means of interaction such as speech-enabled access are not supported by current localisation technologies and CNGL is addressing this.
And then comes personalisation. Traditional localisation is what is termed coarse-grained. In other words, it doesn’t go into fine detail. It will localise for a country or a region but not necessarily for groups of individuals like teenagers, women, or workers in certain sectors. CNGL is looking at ways of overlaying traditional localisation with more fine-grained personal information cutting across standard notions of locale and linguistic environment. “The individual is the ultimate locale,” van Genabith points out.
But it is the coming together of all three areas that creates the most exciting possibilities for the centre. An example of this would be a person speaking to their smartphone or iPad to ask for certain information from the web; the device browsing to a site with content individualised to their needs and preferences; and that content being updated and translated into their language on the fly as it appears.
“We try to conceptualise this in terms of a cube with the three areas of volume, access and personalisation forming the X, Y and Z axes,” van Genabith explains. “The challenge is to develop next-generation localisation technologies and processes that allow us to address any point in the space defined by the localisation cube, realising the CNGL vision to enable people to interact with content, products and services in their own language, according to their own culture, and according to their own personal needs.”
The system framework for this cube is all-important. “This is the backbone on which we support the various demonstration activities of the centre,” explains Dr David Lewis. “We have more than 100 researchers in different locations working on different areas and we are trying to support them and integrate their work very quickly. The market is moving very quickly. We have to be able to quickly bring teams of people together from different areas to work on specific projects.”
One current and topical example of such a project in action is Twanslator – a service developed by CNGL researchers for football fans who want to follow tweets (Twitter messages) on the World Cup in their own language.
Natural boundaries exist between Twitter followers, usually because they are only connected to people who communicate in a language they understand. Twanslator WC 2010 is an attempt to filter the information streams on Twitter during the World Cup into a number of different languages and create match summaries. CNGL is collaborating with the SFI Clarity centre on aspects of this work.
A system has been developed that instantly translates and streams tweets in six European languages – Dutch, Portuguese, Spanish, Italian, German and French – into English. Tweets in English are also translated back into each of the six European languages. The service is available through myisle.org/twanslate, which will offer a graphical analysis of tweets about each match.
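The translation directions described above can be pictured with a short Python sketch. It is an illustration only and not the Twanslator code: the language codes, the function names and the caller-supplied translate function are all hypothetical, and it assumes that the six languages are translated only to and from English rather than directly between one another.

```python
# Illustrative sketch only. The names and language codes below are
# hypothetical and are not taken from the Twanslator system.

# Dutch, Portuguese, Spanish, Italian, German and French
EUROPEAN_LANGUAGES = {"nl", "pt", "es", "it", "de", "fr"}
ENGLISH = "en"


def target_languages(tweet_lang):
    """Return the languages a tweet written in tweet_lang is translated into.

    Assumption: tweets in the six European languages go into English only,
    and English tweets go into each of the six.
    """
    if tweet_lang == ENGLISH:
        return sorted(EUROPEAN_LANGUAGES)
    if tweet_lang in EUROPEAN_LANGUAGES:
        return [ENGLISH]
    return []  # languages outside the service are left untranslated


def route(tweet_text, tweet_lang, translate):
    """Translate one tweet into every target language.

    translate is a caller-supplied function (text, source, target) -> text,
    standing in for whatever machine translation engine is used.
    """
    return {
        target: translate(tweet_text, tweet_lang, target)
        for target in target_languages(tweet_lang)
    }
```

Under these assumptions, a German tweet yields a single English version, while an English tweet yields six translations, one for each European language.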
“There are more than 100 million Twitter users globally and it has become a channel for people to have conversations about things happening in the real world,” says Dr Declan Dagger, who is engaged in digital content management research at CNGL. “For example, the US government recently used Twitter to help co-ordinate its relief efforts in Haiti following the earthquake there. The World Cup was an opportunity for us to push the envelope in terms of the technologies we are developing and Fifa has picked up on this.”
For Ireland, having such research carried out here is of critical importance for the future. While much of the actual translation work of the Irish-based localisation industry is carried out overseas, the higher-value technological and project management elements are retained in Ireland.