MuSE: Next Generation Speech Translation

The MuSE speech technology platform is the outcome of a number of research projects involving Prof. Julie Berndsen and the UCD CSI MUSTER research group over the last five years. The speech recognition components of the system were developed as part of the MUSTER project (an SFI Principal Investigator Award), and the speech synthesis components were developed in the context of an IBM-funded IRCSET award. These components are now being further developed in the context of the Centre for Next Generation Localisation (CNGL) CSET. Speech Technology is a major component of the Integrated Language Technologies track within the CSET, which aims to exploit the synergies between Machine Translation and Speech Technology to provide flexible access to multilingual digital content in hands-busy, eyes-busy scenarios.

The Muster Speech Engine (MuSE) explicitly integrates linguistic information of varying granularity into the recognition and synthesis processes. On the recognition side, machine learning techniques are applied to extract phonetic features from the speech signal, and a constraint-based approach to phonological parsing then identifies syllable and word structures in the feature data. On the synthesis side, phonetic features are used to enhance the current state-of-the-art synthesis technique, so that the synthesiser models both how sound is heard and how it is created.
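To make the idea of working on feature data more concrete, the following is a minimal Python sketch of how overlapping feature intervals might be matched against phoneme specifications. It is not MuSE code: the feature names, the three-phoneme inventory, and the names `Interval`, `active_features` and `candidates` are all assumptions made for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """One stretch of a single feature tier."""
    feature: str
    start: float  # seconds
    end: float    # seconds


# Hypothetical feature specifications (assumption, for illustration only).
PHONEMES = {
    "m": {"voiced", "nasal", "labial"},
    "b": {"voiced", "stop", "labial"},
    "p": {"stop", "labial"},
}


def active_features(tiers, t):
    """Return the features whose intervals cover time t."""
    return {iv.feature for iv in tiers if iv.start <= t < iv.end}


def candidates(tiers, t):
    """Phonemes whose specification is compatible with the features at t.

    A phoneme stays a candidate as long as every detected feature is part
    of its specification, so a missed feature widens the candidate set
    instead of causing a failure.
    """
    detected = active_features(tiers, t)
    if not detected:
        return []
    return sorted(p for p, spec in PHONEMES.items() if detected <= spec)


tiers = [
    Interval("voiced", 0.00, 0.10),
    Interval("labial", 0.00, 0.10),
    Interval("nasal", 0.02, 0.10),
]
print(candidates(tiers, 0.01))  # nasality not yet detected
print(candidates(tiers, 0.05))  # all three features overlap
```

In this sketch an incomplete feature set narrows the candidates rather than rejecting the input outright, which is one way to read the robustness argument for feature-level information.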

Existing approaches use statistically dependent phone models which are sensitive to co-articulation, noise, speaker dialect, and language. By using multiple levels of information of different granularity (e.g. features, phonemes, syllables, words), such limitations may be overcome. The Time Map Model used in the Speech Technology project combines a temporal interpretation of multilinear representations of the features in a speech utterance with a finite-state model of syllable structure, yielding a novel constraint-based approach which draws on multiple sources of information.
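As an illustration of what a finite-state model of syllable structure can look like, here is a small Python sketch of a deterministic acceptor over consonant (C) and vowel (V) classes. It is a toy under stated assumptions, not the Time Map Model itself: the limits of two onset and two coda consonants, the five-letter vowel set, and the names `TRANSITIONS`, `to_classes` and `is_syllable` are hypothetical.

```python
# Toy phonotactic constraint (assumption): up to two onset consonants,
# exactly one vowel as nucleus, up to two coda consonants.
TRANSITIONS = {
    ("start", "C"): "onset1",
    ("start", "V"): "nucleus",
    ("onset1", "C"): "onset2",
    ("onset1", "V"): "nucleus",
    ("onset2", "V"): "nucleus",
    ("nucleus", "C"): "coda1",
    ("coda1", "C"): "coda2",
}
ACCEPTING = {"nucleus", "coda1", "coda2"}

VOWELS = set("aeiou")  # hypothetical, orthographic stand-in for vowel phonemes


def to_classes(segments):
    """Map each segment to its class: 'V' for vowels, 'C' otherwise."""
    return "".join("V" if s in VOWELS else "C" for s in segments)


def is_syllable(classes):
    """Run the acceptor; reject as soon as no transition is defined."""
    state = "start"
    for symbol in classes:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False
    return state in ACCEPTING


for word in ["plan", "at", "pln"]:
    print(word, to_classes(word), is_syllable(to_classes(word)))
```

Because the constraints live in a transition table, changing the permitted syllable shapes for another language would mean editing data rather than code, which is part of the appeal of a finite-state formulation.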

MuSE was originally designed as an open-source platform consisting of tools which can be deployed in various contexts. There is significant interest in specific components of the platform (e.g. in the finite-state toolkit among computational linguists, for computational phonology and corpus linguistics work). The MUSTER project had a specific industry collaboration with IBM Dublin, in which some of the machine learning techniques were applied to other types of information detection. The CNGL has significant industry and international academic collaboration; SpeechStorm (an Irish-based SME) is the industry partner on the speech side. In addition to the Irish academic partners within the project, we collaborate with the University of Stuttgart and the University of Bonn, in particular in the area of speech synthesis.