Z Digital Agency Manager interviewed Claudiu Musat, Research Director in Machine Learning & AI at Swisscom.
Claudiu is an engineer with a PhD in text mining and a post-doc at EPFL on recommender systems (a subclass of information filtering systems that seek to predict the “rating” or “preference” a user would give to an item).
After creating a text analysis startup in Switzerland and then working at an opinion mining startup in the US, he became the Research Director in Machine Learning and Artificial Intelligence at Swisscom, handling product ideation, development and solution pre-implementation with Swisscom Enterprise clients.
As part of Swisscom Enterprise, he mainly works on use cases from client enterprises to create customized solutions. Some AI is also created and tested in-house with Swisscom's own data sets.
If we take a step back, we can see three levels of AI:
– a narrow AI performing one task only, in an autonomous way
– a general intelligence architecture
– an artificial super-intelligence (Skynet-style)
We are now between types 1 and 2: we are mixing various models to create useful systems that are not yet general enough to handle any task given to them (i.e. not yet at human level).
Swisscom is currently working on creating useful artificial intelligence systems. These usually start from a user query (when a customer makes a call, for instance), reason over data and give back an answer to that query. The aim is to close the loop by providing useful feedback.
Using recurrent neural nets, Claudiu’s research team works on text and speech recognition.
In AI there is a “No free lunch” theorem: averaged over all possible problems, no method is better than any other. For specific tasks though, the differences make some methods significantly better than others. In the case of speech recognition we use two systems:
- An end-to-end system, in which a single model turns speech directly into a transcription
- A combination of an acoustic and a language model. The acoustic model converts speech into phonemes that are subsequently mapped into language by a language model.
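To make the second approach concrete, here is a minimal sketch of the language-model half, assuming the acoustic model has already produced groups of phonemes. The lexicon, the bigram scores and the `decode` function are all made up for illustration; a real system would learn both models from data.

```python
# Toy second stage of an acoustic + language model pipeline.
# Everything below (lexicon entries, bigram scores) is invented for illustration.

# Hypothetical pronunciation lexicon: phoneme group -> words that sound like it.
LEXICON = {
    "T UW": ["two", "too", "to"],
    "AY": ["I", "eye"],
    "W AA N T": ["want"],
    "G OW": ["go"],
}

# Hypothetical bigram log-scores standing in for a trained language model.
BIGRAM_SCORE = {
    ("<s>", "I"): -0.5,
    ("<s>", "eye"): -4.0,
    ("I", "want"): -0.7,
    ("eye", "want"): -5.0,
    ("want", "to"): -0.4,
    ("want", "two"): -2.5,
    ("want", "too"): -3.5,
    ("to", "go"): -0.6,
    ("two", "go"): -5.0,
    ("too", "go"): -4.5,
}
UNSEEN = -10.0  # penalty for word pairs the toy language model has never seen


def decode(phoneme_groups):
    """Return the word sequence with the highest total bigram score.

    Every combination of candidate words is scored, which is only
    feasible because the example is tiny.
    """
    hypotheses = [(0.0, ["<s>"])]
    for group in phoneme_groups:
        extended = []
        for score, words in hypotheses:
            for candidate in LEXICON[group]:
                step = BIGRAM_SCORE.get((words[-1], candidate), UNSEEN)
                extended.append((score + step, words + [candidate]))
        hypotheses = extended
    best_score, best_words = max(hypotheses, key=lambda h: h[0])
    return best_words[1:]  # drop the sentence-start marker
```

Calling `decode(["AY", "W AA N T", "T UW", "G OW"])` resolves the homophones "two", "too" and "to" by context, which is the job the language model does after the acoustic model has done its part.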
In the case of text recognition there is a plethora of competing methods, including for instance:
- A classification task with a feedforward neural network
- A sequence problem (for example for more complex conversations), usually tackled with recurrent neural networks (RNNs)
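The difference between the two can be sketched in a few lines. The example below assumes the feedforward network receives a bag-of-words input (a common choice, though not the only one), and uses random, untrained weights, because only the data flow matters here. The vocabulary, layer size and function names are illustrative assumptions.

```python
import math
import random

VOCAB = ["dog", "bites", "man"]
HIDDEN = 4

rng = random.Random(0)
# Randomly initialised, untrained weights: this sketch shows data flow, not accuracy.
W_in = [[rng.uniform(-1, 1) for _ in VOCAB] for _ in range(HIDDEN)]
W_rec = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]


def one_hot(word):
    return [1.0 if word == v else 0.0 for v in VOCAB]


def bag_of_words(sentence):
    """Word counts only: the order of the words is discarded."""
    counts = [0.0] * len(VOCAB)
    for word in sentence.split():
        counts[VOCAB.index(word)] += 1.0
    return counts


def feedforward_hidden(sentence):
    """One hidden layer applied to the whole sentence at once."""
    x = bag_of_words(sentence)
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W_in]


def rnn_hidden(sentence):
    """One word at a time; the hidden state carries what came before."""
    h = [0.0] * HIDDEN
    for word in sentence.split():
        x = one_hot(word)
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(W_in[i], x))
                + sum(w * hj for w, hj in zip(W_rec[i], h))
            )
            for i in range(HIDDEN)
        ]
    return h
```

With this setup, "dog bites man" and "man bites dog" give the feedforward layer exactly the same input, while the recurrent version ends in a different hidden state for each sentence. That sensitivity to order is why sequence problems tend to be handled with RNNs.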
On the one hand we have the classic needs: more data sets, more processing power and better algorithms. On the other hand the number of new use cases in the industry has increased dramatically over the past three years.
In Artificial Intelligence there are 4 phases we are trying to reach:
- 1st phase: a model with a rule or a sequence of rules
- 2nd phase: the operator is manually adding features, which are then processed
- 3rd phase: representation learning without manual input. The machine learns the features that will then be optimized.
- 4th phase: the machine learns how to learn, which will increase its power by orders of magnitude
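The first two phases can be contrasted with a small sketch. The task (spotting customer complaints), the training sentences and the feature list are all invented for illustration, and the perceptron is just one simple way to learn weights over hand-made features. In the third phase the `features` function itself would be learned rather than written by hand.

```python
# Phase 1: a hand-written rule.
def rule_based_is_complaint(text):
    return "not working" in text.lower()


# Phase 2: the operator designs the features, the machine learns the weights.
def features(text):
    lowered = text.lower()
    return [
        1.0,  # bias term
        float("not" in lowered.split()),
        float(any(w in lowered for w in ("broken", "slow", "down"))),
        float(any(w in lowered for w in ("thanks", "great"))),
    ]


# Tiny made-up training set: 1 = complaint, 0 = not a complaint.
TRAIN = [
    ("my internet is down", 1),
    ("the router is broken", 1),
    ("connection does not work", 1),
    ("thanks for the great service", 0),
    ("great support thanks", 0),
    ("everything works", 0),
]


def score(weights, text):
    return sum(w * xi for w, xi in zip(weights, features(text)))


def predict(weights, text):
    return 1 if score(weights, text) > 0 else 0


def train_perceptron(examples, epochs=20):
    """Classic perceptron updates over the hand-crafted features."""
    weights = [0.0] * len(features(""))
    for _ in range(epochs):
        for text, label in examples:
            error = label - predict(weights, text)
            x = features(text)
            weights = [w + error * xi for w, xi in zip(weights, x)]
    return weights
```

After `weights = train_perceptron(TRAIN)`, the phase 2 model flags "the wifi is slow" as a complaint, while the phase 1 rule misses it because the exact phrase "not working" never appears.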
Chatbots. Claudiu’s team is currently working on all 4 types of bots:
– FAQ bots: with pattern-based answers (classic questions to be recognized to provide classic answers from a Database)
– Information retrieval bots for changing environments: they understand what the user wants and retrieve the answer.
– Personalized bots: they map the user's question to that user's own needs, using personal data. This neural programming approach maps the query to a sequence of steps that retrieves the correct answer.
– Goal-oriented bots: they also perform actions, using intent recognition (for instance, automatically changing your password to the one you want and communicating it back to you).
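As a rough illustration of the first and last bot types, here is a minimal sketch. The FAQ entries, the word-overlap matching and the single regular expression used for intent recognition are simplifying assumptions, not a description of the systems Claudiu's team builds; a production bot would use trained models for both steps.

```python
import re

# Hypothetical FAQ database: known question -> stored answer.
FAQ = {
    "how do i reset my router": "Unplug the router, wait ten seconds, plug it back in.",
    "where can i see my bill": "Bills are listed in the customer portal.",
}


def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def faq_answer(question, min_overlap=0.5):
    """FAQ bot: return the stored answer whose question shares the most words."""
    asked = tokens(question)
    best_answer, best_score = None, 0.0
    for known, answer in FAQ.items():
        known_tokens = tokens(known)
        overlap = len(asked & known_tokens) / len(asked | known_tokens)
        if overlap > best_score:
            best_answer, best_score = answer, overlap
    return best_answer if best_score >= min_overlap else None


# Goal-oriented bot: recognise one intent and perform the matching action.
CHANGE_PASSWORD = re.compile(r"change my password to (\S+)", re.IGNORECASE)


def goal_bot(utterance, account):
    match = CHANGE_PASSWORD.search(utterance)
    if match:
        account["password"] = match.group(1)  # the action itself
        return "Your password is now " + match.group(1)
    return "Sorry, I did not understand that."
```

The FAQ bot only looks things up, so an unknown question such as "What is the weather?" returns `None`. The goal-oriented bot changes state: after `goal_bot("Please change my password to Tr1cky!", account)`, the `account` dictionary holds the new password.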
Maintaining balance. It is challenging to divide time between researching, prototyping and handling clients, while trying to keep up with the industry's latest papers.