The title screen appears. A blue background is shown with the University of Derby three hills logo in white in the middle of the screen. The University of Derby's name in white is directly below. White text on the screen reads:
Professorial Inaugural Lecture Series:
Natural Language Processing: A Fulfilled Promise? By Professor Farid Meziane
The title screen fades out and the recording of the lecture starts.
[Paul] Good evening, ladies and gentlemen and welcome to the latest in the university's inaugural lecture series. Tonight, we have Professor Farid Meziane talking to us about "Natural Language Processing: A Fulfilled Promise?" Well, I’d like to now introduce you to Professor Sunil Vadera from the University of Salford, who is a long-standing colleague of Farid and I will say no more.
[Sunil] Thank you to the chair of the Professorial Council; ladies and gentlemen, distinguished guests, friends, and family.
Welcome to this inaugural lecture from Professor Farid Meziane. As the chair of the Professorial Council said, I’ve known Farid for over 30 years, and I’ve had the pleasure of witnessing many of his contributions to the field and to the development of others. In fact, he was my first PhD student, over 30 years ago. You can see from my age that I’m still young, mainly because of Farid.
I first met Farid in 1988 when he wanted to pursue a PhD because he had found the MSc very easy, and I suggested two ideas: one involved the refinement of a machine learning algorithm, and the second, which I thought was near impossible, was to produce formal specifications from the English language, with all its ambiguities and challenges. That still remains a challenge, and you can guess that Farid chose the more difficult option despite my efforts to convince him otherwise. This typifies Farid's career: he takes on challenges, and he then succeeds at them. In the case of producing formal specifications from natural language, he was able to develop a principled way of doing that translation. Translating natural languages is quite a task, even today. Not only was he able to do that, but where people would argue that it's just theoretical work, Farid took it further: he applied it to a real case study and was able to get results from that real case study.
This still remains a challenge, so that shows the quality of work that Farid produces, not only then but throughout his career. In fact, the examiners were so impressed that they insisted on zero corrections to his thesis. It's still one of the few PhDs I’ve supervised that got zero corrections. This also highlights his ability to supervise others, because his own standards are so high and he can always draw on his own experiences as a PhD student. In fact, it just reminded me that at the time I was supervising him, I was also doing a PhD myself, and he was trying to race ahead of me, to get his PhD before his supervisor got his.
After his PhD, Farid was recruited by the University of Malaysia, Sarawak as a lecturer in software engineering, where he established a new program in software engineering, encouraged research and became the director of postgraduate studies. Farid returned to Salford in 1998 after much persuasion from me to apply for the role. His commitment to quality research and to the development of new programs, such as an undergraduate master's program in software engineering, led to growth in student numbers at Salford, increased collaborations, and grew our PhD student numbers and research grant income. He was promoted to a senior lectureship in 2004, a readership in 2007 and a chair in 2012. He served with distinction in many leadership roles, including as Associate Dean International and Head of the Data Mining and Pattern Recognition Research Centre, and was appointed the lead for the Salford REF submission (I understand the results have just come out, so we'll look at those and then get in contact with Farid to let him know how well he did).
His move to the University of Derby opens a new opportunity where I’m sure his ability to build teams and bring colleagues together will lead to great things for you. Internationally, Farid is best known for transforming the Natural Language and Databases (NLDB) series of conferences. This series, I recall, was struggling in 2004. It hardly had any papers prior to 2004 and was close to shutting down until, of course, Farid came on the scene. He brought it to Salford, established an active program committee and used his international network of contacts to dramatically increase the submission numbers, which increased the paper numbers and the quality of the conference. Of course, he brought it to Salford, but I’m not suggesting it was Salford that gave it the extra papers; it was very much Farid. The program committee was so impressed with what he'd achieved that they constantly invited him to organize further conferences, and he organized a further three conferences at Salford. I’m pretty sure there will be a conference at Derby that will follow, as a result of Farid being one of the leads in this conference.
There's only one thing that Farid is more committed to than his professional work, and that's his family, and I’m pleased to see Mesiba and the rest of the family here. Sibo, Yasin, Amir and Adam are his first love, and he's always enthusiastic about them and about the work they do. So, it's really great that they're here today. I know that, like me, you're looking forward to his presentation, so without further delay, please welcome Farid to present this inaugural lecture.
[Farid] Thank you, Sunil, for this introduction. Ladies and gentlemen, colleagues, friends and family, thank you very much for joining me tonight for my Inaugural Lecture at the University of Derby. I have worked on quite a few research topics, but the one topic I have worked on since my early days in research is that of natural language, and therefore that is what I’m going to talk about.
Computers are amazing machines, as we all know, and we use them every day; however, they have not always been friendly. In the early years, you couldn't take them home to finish your work or do your assignments, and this is what they looked like in those days. Communicating with computers in those years was also a challenge. If you wanted to give them a simple instruction such as z = x + y, then this is what you needed to write: those successions of ones and zeros. This is the only language that computers could understand in those days, and it is known as machine language. Some improvements were made over the years, and then we managed to develop another set of languages called assembly languages. The program is only what you see here; the rest is just a description of what the program is doing. This is far from our language, but at least there are some terms there that we can understand. We've got "proc", which stands for procedure; we have "cmp", which stands for comparison; and towards the end of the program, we have a "return" and "end program".
However, a few years later we managed to develop a different set of languages, known as high-level languages. Finally, we could define variables the way we want, and programs in those languages looked more like English, something we are more comfortable with and that we can understand. It helped software engineers a lot in maintaining software, because they could understand what previous programmers had done. Of course, this is not natural language processing; it is just to give you an idea of the challenges that computer scientists have faced from the beginning, as they tried to communicate with computers in a language that they are more comfortable and familiar with.
Before giving you a brief history of natural language processing, it's worth mentioning what is known these days as the Turing test or "The Imitation Game". Alan Turing, probably one of the brightest minds this country has produced, proposed in 1950 that a human being would evaluate a conversation in natural language with two parties: one set of responses produced by a machine and the other produced by a human. If the evaluator cannot distinguish between the two, then you can say that your application, your software, or your system has passed the Turing test. There was some work done prior to 1950, during the war, but these were only ideas and there was no documented research in terms of NLP in those days.
The first phase of NLP was between the late 40s and the late 60s. Researchers pursued the idea of using computers for translating from one language to another, which is known as machine translation. It was very basic: they used two different dictionaries and the translation was word for word, so everything was based on words and on syntax. We also need to understand the environment in which those systems were developed. It was the first time that computers tried to process data that was not numeric; it was text, and a challenge for computers. And there was hardly any computing power in those days either. Any mobile phone you've got in your pocket today is probably ten times or more powerful than the best computers of that period. There was no storage like we have today, and when you are dealing with languages you need large storage for vocabulary, grammar and so on. The most challenging part is that programming in those days was done in assembly language, so that was another difficulty.
So, there were about 20 years of research, and then, around 1966, the ALPAC report came out. ALPAC is the Automatic Language Processing Advisory Committee, the body that sponsored research in machine translation in the States, providing a lot of money. By 1966 they had reached the conclusion that computers cannot translate from one language to another, and therefore they stopped the funding. Thank God there was a little bit of funding left, for computational linguistics. Then came phase two, from the late 60s to the late 70s, which is also the period when artificial intelligence started developing as a subfield of computer science. So, it's no surprise that most of the work done at the time was based on the AI techniques available in those days; most of it was about developing question-answering systems. For those who are in the field, you probably know that Eliza was the first system to be developed, but there were also other systems such as Baseball and Lunar. Baseball tried to answer questions about events related to a baseball season within the United States, and Lunar tried to answer questions about the geological properties of the rocks that were brought back from the moon by the different Apollo missions. These systems are basic if we look at them today. They used what we would call pattern matching: looking at structures within the questions and trying to find a similar structure within the answers.
So, facts were taken from experts and put in databases, and if there was a match then the system provided an answer. In 1971, at a conference, Lunar managed to answer 90 questions from an audience that was not aware of the work or of how the internals of the Lunar system worked. For the first time, people also started talking about what was known in those days as "world knowledge", the context in which those systems were operating; today we talk about semantics, and this is how it all started. Then, on the linguistics side, different types of grammar started developing. These are known as transformational grammar or transformational generative grammar: grammars where you can say that a sentence is composed of two parts, for example a noun phrase and a verb phrase; a verb phrase is composed of a verb and a noun phrase, etc. Chomsky was well known for this, and he published a famous book on it in 1965. Then we have semantic frames: for the first time, we were trying to create some kind of structure with some cohesion between concepts within a particular field. Then we went into the period from the late 70s to the late 80s, known as the grammatico-logical phase. For the first time, people started using logic to represent knowledge inside those databases, and therefore we were able to make inferences and provide more precise answers to the questions that were asked in those days. The emphasis during that period (and it was what we were doing at that time) was on declarative communication as the fundamental process, and around that time we also had the first ideas of exploring more applications for natural language processing; in my particular case, it was about conceptual modelling. So, I’m not that old: I would only be introduced to natural language processing towards the end of the 1980s.
It was precisely 1989, as Sunil has just mentioned, and he also mentioned the kind of work that I undertook with him. The only problem at the time was trying to explain to colleagues and computer scientists what you were trying to do, but what was more difficult was explaining it to friends and colleagues doing engineering on the first floor of the Newton building, where most of the work was computational fluid dynamics: basically taking equations, putting them into a machine, running them, producing some data and graphs, and that's the end of it. Try to tell them that software is more than this! So, the expectations were very high for what computer science PhD students could do.
So, I couldn't explain it to them straight away; I was learning, I didn't know about formal methods, I didn't know about natural language processing, and I had never programmed in Prolog. And then, one morning, a miracle happened. I ran my program and something happened, so we called friends and told them: look, there is something I want to show you today. So, I set it up on F34 and then called the Prolog interpreter, and this is what you got in those days: that question mark, meaning "Give us a query. What do you want?" So, this is what I provided, a sentence: "A man loves a woman". That sentence was used a lot in those days to explain how logical form language works. So, "a man loves a woman", and then I pressed enter, and this is what you get. They looked at me and said, "Is this five months of work?" and I said: yeah, this is what it is, and more things are going to happen, but for the time being this is all I’m getting. So, the joke for the next five or six months was whether the man was still loving that woman or not.
I tried to explain things to them, but it was too late. So, very quickly, this is what happens. You have a sentence that's part of a case study, a stock management system; we use a grammar to create a syntax tree for that sentence; from there we produce the logical form language; and from that logical form language it was easy to move to predicate logic, which looks more mathematical than the logical form language. From there we were able to produce specifications for operations. Some of them were very simple, such as just adding an item to the stock, but some were more complex, such as reordering items from the stock. For that, you need to understand the relationship between the attributes of that data type and then decide when you want to reorder: when the quantity in stock is less than or equal to the minimum reorder level. So, there were a lot of processes taking place there. Sunil mentioned case studies; we wanted something real, something developed independently from us, and we got a case study from British Aerospace about the problem of planning a flight route from point A to point B. The route is planned as a set (or, more precisely, a sequence) of waypoints, and each waypoint is identified by an identifier, which is a number, and a grid reference. Again, after a few months of work, we did manage to produce some very good specifications for most of the operations that were needed. That was my first published paper. I don't remember having any issues getting it published, but what I can remember is that in those days we didn't have email, so it was only our names and the department address that we had on our papers.
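The pipeline just described, from a sentence to a predicate-logic reading, can be sketched roughly as follows. This is a toy Python illustration, not the original Prolog system; the lexicon, the single-pattern grammar and the output notation are all invented for the example.

```python
# Toy sketch of the pipeline: sentence -> parse -> predicate-logic reading.
# Lexicon and grammar are illustrative assumptions, not the original system.

LEXICON = {"a": "DET", "man": "N", "woman": "N", "loves": "V"}

def parse(sentence):
    """Parse a 'DET N V DET N' sentence into (subject, verb, object)."""
    words = sentence.lower().split()
    pattern = [LEXICON[w] for w in words]
    if pattern != ["DET", "N", "V", "DET", "N"]:
        raise ValueError("sentence not covered by the toy grammar")
    return words[1], words[2], words[4]

def logical_form(subject, verb, obj):
    """Build the reading: exists X . subject(X) & exists Y . obj(Y) & verb(X, Y)."""
    return f"exists X (({subject}(X)) & exists Y (({obj}(Y)) & {verb}(X, Y)))"

s, v, o = parse("A man loves a woman")
print(logical_form(s, v, o))
# -> exists X ((man(X)) & exists Y ((woman(Y)) & loves(X, Y)))
```

The real system went one step further, from the predicate-logic form to VDM specifications; this sketch stops at the logic.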
I then got my PhD and moved to the University of Malaysia, Sarawak. It was a very good experience for us, particularly as a family, and my second son (who is sitting here) was born there. We learned a lot of things as a family. For example, never try to have your breakfast outside, as you will always have unwanted visitors; one particular morning, we had all our cereal taken away, and the milk was taken away too. We also learned that you should never swim with jellyfish; it's quite dangerous.
Anyway, I continued to work with Sunil, and we came out with a second paper, published in the Annals of Software Engineering, and it was quite a strong paper. We reviewed all the systems of the day that tried to produce formal specifications, and we suggested a new architecture for those systems. We also proposed some directions for future research. The most important new aspect here is this concept of the internal representation that you've got here, together with viewing modes: we want different views of the software we develop, so that from natural language we can produce formal specifications or other models such as object models.
I moved back to Salford in 1998, and there are some dates there that are quite important for me. In 1995 there was the first NLDB conference that Sunil mentioned: a group of researchers from Versailles in France and another group from Amsterdam in Holland had started to look into whether it was possible to query databases using natural language. They came together and created the first workshop in 1995, and in 2001 I managed to get to the conference and attended my first NLDB. I felt really at home; everything that the speakers were talking about was something that we had either done or tried to do. After the first break, the two co-chairs of the conference came to me and said, "Well, you seem to know a lot about what we are doing. When are you going to present your paper, and what are you going to talk about?" I said: no, I’m not going to present a paper, actually; I found out about the conference late and I’m just attending. So they said, "Oh, what research were you doing?" And I said: trying to produce formal specifications in VDM from natural language. They said, "Oh! Are you the VDM person?" I said yeah, and they said, "For many years we've been writing to you at your address, trying to invite you to the conference, but you never replied." I said: I did not reply because I was away in Malaysia.
Sunil has mentioned that three years later I organized the conference in Salford, and if everything goes to plan, next year we are going to organize it here at Derby. There was quite a lot of interest from many researchers in automating software engineering or using natural language for system modelling, so we went back to that vision we had tried to implement, which was really my first research task at Salford, and we had three PhD students working on it, around that internal representation, which in those days was XML. The first PhD student tried to move from natural language to object models through the internal representation, but left after a year and a half due to some financial issues. Another one was working on moving from object models to formal specifications, again through the internal representation, but then we had an issue: we found that a private company in Denmark had taken the problem, solved it, produced a book, produced software and so on. So, we struggled to find novelty in that work and to get the PhD student to graduate. The best work was the natural language generation work: generating natural language specifications from UML class diagrams, and it is, up to now, the most cited of my papers. This is the architecture that we proposed for that system.
There is nothing new in the mid-layer here; this is how systems that generate natural language were designed in those days. The most important part is this database that we've got here: WordNet. This, then, was the transition between what I would call my first research phase and the next one, the beginning of phase four, from the late 80s to the 2000s.
So, this is the era of large lexicons: we started producing very large vocabularies. There was the web, the availability of large quantities of machine-readable text and data, and more powerful hardware and machines. Other disciplines emerged and developed, such as knowledge engineering, information extraction and information retrieval; they were given the tools to expand and become better. Then we have ontologies, of course. At the same time, we had the advent of statistical natural language processing, the use of statistics for natural language processing. Ontologies are just a way of putting vocabularies together: you create relationships between them, and then you have a model of the data for a particular field, medicine for example, or conceptual modelling, or you can do it for a full language, and this is what WordNet has tried to do.
When you have ontologies, you allow knowledge sharing between scientists, you encourage knowledge reuse between different systems and components, and you can communicate in a clear way between systems and users. This was supported by languages such as the Web Ontology Language, or OWL, and by tools such as Protégé. So, what we've got here is just part of a medical ontology, where you have concepts and how they are related. Here you've got part of WordNet, just for the verbs, so "make", "con" and also... I don't know that one, okay. It shows all the relationships between those verbs, those words, and how they are linked together. So, that was the transition from what I used to do to the new era, and I was quite lucky that we moved quickly into this area, because it was also a period where, with colleagues from the Department of the Built Environment, we managed to secure about three EU-funded projects. In those projects we were mainly working on documents that are generated (or were generated) in the construction domain. They're quite complex and quite messy: different terminologies were used, and users were not able to extract the right documents or to manage the different versions of the documents. So, we came up with the idea of producing some new methods using the technologies available in those days, and this is what we produced.
So, the system was based on some solid theoretical foundations, but it was also deployed in a real business setting, meaning that companies actually used it. What we did there was mainly to create an ontology for the construction domain, with concepts defined and used, and then have some kind of automated indexing of those documents, meaning that you are going to have, across all different users, the same words for indexing those documents. This makes searching and categorizing those documents more efficient, and it also allows you to follow any updates to those documents and reindex them if needed. The work was quite interesting; it was published in the Information Sciences journal. Then we moved on to another application, this time for those from the computer science department.
You have probably heard about this: last year I presented the work in the Data Science Research Centre seminars. Radiologists have a big issue when it comes to reporting the output of an examination. Reports are written in natural language, and language that is incomplete; sometimes they use abbreviations that no one else can understand. We were trying to design something that would be standard, because if we've got something standard, we can then use machine learning or natural language techniques to analyse the data and apply some kind of artificial intelligence to those reports. In that work, we used one of the strongest linguistic theories in discourse analysis, Rhetorical Structure Theory, together with ontologies, and we produced a system that worked very well. The system not only allows the use of the standard document we produced, with all the data laid out the way we wanted for further analysis, but also allows the transformation of reports written in natural language, new ones and old ones, into the standard form. We realized that some radiologists don't want to use the new form; they want to keep writing in free text. So, we allowed them to do so, but our program transformed their text into the standard form. For me, it's more like the end of an era. I remember my kids coming home one day and saying: "This is what happens in chemistry. We were taught some kind of chemistry one year, and the second year you go there and they say: forget what you have learned, that was fake chemistry; now we are going to start learning real chemistry."
So, what happened in the last few years is similar to that. It's as if someone is telling you to forget everything you have learned: this is not how we do things; things are going to be different. Remember that back in 1999 we had the beginning of statistical NLP, and then there was the use of machine learning in many other fields, with success as well. Then we had deep learning, more data available, and more computing power, so it was the era where deep learning and machine learning came in. Now I’m going to go quickly and try to explain some of the building blocks that are needed for this new era of NLP.
First of all, you need a language model. A language model is just a probabilistic model. In its simplest form, you have a word (or two or three words) that you have already encountered or read, and you try to predict the next word. This is what happens when you write an email or a text: sometimes you see a suggestion coming up, and if you want to accept it, you just press enter and you get it. So, if we assume that our corpus is only these five sentences that you've got here, the question is: what is the probability of seeing "you", knowing that you have just encountered the word "thank"? Based on this corpus, there is only one sentence that has that, the first one. We have "thank" followed by "you", so the probability is the number of times "thank you" occurs divided by the number of times "thank" occurs. Therefore, in this particular context, within this corpus, the probability is one, and we can guarantee that whenever you have "thank", it is going to be followed by "you".
Of course, we have more complex representations: you can consider two words, three words or four words at a time. For "Diego" after "San" here, we've got two sentences where "San Diego" is mentioned, the third one and the fourth one, but we have three sentences where "San" is used; the fifth one has "San Francisco". So, the probability of having "Diego" after "San" is going to be two-thirds, or 0.67, okay? And then, we have neural networks. I’m not going to give you a lecture on neural networks, but they are very powerful models that try to mimic the way our brain works. We have inputs (the input layer) and many layers inside that we call hidden layers; they do a lot of processing and a lot of calculations, and then they produce some results. The way they work is that they go forward with some weights and get the outputs; if the outputs are not good, they go backwards and update all those weights until they get an output that is acceptably good or precise. And then, we also need a language representation. Just as we humans need a proper representation of a language (an alphabet, the way words are formed, the grammar and so on), machines also need a language representation, so we need to provide some kind of efficient language model.
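The bigram counting just described can be sketched in a few lines. The corpus below is a hypothetical one, chosen so that "thank" is always followed by "you" and two of the three occurrences of "san" are "san diego", matching the probabilities in the talk.

```python
# Minimal bigram language model: estimate P(next | previous) by counting.
from collections import Counter

corpus = [
    "thank you for coming",
    "we flew to san diego",
    "san diego was sunny",
    "we drove to san francisco",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))  # consecutive word pairs

def p_next(prev, nxt):
    """P(nxt | prev) = count(prev nxt) / count(prev)."""
    return bigrams[(prev, nxt)] / unigrams[prev]

print(p_next("thank", "you"))            # 1.0: "thank" always followed by "you"
print(round(p_next("san", "diego"), 2))  # 0.67: two of three "san" are "san diego"
```

A real model would consider longer histories (trigrams and beyond) and smooth the counts so that unseen pairs do not get probability zero.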
For many years, we've been using vectors to represent words or documents. When you have a document and you want to index it, you select a set of keywords, and those keywords are put into a vector; when you compare two documents, you try to find the similarity between the words in those two vectors and say whether the two documents are similar or not. The easiest way to do that in the old days was to use what we call the cosine similarity measure: when you plot the vectors representing item one and item two, there is an angle in between, that angle is called theta, and the smaller the theta, the closer the two are in meaning. The cosine is larger for small angles and smaller for larger ones. So, if you have many words, you are going to compare each pair of vectors to find out which document is more likely to be similar to the one you have seen, or, if you are working at the word level, which two words are close in meaning. Then there is one other representation, known as the "one-hot" representation. This is very simple. If we have the sentence "Welcome to my inaugural lecture", then "welcome" is the first word, so you put a one in the first position followed by zeros; "to" is the second word, so you have a one in the second position; and you go all the way until the last word, "lecture", where you put a one at the end. So, what we've got here is a very simple representation.
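The one-hot representation and the cosine similarity measure just described can be sketched as follows; the five-word vocabulary is taken from the example sentence above.

```python
# One-hot word vectors and cosine similarity, sketched in plain Python.
import math

vocab = ["welcome", "to", "my", "inaugural", "lecture"]

def one_hot(word):
    """Vector with a 1 in the word's vocabulary position and 0 elsewhere."""
    return [1 if w == word else 0 for w in vocab]

def cosine(u, v):
    """cos(theta) between two vectors; larger means a smaller angle."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(one_hot("welcome"))                         # [1, 0, 0, 0, 0]
print(cosine(one_hot("my"), one_hot("my")))       # 1.0: identical, theta = 0
print(cosine(one_hot("my"), one_hot("lecture")))  # 0.0: no overlap at all
```

Note that any two distinct one-hot vectors have a cosine of zero, so this representation says nothing about meaning; that is exactly the limitation the word embeddings discussed next address.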
So, if you've got, let's say... 1,000 words, then your vectors are going to be of size 1,000. In the early days, this was not really very popular. We all know that when you work with matrices, having plenty of zeros and only a few ones is not an efficient way to store data, and it is particularly inefficient when it comes to processing them. What was the game-changer in the field was the introduction of what is known as word embeddings. Basically, someone came and said: "What if we don't hand-craft all those relationships? We are not going to compute the cosine similarities or work out when two words have similar meanings. What if we just throw a large vocabulary at a neural network or a deep learning model and let it find the similarities itself?" That was really a big change. This is really a class of techniques rather than one individual technique, and the most popular and probably strongest one is word2vec. It was designed and patented by Google; it has a vocabulary of over 3 million words and 300 dimensions, and this is how it works.
So, you throw three million words to another network, and remember, it's going to look at all the documents that are available to it, so every piece of news that Google had was looked at. So, it's like you or another person reading thousands and thousands of books, thousands and thousands of pieces of news and then you remember everything, saying that "every time this word is mentioned. Somehow, somewhere the other word is mentioned so this means that this must be some kind of relationship between them." So, if you imagine the amount of data that we have available these days and the kind of training that has taken place then we have managed to get into this stage. So, if we have, let's say, four words. There and four words here and this vector is telling us for example that a "cat" is 0.6 and is similar to something that we can identify as a living being. It's got 0.9 as being feline, 0.1 being a human and so on. We did the same thing with "kitten", "dogs" and "houses", so this model that is produced is fixed and we can use it. So, suddenly we have millions of words, not only do we know what those words are, but we also know how they are related to the rest of the embeddings and to the words in the categories that we have defined. We cannot plot this in 7D dimension as we've got here so if you reduce them to 2D then this is what you're going to get. You notice that "cats" and "kittens" are very close here and the house is out there, and the dog is here. You can draw your vectors here and look at the data but who cares about data anymore?
These days, you've got the seven-dimensional representation, and we have all that information.
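The closeness of "cat" and "kitten" in the plot above is usually measured as cosine similarity between the embedding vectors. A minimal sketch, using made-up four-dimensional vectors along categories like those in the example (living being, feline, human, building):

```python
import math

# Toy embedding vectors (invented values), one entry per category.
cat    = [0.6, 0.9, 0.1, 0.0]
kitten = [0.5, 0.8, 0.1, 0.0]
house  = [0.0, 0.0, 0.0, 0.9]

def cosine(a, b):
    # cosine similarity: dot product divided by the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# "cat" and "kitten" point in nearly the same direction...
print(round(cosine(cat, kitten), 3))
# ...while "cat" and "house" share no direction at all.
print(round(cosine(cat, house), 3))  # 0.0
```

Two words that co-occur in similar contexts end up with vectors pointing in similar directions, which is exactly what the 2D plot shows.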
If you are looking for relationships between concepts, then this is what you're going to get: you're going to find out that a "man" and a "woman" are somehow related, and that a "king" and a "queen" are also related. Remember, these are vectors, and if you have vectors, then you can define operations on them, and if you can define operations, then you can end up with things like this... You can say that "king" - "man" + "woman" = "queen". Of course, what we've got there between brackets is a vector representing that word, so it may not make sense the first time you read it, but what it says, basically, is that if you have a "king" and you take the "man" out of the "king" and you replace it with a "woman", then you are going to get a "queen". The same thing if you take "France" away from "Paris" and you replace it with "Germany": you're going to get the capital of Germany instead of the capital of France. So, it's doing some weird things.
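The "king" - "man" + "woman" arithmetic above can be reproduced on toy vectors. These 2D embeddings are invented for illustration (one axis loosely "royalty", one loosely "gender"); real word2vec vectors behave similarly in 300 dimensions:

```python
# Toy 2-D embeddings (made-up values).
emb = {
    "king":  [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man":   [0.1, 0.9],
    "woman": [0.1, 0.1],
}

def add(a, b): return [x + y for x, y in zip(a, b)]
def sub(a, b): return [x - y for x, y in zip(a, b)]

# king - man + woman
result = add(sub(emb["king"], emb["man"]), emb["woman"])

# Find the nearest vocabulary word to the result.
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

nearest = min(emb, key=lambda w: dist(emb[w], result))
print(nearest)  # queen
```

Subtracting "man" removes the gender component while keeping the royalty component, and adding "woman" puts the other gender back, landing on "queen".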
If you think this is weird, it probably is, but it mirrors how our brains operate as well. I'm sure that many of you have received something like this: a message written with a combination of letters, numbers, special characters and so on. You might struggle to read the first line, but once you get through the first line, you can read everything past it. Those chain messages will probably tell you that you are "among the 10 most clever people in the world!" and so on, and it's not true; anyone can read this. So basically, our brains don't need the full picture to understand a text. You probably just need to look at this... we've got only an M, so we can guess that this means "message", and if you go down there it says "amazing things" and "impressive things". We can read this because our brain can take fractions of information and make sense of the text. It's because we've been reading for years, we've been looking at or listening to the news for years, and we have built pictures inside our minds that allow us to have this kind of understanding.
So, on the surface, that's really very easy to understand, or at least to comprehend. These are the first layers of your deep learning models, and you can have many layers there. This is how it works: if you have one vector like the one here, it represents a particular word, and what you've got here is the embedding weight matrix. This is normally the layer that comes straight after the input layer, and if you put all those vectors together, you are going to have a matrix. Then, if you perform a multiplication between these two, you are going to end up with this, and this is just the third row that you have here; all the others are going to be zeros because of the zeros that you've got here. So, suddenly, within the first two layers, you have an output that is a vector with relationships to other words and meanings. You are concentrating on the word that you want to process, and at the same time you have all its relationships and characteristics within that particular domain. That's really something important when it comes to understanding the meaning of words and therefore understanding texts. However, it's not as simple as it looks, because we have more complex things inside.
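The row-selection step described above, a one-hot vector multiplied by the embedding weight matrix, can be sketched directly. The matrix values are invented; the point is that the zeros in the one-hot vector wipe out every row but one:

```python
# Toy embedding weight matrix: one row per vocabulary word.
embedding_matrix = [
    [0.6, 0.9, 0.1],   # row 0: "cat"
    [0.5, 0.8, 0.1],   # row 1: "kitten"
    [0.7, 0.1, 0.1],   # row 2: "dog"
]
one_hot = [0, 0, 1]    # the third word, "dog"

def matvec(vec, matrix):
    # vector-matrix product: rows weighted by the vector entries,
    # so a one-hot vector simply picks out its matching row
    cols = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(matrix)))
            for j in range(cols)]

print(matvec(one_hot, embedding_matrix))  # [0.7, 0.1, 0.1] -- row 2
```

This is why the multiplication is cheap in practice: frameworks implement the embedding layer as a table lookup rather than a full matrix product.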
Once you move away from this first layer, this is what you get. If we take an example where recurrent neural networks are used... so this is a special type of neural network. This is what you get in the first instance: every word is going to have a cell in your recurrent neural network, and one of the inputs to each cell is going to be the embedding vector of the word, meaning the word and its relationships with the others. Not only this, but those cells are going to communicate with each other, so every time you gain some kind of understanding, you provide it as an input to the next cell and therefore you add more things. So, this is what I have learned, and this is the new word.
So, I’m going to move to the next step, which is either generating text, answering questions or something else, but we are collecting all the knowledge that we had. Then, of course, we have this special function, in this particular case the tanh function, that controls everything and makes sure that all the values we've got remain between -1 and 1, making sure that learning takes place in an efficient way and we don't get some extraordinarily large numbers. So, it works, and it produced some good results; however, there are some issues. Because a text is a very long sequence of words, you may end up having thousands of those cells, and there are two problems with this. One of the problems is trying to remember everything: those systems are known to remember what has just been processed, the cells right next to the one you are processing, but if you have a cell that is 300 or 400 words before, then that kind of knowledge and information tends to be forgotten, or at least the system tends to forget it. There is also a related theoretical problem, which is the vanishing gradient problem.
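A single recurrent step as described above, combining the state passed along from the previous cell with the current word's embedding and squashing the result with tanh, can be sketched like this. The weights are made-up scalars, not a trained model:

```python
import math

def rnn_step(prev_hidden, word_embedding, w_h=0.5, w_x=0.5):
    # combine "what I have learned" with "the new word", then apply
    # tanh so every value stays between -1 and 1
    return [math.tanh(w_h * h + w_x * x)
            for h, x in zip(prev_hidden, word_embedding)]

hidden = [0.0, 0.0, 0.0]                 # initial state
sentence = [[0.6, 0.9, 0.1],             # embedding of word 1
            [0.5, 0.8, 0.1]]             # embedding of word 2
for emb in sentence:
    hidden = rnn_step(hidden, emb)       # state flows cell to cell

print(all(-1.0 <= v <= 1.0 for v in hidden))  # True
```

Because the state is repeatedly squashed and mixed at every step, information from hundreds of words back fades away, which is the forgetting problem, and the gradients shrink in the same way, which is the vanishing gradient problem.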
So, I’m not going to go into this, but these are the mathematics behind neural networks and deep learning models. What you've got then is that the cells of those recurrent neural networks become very complex. Each one of them is a deep learning model in itself, so you are going to have many functions. There is going to be an input gate, so now we start talking about the gates inside those cells. And then, once this information has come in, you have a special gate that we call a forget gate, which is this one that we've got here. So, you've got an input that is based on what you have learned previously, and you decide what you want to remember and what you want to forget. By doing this, we make sure that all the information we want to remember is passed to the next cell, but the things we are not interested in, we just forget. It is a basic operation; it is the sigmoid function that you've got there. If you multiply by 0, you ignore it, it's gone. If you multiply by 1, then it's kept, or something in between. If you want to keep that information, you pass it on to the next stage.
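The forget gate's multiply-by-0-or-1 behaviour can be sketched in a few lines. The memory values and gate pre-activations here are invented; in a real LSTM cell they come from learned weights:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# The gate squashes a score into (0, 1) and multiplies it into the
# carried memory: near 0 means "forget", near 1 means "keep".
memory = [0.8, -0.3, 0.5]
gate_scores = [-6.0, 6.0, 0.0]          # made-up gate pre-activations

gated = [sigmoid(s) * m for s, m in zip(gate_scores, memory)]
# first value nearly erased, second nearly kept, third halved
print([round(v, 3) for v in gated])
```

Because the gate output is continuous, the cell can also keep a fraction of a value rather than making an all-or-nothing choice.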
Improvements have taken place a lot, and then we moved to a sequence-to-sequence processing method. This is another family of deep learning models, and again they can be used for different applications. The first one that we've got is a system that translates from English to French, so what we've got here is a system that starts at the beginning of the English sentence. It's going to do all the things that we have just mentioned: it's going to look at the word, and it's going to use the information it has learned before this. It then passes this information to the next cell, which reads the next word, and so on and so on. Then, by the time it gets to the end-of-sentence cell that we've got here, it passes everything to another recurrent neural network. So we have two recurrent neural networks here: one is called the encoder and the other one is called the decoder. As soon as all this information is known and passed to the beginning-of-sentence cell, the system has (in theory) enough information to generate the first word of the French translation. And then, once we have the first word, it is passed again to the next cell, so the knowledge that we have learned so far, plus the word that we have just generated here, become the inputs. With these two, we are going to generate the next word, and so on and so on until we reach the end of the sentence. This is how they work.
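The encoder-decoder loop just described can be sketched with stand-in functions. Nothing here is trained: the encoder just accumulates a state, and the decoder is a hand-written lookup table standing in for a learned network, so the flow of inputs and outputs is the point, not the translation itself:

```python
def encode(words):
    # encoder: each cell passes its state to the next, folding the
    # whole source sentence into one final state
    state = ()
    for w in words:
        state = state + (w,)
    return state

def decode_step(state, prev_word):
    # stand-in for the decoder network: (state, previous word) -> next word
    table = {"<start>": "la", "la": "maison", "maison": "blanche",
             "blanche": "<end>"}
    return table[prev_word]

def translate(source):
    state = encode(source)
    out, word = [], "<start>"
    while True:
        word = decode_step(state, word)   # state + last word -> next word
        if word == "<end>":
            break
        out.append(word)                  # generated word is fed back in
    return out

print(translate(["the", "white", "house"]))  # ['la', 'maison', 'blanche']
```

The key structural feature is visible in `translate`: each generated word is fed back as the next input, exactly as the lecture describes.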
We can also use it for question-answering systems. We do the same process as before: we read the question here, which is "How are you?", and when the system encounters the question mark, which marks the end of the sentence, it has enough information for our question-answering system to start answering, saying, for example, "I am fine".
We managed to solve this problem, but then came another problem, particularly when it came to translation. It's not a linear process. For example, take French and English: with adjectives in French, you say "la maison blanche" for "the white house". In French you start with the noun, "la maison", and then the adjective comes, but in English you start with the adjective and then the noun comes. So, when you are processing a sentence like this, what you do is you reach a case like this, you suspend your processing, and you put a lot of attention on another part of the sentence or the text. You process it and then you come back to the rest. So, we are talking here about sequence-to-sequence models with attention: we pay more attention to some parts of the sentence.
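Attention can be sketched as scoring every source position and normalising the scores with a softmax, so the decoder can focus on "blanche" when it is about to produce "white". The scores here are made-up numbers standing in for what a trained model would compute:

```python
import math

source = ["la", "maison", "blanche"]
scores = [0.1, 0.2, 3.0]                 # decoder's affinity per source word

def softmax(xs):
    # turn arbitrary scores into weights that sum to 1
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

weights = softmax(scores)
focus = source[weights.index(max(weights))]
print(focus)  # 'blanche' -- most of the attention lands here
```

Instead of relying on one fixed summary of the whole sentence, the decoder takes a weighted average over all source positions at every step, which is what lets it handle the noun-adjective reordering.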
Of course, you cannot just sit and admire what all those researchers have done and the contributions they have made to the field. I’ve also worked in this area and made some contributions, working with colleagues on translating Arabic to English and English to Arabic. When you have a language like Arabic, you don't have rich resources like you would have for French and English. We have seen perfect answers for both systems, the one that translates and the one that answers questions, but the reality is slightly different. With probabilities, you are going to have different possible answers, and as you work towards the end of the decoder phase and produce sentences, we have proposed here a method that uses other features to select the ten best solutions for the translation, and then, once we have those ten best, we go and take the maximum again. The work was quite excellent and was published in "Machine Translation". This is... I would not say the top, but this is THE journal for machine translation; this is where most research into machine translation is published.
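The keep-the-ten-best-then-take-the-maximum idea can be sketched abstractly. The candidates and scores below are invented, and the single `score` dictionary stands in for both the decoder scores and the extra rescoring features of the actual published method:

```python
# Made-up candidate translations with made-up scores.
candidates = {
    "the white house": 0.91,
    "the house white": 0.40,
    "a white house":   0.85,
}

def best_translation(cands, n=2):
    # keep the n best-scoring candidates...
    top_n = sorted(cands, key=cands.get, reverse=True)[:n]
    # ...then take the maximum among them
    return max(top_n, key=cands.get)

print(best_translation(candidates))  # 'the white house'
```

The benefit of the two-stage shape is that the expensive rescoring features only need to be applied to a short list rather than to every sentence the decoder could produce.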
So, this will probably be the last contribution that I’m going to mention tonight. I was going to give a couple of demonstrations here, but I think you've got the links there, so I’m just going to mention them rather than disconnecting, going into Google, and then trying something that may not work. I’m not going to talk about Siri and Alexa; I’m sure you all know about these and have all used them. But the most important is... I don't know how many of you are using Google Translate? For a person like me, who writes reports in different languages and sometimes has to translate from one language to another, I used to spend a lot of time doing this, but these days you can copy and paste into Google Translate, choose your language, and you've got your translation. So, I tried a piece of BBC news from just a couple of days ago, and the translation from English to French was perfect. I’m not a native French speaker, but I understand the language very well and have worked with it many times, and I could not do better than that. I tried Arabic and it was nearly perfect, and I was hoping to do the demonstration and possibly check with some Chinese colleagues to see whether the Chinese translation was good or not.
Everybody is saying that what Google is producing these days is near perfection. The other example is that link: if you follow it, you're going to find a text that, when you read it, you would think was written by a human being, but it was not; it was generated by a computer... a full story and everything that you would expect from good writing. So, this is where we are at the moment in terms of research, and I’m going to summarise by asking a few questions.
So, go back to Alan Turing in 1950 and ask whether the current systems have really passed the Turing test. The answer is certainly yes. Look at the translations, look at the question-answering systems that you've got these days, look at the text that is generated, and you will not really notice a difference between what is produced by an expert or a human being and what is produced by a computer system. My second question is... remember in 1966, when reports said that machines cannot translate from one language to another? They can. Fifty years later, machines can do better than humans; maybe not for one particular pair of languages, but a system can translate from any language to any language, and humans cannot do that. And then, the question that I asked at the beginning was whether NLP has fulfilled its promises, and I think it has. I think what we have seen over the last few years is amazing, in the way language engineering has progressed and in what we have as a result. And then, maybe the last question is whether this is the end? No, it is not. We see more and more algorithms coming out every six or seven months. Google has the dominant share in this and has put a lot of resources into it, but watch this space. We are going to have more and more surprises when it comes to natural language.
So, that is the end of my talk. I would not have achieved this, or what I have achieved in my career, without the support of many people. I’m going to start with my family and my wife, who is here today. I’ve had her support from those early years through my PhD and all the way until today, and she has followed me all the way to the jungles of Borneo, so you cannot ask for more. Thank you very much for all those years and for all your support. I’ve got three boys, and it was always very competitive and very challenging at home; there was always someone there to correct you if you pronounced a word incorrectly. So, I tried to tell them that English is my fourth language, you know; I learned three different languages before I got to English, and therefore there are a few things that you cannot get rid of; there are some accents that I will keep with me for the whole of my career. But they were always there to say, "Nope, this is how you do things, this is how you pronounce things".
And then, over the last few years, there were some additions to the family, and these additions have brought some femininity into what was otherwise a male-dominated family. So, all your contributions and your help are welcome. And of course, I will never be able to thank Sunil enough for all those years. I was really very lucky to have him as a supervisor in the first instance. As he has mentioned, we were working together, and the relationship has always been one of two researchers working together; it was a race on who was going to finish first, from teacher to PhD supervisor to colleague, of course. We have developed a friendship over those years, including through having him as a boss. He was the dean and I was the associate dean, so we worked really well together over all those years, and I have learned a lot, not only in terms of computer science but also from the way he looks at things and the way he manages. He has always led by example, so thank you very much, Sunil. That's the end of my presentation. Thank you very much for listening, and I hope you have enjoyed it.
[Applause]
[Paul] Thank you, Farid. Absolutely fascinating and as a botanist, this is just so far outside my comfort zone that I’ve really learned a lot tonight. Has anybody got any questions for Farid?
[Audience Member] The sentiment that we have now is that language translation is near perfect. What do you see as the big challenges and advances for the next decade?
[Farid] I think there is the cultural aspect. I remember when I was in Malaysia, somebody told me about this. I started learning the Malay language, and it is a very simple language, by the way; you can learn it in probably a few months. And then, he told me that it is the culture that is different. You sometimes have proverbs in one language that are very hard to translate into another, so you have to find the equivalent. Bringing the cultural levels, or cultural aspects, into a language will be, in my view, the next challenge in natural language processing. It's very easy to translate a book, it's very easy to translate a movie, but if you go and live in a country where you haven't lived before, then even if you speak the language, you are going to struggle to understand things like jokes.
[Paul] I’m certainly amazed by somebody who can learn the Malaysian language in several weeks!
[laughs]
Any other questions colleagues? Louise?
[Audience member inaudible]
[Farid] I think you're right. It's frightening for many research groups, especially when you have limited resources in terms of money and so on. It's very hard to compete, and again, for colleagues here at the university, if we really want to do research in this area, then this is the kind of investment that we need to make: supercomputers, resources, storing data and accessing it. There is a lot of work that is done, or still to be done, on the theoretical side of the algorithms. Is it going to be easy? No. In the latest research that was produced, it was quite likely that the person or the students had a very strong mathematical background, which is how they managed to do that kind of research and contribute towards the other end of the spectrum. But if you work in that area, I agree with you: I would myself struggle to decide what to do today. Can I be confident that Google is not going to come up with something better tomorrow, publish it, and then I will have to throw all my work away because it has become obsolete? So, what we advise students to do is always... the state of the art is probably the most important thing. You read the literature, which is very important, to try to find the gap in the research, and I’m sure there are many gaps still available if you look at papers or conferences on deep learning; there are always papers coming out. But once you start working, publish quickly, because things change rapidly in this field, and if you wait until you have an idea, someone else may publish the work. If we want to tackle this area, then we need resources in terms of computing power and in terms of attracting the right PhD students with the right background.
[Audience member inaudible]
[Farid] Yeah, I fully agree with you. There are areas that are going to be very difficult to translate. I have mentioned jokes; poetry is another field. I’m lucky enough to be able to read in four languages, and when you read a poem in one language and you try to translate it, you just lose its meaning completely, particularly if there are elements of culture incorporated in that particular poem or text. Finding out where language stops and culture starts is quite difficult, and I’ve always thought this is where most of the work will be in the future. It's like programming languages: we want a programming language that can be used on different computers. If we have translation systems, particularly in robots, and you take a robot from one country to another, then the robot has to adapt itself to the culture of that country. There are some words that you cannot say in other countries, and there are some words that mean one thing one moment and may mean something completely different the next. So, this is the next level, in my view. This is where things are going to happen. I feel sorry for colleagues who are going to teach humanities in the future, because you are probably going to have programs that can generate essays for students, and you are not going to find out, through plagiarism software or anything else, that what the student has presented was not their own work, because nobody has written it before; a computer program generated it. So, this is also something that we need to watch, particularly as lecturers and university professors. Something is going to happen in the future where computers will be able to do assignments. I’m hoping it's not going to be in computer science, but I’m sure it's going to be in other areas.
[Paul] Thank you for that concern! [Laughs]
[Audience member inaudible]
[Farid] We have many, many systems that do this. I think these were developed in the early stages, when we had what we call expert systems or knowledge-based systems: if you can capture what an expert knows and put it into a computer program, then the computer program can behave like an expert. If you follow the news, I think it was a couple of years ago that Google produced (using deep learning) a model that was able to detect breast cancer better than the experts. You might wonder why, and it's because inside an image there is a set of numbers. When we look at an image, we may not see a difference in colours between one region and another, but if you forget about what you see and look, as a machine does, at the numbers behind it, then you will notice that one value is 59 and the other is 58, and this small difference will be detected, and we can ask the specialist to look at that particular point in the image. So these models can support specialists, and support medicine or any other science, to be honest with you, because this can be applied to pictures of rocks, of environments, and so on. There is always something that computers can detect that human beings cannot, because we are limited in what we can see and in how we distinguish between the areas of an image. So yes, these kinds of models are very well used in many other fields, and they have started producing results that are better than those produced by experts.
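The idea in the answer above, that two regions may look identical to the eye while the numbers behind the pixels differ slightly, can be illustrated with a toy grid of made-up pixel values:

```python
# A tiny "image" region where every pixel looks the same shade of grey
# to the eye, but one value is off by 1.
region = [
    [59, 59, 59],
    [59, 58, 59],   # one pixel is slightly different
    [59, 59, 59],
]

# A program compares raw numbers, so the difference is easy to flag.
flagged = [(r, c) for r, row in enumerate(region)
           for c, v in enumerate(row) if v != 59]
print(flagged)  # [(1, 1)] -- the point a specialist should inspect
```

A real model works on learned features rather than a single threshold, but the underlying advantage is the same: the machine compares numbers the eye cannot distinguish.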
[Paul] Thank you, I think we'll leave the questions at that point um... Go on, you were very quick. Just under the wire there, sir.
[Audience member] Do you think the neural networks are actually good enough for this new era or will there be some particular shift in the future towards something new?
[Farid] I’m sure there's going to be something new. As I have said, just three or four years ago we thought that the basic deep learning models would do it, and since then different types of neural networks have been produced to tackle different types of problems. They have been improved, because what you do is you have a problem and you solve it with what you have, but if you don't have the right technology, then you try to find a new algorithm and a new model to solve it. The more you learn, the more problems you're going to find that your previous model cannot solve, and then you come out with a new model. This is a process that has been going on for the last 10 years in the field of natural language, and we don't see any reason why it is not going to be replicated in other fields, where you try to do something, find out that the technology available to you cannot do it, and start thinking about new solutions and new models. In general, this is what happens in natural language: since the 50s, people would try to do something, but they didn't have the tools or the power to do it. So, it stays dormant for 10-15 years, and then suddenly something comes out, and people jump on it and apply those technologies and theories to get to a particular point, and then we see another gap, and so on and so forth, until we get to what we have today.
[Paul] Thank you very much for coming and a final well done to Farid.
The lecture recording fades out and a blue screen is presented with the University of Derby’s 3 hills logo in the middle of the screen, in white.