Marvin Minsky once asked a first-year undergraduate student to connect a camera to a computer and make the computer describe what it sees. At first glance this is an easy task: even children can do it. Yet the problem remains unresolved even forty years after that first attempt. Machines have a hard time identifying anything beyond pixel patterns. For example, AlexNet, a model based on the convolutional neural network (CNN) architecture, is very successful at recognizing objects and human faces; however, changing only a few pixels in an image can easily fool it and drastically reduce its prediction accuracy. What is more surprising is that humans cannot tell the original and the altered image apart, yet the change is enough to fool the algorithm. According to Mitchell, some scientists call this “an intriguing property” of neural networks, but she says that is the same as calling a hole in the hull of a fancy cruise liner “a thought-provoking facet of the ship”, and it’s hard to disagree.
AI fails similarly in other domains, such as language. Machine translation does a great job of keeping the meaning intact, but only when the text consists of a few sentences. Mitchell argues that one of the reasons machines still cannot pass the Turing test is that they cannot grasp the actual meaning behind the words. She also suggests that a better test could consist of Winograd schemas, which would require machines to understand the language instead of relying on tricks such as changing the topic, as they can during the Turing test. Winograd schemas use sentences whose meaning depends on context, i.e. you need a shared understanding of the context to fully grasp the meaning of the sentence. A famous example is the pair “The city council refused the demonstrators a permit because they feared violence” and “The city council refused the demonstrators a permit because they advocated violence”. The question in each case is “who are they?” Until AI can understand what ‘they’ or ‘it’ refers to in a sentence, it’s difficult to believe that it can reach human-like intelligence.

Moreover, even if we solve the issues mentioned above, a few philosophical debates remain. Currently, there are over 70 different definitions of intelligence and, as Joshua Greene says, before we can place any values or norms into machines, we need to figure them out ourselves and make them clear and consistent. This is not a new problem: Norbert Wiener had similar concerns in 1960, when he said that “we had better be quite sure that the purpose put into the machine is the purpose which we really desire”. But how do we know what we actually desire, or what kind of values and rules we want to implant into machines? Can we make any of these values consistent at all, given how diverse our societies and our cultural and even religious norms are? One idea for solving this problem is to let machines observe human behavior and figure it out for themselves. In that case, however, they have to be placed in our environment and cannot be kept in a box as they are now. Should they have a physical body? Should they be equipped with some basic instincts and emotions, as we humans are? As Mitchell mentions, Deep Blue beat Kasparov at chess but didn’t get any joy out of it. So, even if machines had a physical body, would they be able to understand and recognize our human experiences without senses and emotions of their own? Are those something that makes our understanding of the environment richer, or are they unnecessary?
In short, there are many unresolved problems in the AI field, and Mitchell argues that we are still far from making truly intelligent machines, if that is possible at all.
