Can Artificial Intelligence Speak "Foreign Languages"? On Natural Language and Computer Language

Language is an important tool with which human beings communicate and think. It reflects people’s thoughts, and people use it to communicate, express themselves and create. The everyday language used by human beings is called natural language; it comprises vocabulary, pronunciation, semantics and grammar. The language used to program machines is called computer language; it consists of numbers, characters and grammatical rules. Natural languages are many and belong to different language families, and computer languages are likewise divided into categories according to their rules of writing. Language intelligence, the ability to communicate and think through spoken dialogue and writing, is regarded as one aspect of human intelligence. A machine (computer), by contrast, can recognize computer language, run programs and perform operations, so computer language can be regarded as the language inherent in the machine. If a computer could understand natural language as a "foreign language", translate it, understand and execute people’s commands, and even converse with people, could it be said to have language intelligence similar to that of humans? This paper addresses this question.

Definition of natural language and computer language

The language used in everyday human life is called natural language. Humans use language to express ideas, communicate and describe objects; language is the carrier of the meaning of words. Human languages belong to different language families, but they can be mapped onto and translated into one another. Every language is a system of vocabulary, grammar and pronunciation, in which words are combined according to grammatical rules to express meaning. As the philosopher and linguist Chomsky said in Language and Mind, "People who know a certain language have mastered a set of rules system, which assigns sounds and meanings to countless possible sentences in a certain way." However, individuals who use a natural language may not be aware of the rules and content of this system: "People who know the language are not aware that they have mastered these rules or are using them, and there is no reason to assume that this knowledge of language rules can be brought into consciousness."

Natural language is acquired through everyday life and social interaction. Children master a language by encountering words and sentences in a large number of dialogues, picking up sentence patterns in conversation, and only then mastering grammar. Natural language is thus learned from semantics to grammar. For this reason, some scholars discuss language acquisition from the standpoint of an innate human language faculty. Chomsky, for example, holds that the human brain has an innate universal grammar, housed in what he calls the language acquisition device. He sought to explain the relationship between the study of language and human nature, holding that language reflects the workings of the human mind and shapes the character and development of thought.

Viewed as a system, natural language is both stable and changeable. As long as a language exists, its core vocabulary and basic grammar remain stable. At the same time, as it is passed on and used, the language is constantly renewed, extended and developed in response to changing times and circumstances.

Computer language refers to language that uses numbers and characters, arranged according to prescribed grammatical rules, to write programs that make computers do all kinds of work. It includes machine language, assembly language, high-level languages and so on. Under different programming paradigms and models, statements composed of characters and grammatical rules carry out operations as instructed. Computer languages were originally designed to let people control and operate computers more effectively. Today, every action and step a computer takes is carried out according to a program written in a computer language. The operation of a computer is a process of accepting input, matching objects and outputting answers. To execute an operation, the computer must first understand the commands people enter, which means translating what was expressed in natural language. The source code of an application is then translated into machine-language target code by a compiler or interpreter for the corresponding language.
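The translation step described above can be illustrated with a minimal sketch. It assumes the CPython implementation of Python and uses the built-in compile() function and the standard dis module. Note that CPython translates source text into bytecode for its own virtual machine, not into the hardware's machine language, so this is only an analogy for the source-to-target translation discussed here; the one-line program is a made-up example.

```python
import dis

# A one-line "program" written in a high-level language (hypothetical example).
source = "result = 2 + 3 * 4"

# compile() translates the source text into a code object (CPython bytecode).
code_obj = compile(source, "<example>", "exec")

# Collect the names of the low-level instructions the interpreter will execute.
# The exact instruction names vary between Python versions.
instruction_names = [ins.opname for ins in dis.get_instructions(code_obj)]

# Execute the translated code and read the result back from its namespace.
namespace = {}
exec(code_obj, namespace)

print(instruction_names)
print(namespace["result"])  # 14
```

The point of the sketch is only that the text a person writes and the instructions the machine carries out are two different representations, connected by a mechanical translation.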

It can be said that computer language is the medium of human-machine dialogue. By its construction, a computer language can take in and translate commands that originate in natural language, perform operations and output results, and this translation process is the key to human-machine cooperation and even human-machine dialogue. The machine can continue to perform operations only after it has understood the commands entered by human beings. Before the emergence of intelligent machines, the instructions people could give to machines took a single, fixed form, but today’s intelligent machines increasingly try to understand people’s commands through voice and image recognition. At present, this ability depends mainly on corpus analysis, enhanced matching search and deep learning.

If the vocabulary of a natural language is regarded as a set of symbols and its grammar as the rules for using them, then computer language is likewise a matter of symbols and rules. The encoding and decoding of a computer’s input and output, and the operations involved in human-computer interaction, resemble the way people use language to listen, read, write and communicate. In this sense, computer language seems to be the natural language of computers. If a computer were truly intelligent, human natural language would, from its point of view, be a foreign language. But can computers be intelligent?

An important way to compare natural language with computer language is to examine the internal logic of each, that is, their logical systems. Logic is an essential part of what a language contains, and the logical structure and grammatical system of a language are what fundamentally distinguish it from other languages and other kinds of language.

Logic in Natural Language and Computer Language

In the use of natural language, ambiguity and vagueness are inevitable, and translation between languages gives rise to misunderstandings and distortions. Hoping to resolve the ambiguity and vagueness of words and to find the deep structure common to human languages, scholars drew on mathematical methods and invented formal languages. The original aim of formal language was to give logic a set of universal symbols like those of mathematics, and so to establish a universal and unambiguous language. In such a language, all thinking and reasoning could be turned into a calculus and become as exact as mathematics. Logicians try to use formal systems and symbols to describe precisely the world that natural language describes, so as to reason, analyze and judge more accurately. The study of logical languages therefore also includes the study of generative grammar for natural language.
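To make the idea of a formal, rule-based grammar concrete, here is a minimal sketch of a toy generative grammar. The grammar, its vocabulary and the function name expand are all hypothetical and chosen only for illustration; they are not taken from Chomsky or from any real linguistic resource. The sketch shows how a handful of symbols and rewriting rules, one of which is recursive, yields more and more sentences as deeper embedding is allowed.

```python
import itertools

# Hypothetical toy grammar: each symbol maps to its possible expansions.
# The rule VP -> "thinks that" S is recursive, because S reappears inside VP.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V"], ["thinks", "that", "S"]],
    "N": [["machine"], ["child"]],
    "V": [["speaks"], ["listens"]],
}

def expand(symbol, depth):
    """Return every string derivable from `symbol` using S at most `depth` times
    along any one chain of embedding."""
    if symbol not in GRAMMAR:
        return [symbol]  # a terminal word stands for itself
    if symbol == "S":
        if depth == 0:
            return []  # no further embedding allowed
        depth -= 1
    results = []
    for production in GRAMMAR[symbol]:
        parts = [expand(s, depth) for s in production]
        for combo in itertools.product(*parts):
            results.append(" ".join(combo))
    return results

for depth in (1, 2, 3):
    print(depth, len(expand("S", depth)))  # 1 4, then 2 12, then 3 28

print(expand("S", 1)[0])  # the machine speaks
```

Every string this grammar produces is fully determined by the rules, which is the precision that formal languages aim for. What the rules do not supply is the situation in which a sentence is used.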

Natural language can be regarded as a symbol system in which words are symbols that express ideas against a cultural background. However, the logic of natural language lies not only in grammatical structure but also in semantics. The ambiguity and vagueness of some words in everyday speech are hard to resolve by grammar alone, yet they are easily resolved once the context and situation are taken into account. Wittgenstein held that natural language is used to represent objects, that the meaning of a proposition or sentence comes from the real object it refers to, and that learning a language is a matter of mastering the relationship between words and objects. We cannot obtain meaning from a single word in isolation, only through association and activity, so natural language has meaning within a particular setting and under the specific rules of a language game.

The logic of natural language therefore has one more dimension than that of formal language. Natural language is part of the human mind, and the logic contained in words is a logical capacity unique to that mind. As Wittgenstein pointed out, vocabulary and grammatical structure alone cannot convey complete meaning. Natural language cannot be acquired merely by learning the logical structure of a language. Mind is therefore an important concept in the acquisition of natural language.

The computer language used by the machine is also a formal language. It is the language that people first gave to the machine and that is thus built into it in advance. To understand and execute human commands, a computer must translate natural language into a computer language that the machine can process and then run the program. A formal description of natural language is very important for mechanical imitation by computer programs, but imitation with understanding is different from mechanical imitation. Mechanical imitation concerns formal properties, whereas imitation with understanding concerns quasi-semantic properties. At present, computers rely mainly on mechanical imitation and engage with human natural language by way of logical language. Therefore, although a computer’s processing of natural language can be regarded as a kind of translation, it currently differs from translation between two natural languages.

In the process of disambiguation, computers need a great deal of knowledge, including linguistic knowledge (morphology, syntax, semantics, context and so on) and common-sense knowledge about the world. Supplying these two kinds of knowledge constitutes the two main difficulties of natural language processing at present. Measured against natural language, computer language is highly formalized, which limits its ability to describe context, and it cannot convey multiple layers of information as natural language can. Therefore, although formal language has the advantage in precision, its ability to shape context and its expressive power are inevitably weaker than those of natural language. In understanding natural language, a formal language finds it hard to describe fully how lexical items combine with syntactic structures to form the meaning of a sentence, which is an important reason why computers make errors in recognizing natural language. Of course, with the development of corpus construction and corpus linguistics, the rationalist approach based on syntactic and semantic rules, which computers once mainly used to handle natural language, has taken a back seat. Today natural language processing has adopted statistical methods and, supported by matching search and machine learning, has gradually reduced its errors.
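The statistical approach to disambiguation mentioned above can be sketched in a few lines. The example below is a deliberately tiny naive Bayes classifier with add-one smoothing that chooses a sense for the ambiguous word "bank" from the words around it. The labelled contexts, the sense labels and the function names train and disambiguate are all hypothetical, invented for this illustration; real systems learn from large corpora and use far richer models.

```python
import math
from collections import Counter

# Hypothetical hand-made training data: contexts labelled with a sense of "bank".
LABELLED_CONTEXTS = [
    ("finance", "deposit money account loan"),
    ("finance", "account interest loan cash"),
    ("river", "water river fishing shore"),
    ("river", "river mud water boat"),
]

def train(examples):
    """Count how often each word occurs with each sense."""
    sense_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for sense, text in examples:
        sense_counts[sense] += 1
        counter = word_counts.setdefault(sense, Counter())
        for word in text.split():
            counter[word] += 1
            vocabulary.add(word)
    return sense_counts, word_counts, vocabulary

def disambiguate(context, model):
    """Pick the sense with the highest smoothed log-probability for the context."""
    sense_counts, word_counts, vocabulary = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense in sorted(sense_counts):
        score = math.log(sense_counts[sense] / total)
        n_words = sum(word_counts[sense].values())
        for word in context.lower().split():
            # Add-one smoothing keeps unseen words from zeroing out a sense.
            score += math.log((word_counts[sense][word] + 1) / (n_words + len(vocabulary)))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

model = train(LABELLED_CONTEXTS)
print(disambiguate("she opened an account at the bank to get a loan", model))  # finance
print(disambiguate("they sat on the bank of the river fishing", model))        # river
```

The classifier never consults a grammar or a model of the world. It only matches word frequencies, which is why such methods reduce errors without amounting to understanding of context in the sense the essay discusses.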

Cognitive science holds that thinking and cognition are logical operations on knowledge, and computer analysis of natural language depends mainly on expression in logical language. From the perspective of behaviorism, if a machine has a computer language that is continually reinforced and made more accurate through later execution and use, this amounts to a kind of acquisition and consolidation. In this sense, computer language seems to be the computer’s natural language. However, although natural language and computer language each come in many varieties, the reasons for their diversity differ. Natural languages differ because of history, culture and region, whereas computer languages are developed to meet different needs in writing programs. What natural language and computer language contain is therefore different. Fundamentally, the mind embodied in natural language is different from what computer language expresses.

Natural language and mind

If natural language is the tool of human expression and the medium of thought, then natural language and mind are inseparable. To a certain extent, people’s ability to use language is one manifestation of mental ability. Mind is different from intelligence: mind is a part of intelligence and refers to people’s capacities for perception, feeling, memory, learning, understanding and innovation.

Mental ability includes the understanding of natural language. When we use natural language in daily life, we grasp the pronunciation, the image and the meaning of the thing described all at once. Whenever we hear someone describe something or see someone point to an object, we remember what it is called and use the same word when we refer to it again. We can also sense the emotions and feelings in a sentence from other people’s voices, expressions and eyes, as well as from their gestures, tone and intonation. That is to say, in natural language the meaning of an utterance is not limited to the meaning of individual words but also includes the appearance of the referent, the pronunciation, the speaker’s intention and the context at the time. The meanings of natural language are fully realized in social communication and dialogue. A comprehensive grasp of these meanings requires mental ability, and vice versa. As Chomsky said, the surface rules by which words form sentences are grammar, but the real meaning of a sentence is reflected in its deep structure, which is related to the surface structure through certain mental operations. In other words, the connections among the words, sounds and meanings of natural language rest on grammatical structure, which in turn rests on the mind.

Furthermore, the learning of natural language also reflects mental ability. Scholars represented by Chomsky believe that language and grammatical structure are essential characteristics of the human mind. However much language users differ in individual experience and ability, they accomplish the task of constructing the theoretical system of their language in very similar ways. We are endowed with cognitive structures and a language faculty, and we gradually strengthen our grasp of grammatical rules as we apply what we have acquired. "On a basic level, we humans are not learning languages. The real situation is that languages grow up in our minds."

Unlike Chomsky’s, Quine’s theory of language is grounded in empiricism and behaviorism. He opposes both radical empiricist reductionism and purely a priori knowledge. He holds that language can express meaning because of behavior that has been learned, and that the mental ability displayed in linguistic behavior is acquired gradually. The mental ability to master and use a language is acquired and can be strengthened through training. In Quine’s view, our command of language comes from public knowledge, an inheritance of shared human experience and background knowledge, rather than something given a priori: "Even if we want to talk about a unique quality of sensory awareness, most of us have to turn to public objects, the color of oranges, the taste of rotten eggs, and so on. To continuously access the previous sensory data, it also depends on the reference object. Of course, we should explore the sensory awareness and sensory stimulation behind the daily discourse about objects, but these are the background of concept formation or language, not their underlying structure."

On Chomsky’s theory, the initial internal structure that we give to an intelligent computer can be counted as its innate "mind", although such a mind is given by people and is incomplete. On Quine’s theory, once it has been constructed, the computer has acquired background knowledge and can continually strengthen its language ability through learning. This seems to mean that it could attain real intelligence through reinforced learning, including language intelligence on a par with that of people.

Machine language and mind

In the famous Turing test of artificial intelligence, if a person holds a sufficiently long conversation with a machine and cannot tell whether the answers come from a machine or a person, the machine is judged to be intelligent. It can be seen that the main ability of Turing’s intelligent computer is the ability to understand and use language. The test breaks the machine’s intelligence down into several abilities: answering questions posed as input text, explaining the meaning of words, understanding sentences composed of words, and translating one language into another. Language ability is thus an important criterion for judging whether a computer is intelligent. One could even say that, given the mechanism of the Turing test, language ability is equated with intelligence.

If intelligence requires a command of language, then, as noted above, although the natural language used by human beings contains vagueness and ambiguity, its expressive power is indeed greater than that of formal language. For a computer to have real intelligence, it must be able to understand natural language in different contexts. With existing technology, however, this ability is almost out of reach. In What Computers Can’t Do, Dreyfus notes that one of the hard problems machines face in acquiring intelligence is resolving ambiguity in language. With the support of enhanced matching search and big data, the ambiguity problem of natural language has been partly solved, but only on the surface. To truly understand and use a language in the way humans use natural language, a machine would need not only to master the necessary rules of reasoning (including expert-knowledge reasoning and common-sense reasoning) but also to be able to understand and grasp context. Only with this ability could it be considered to have mind and intelligence.

When discussing machine intelligence and mind, many researchers hope to draw inspiration from the origin and structure of human intelligence. From simple beginnings to complex thinking, human intelligence seems to follow discernible rules, but on closer inspection even the simplest and most elementary intelligent behavior involves the large-scale cooperation of millions of brain cells and muscle cells. A great deal of intelligence is embedded in these simple behaviors: deep mental abilities of which we are unaware in everyday speech and action. Language intelligence is one of them. If we consider the origin of the language mechanism and its role in the sudden qualitative leap of human intelligence, at least two basic problems arise: first, the core semantic content of the smallest meaning-bearing elements, including the simplest of them; second, the principles that allow unlimited combinations of symbols. If we try to analyze step by step the physiological and logical structures that give rise to the human mind, and then to endow a machine with mind by imitating those structures, we are bound to face two difficulties: first, the human brain is a black box that cannot be fully understood; second, the mind, being non-physical yet real, is elusive for technology.

From the standpoint of mentalism, a computer cannot have the same mental ability as human beings, even though, as technology develops, it may acquire ever more sophisticated and accurate natural language processing and may come to understand every natural-language instruction it is given to execute. From the standpoint of behaviorism, if a machine has the same language ability as human beings and can understand and express natural language, it seems it can be regarded as a machine with a mind. However, the unavoidable problems in converting between natural language and formal language indicate that imitation on behaviorist grounds cannot be exactly the same as the human original.

Conclusion

With the development of technology, the volume of natural language text that computers can process keeps growing. Driven by applications such as text mining, information extraction, cross-language information processing and human-computer interaction, research on natural language processing is also advancing. However, natural language is referential and logical in description and expression, and it conveys and carries culture. This means that natural language is not merely a combination of words and symbols and that it differs from formal language. Although natural language is also used under a certain logical grammar and serves communication, exchange, expression and creation, it has cultural attributes as well. Acquiring a language involves not only using words and grammatical rules but also understanding and identifying with a culture. At this level, however much natural language today’s intelligent machines process, they do not yet possess real intelligence. We can therefore take the normal use of language as very clear empirical evidence that other creatures have minds like ours, but not as decisive evidence about the source of mind and of human abilities.

Descartes said that "language is the core symbol of human thinking", meaning that language ability and the human capacity for thought are inseparable and that language ability is the manifestation of the human mind. He also pointed out that two important faculties of the human mind, understanding and will, cannot be realized by machines (automata). Because the mind is not a material thing, automata cannot create a mind by imitating surface structure, whatever they do. Descartes’ prediction still holds today, although his argument is incomplete. The use, expression and creation of language are an important part of human intelligence. If the intelligent machines of the future are to have intelligence equal to or greater than that of human beings, they must be able to understand and use language as humans do. This cannot be achieved at the level of existing computer languages. Whether machines will be able to learn "foreign languages" through new breakthrough technologies in the future is another question.

References:

Dreyfus. What Computers Can’t Do: The Limits of Artificial Intelligence [M]. Joint Publishing Company, 1986.

Noam Chomsky. Language and Mind [M]. Renmin University of China Press, 2015.

Noam Chomsky. Selected Works of Chomsky on the Philosophy of Language [M]. The Commercial Press, 1992.

Noam Chomsky. Aspects of the Theory of Syntax [M]. China Social Sciences Press, 1986.

Quine. Word and Object [M]. Renmin University of China Press, 2012.

Wittgenstein. Philosophical Investigations [M]. The Commercial Press, 2000.

Frege. Selected Works of Frege’s Philosophy [M]. The Commercial Press, 2006.