User:Neuekatze/Guaspi
Introduction to Gua\spi
Most human languages are natural: they evolved with their host societies without the benefit of intentional design. While some languages like French are maintained by dedicated stewards, and others like Spanish and Russian have been renovated recently, most drift with the fashions of peasants and teenagers which, though vital, lack logic and efficiency. A very few languages, however, have been created as artifacts with specific goals in mind. Gua\spi is one of these. My goals in building gua\spi were:
- To investigate the nature of language, and particularly the minimum content required for a language, through engineering and experiment.
- To create a language suited to use by artificial intelligences, such that the effort to map from letters to meanings does not overshadow the effort spent on using the resulting meanings.
- To create a language for my own use, free of the limitations of English, and to have fun doing so.
The purpose of this monograph is to present the syntax of gua\spi, as well as categorical information about the vocabulary, in the style of a reference manual. All the syntax is here (excepting only features under active development that were too late to make publication), and the vocabulary section contains an extensive list of how to say various types of expressions. To learn gua\spi you also need a dictionary, a language textbook, and a set of gua\spi reading material.
Artificial Intelligence
The significance of gua\spi to artificial intelligence work is that it is a bridge between humans and machines. Unlike database representation languages, it is complete enough to express the real conversation of real humans with each other. Unlike natural languages, it is simple enough that a working Prolog parser can be put together in a few days; English in particular must be stripped until it is scarcely usable before a simple parser can handle it.
Gua\spi has several characteristics that particularly suit it to use by humans and artificial intelligences together.
- Gua\spi is simple. The formal syntax can be stated in a few lines, compared to thousands of lines for English. There are only eight classes of structure words (occupying only two distinct syntactic sites), with about six functionally and morphologically related words per class; a similar set of pronouns; and 21 digits. The content words number only about 1400, compared to half a million for English.
- Gua\spi is modular. Morphology, grammar, organization and semantics are defined separately and interact to the minimum feasible extent.
- Gua\spi is complete. The content words form a basis such that almost any meaning not tied to a specific place or culture, and many which are, can be represented by agglutination. Foreign words and scientific Latin are welcome in the language.
- Gua\spi is flexible. The language imposes a minimum of preconceptions on the user. Trials show that gua\spi can express human speech from daily life as well as highly technical scientific language.
- Gua\spi is efficient. Words are short, and extensive defaults on articles and modal cases eliminate the majority of structure words.
- Gua\spi is unambiguous. There is one sound per letter and one meaning per word; and every valid utterance can be parsed in only one way.
Ancestor Languages
The language artifact Loglan, developed by James Cooke Brown [L1], was the inspiration for gua\spi. Brown realized that a very small set of content words could form a basis of a language, and produced such a set. By successfully writing large amounts of prose in Loglan while creating almost no additional words, I validated his insight.
Loglan was the first language to have such a simple grammar, with a hundred times fewer syntax rules than English, for example. But aggressive simplification can still be applied, and this I have done in gua\spi. One is tempted to think of the resulting syntax and morphology rules as trivial. A better description is that they contain the essentials and nothing more.
Gua\spi's syntax is much simpler than that of English or other languages, partly because the syntax is divided into modules, each of which has its own purpose. Gua\spi's syntax modules are:
- Morphology: How to divide letters or sounds into words.
- Grammar: Joining words into phrases into sentences.
- Organization: What each phrase does in the sentence.
- Semantics: Giving meaning to syntactic structures.
Natural language syntax is extremely complicated because the syntax expresses actual meanings such as tenses and numbers. In gua\spi the first three levels are independent of the meaning of the words. This makes them less interesting than jewels like the "perfective aspect" of Russian or the "long object case" of Navajo, but it makes them much simpler and much easier to learn and use.
Morphology: What is a Word
The phonemes are divided in two classes, C's and V's. All C's are consonants in English and those English vowels used in gua\spi are all in the V class, hence the names. In addition each word has a tone, a frequency modulation of the V's of each word in the Chinese manner. A word is written as a tone (see Table 4 [Tones]), one or more C's and one or more V's. What could be simpler?
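The word rule above can be sketched as a small splitter. This is only an illustration, not part of the language definition: the letter classes are copied from Table 1 [Phonemes], the function name `split_words` is hypothetical, and the tone set is an assumption limited to the two marks that appear in this section (the full set is in Table 4 [Tones]). The sketch also assumes a word may be written without a tone mark, as "gua" is in "gua\spi".

```python
import re

# Letter classes taken from Table 1 [Phonemes].
C_LETTERS = "ptckbdjgfsqvzx:#"
V_LETTERS = "uoyieamnlwr"

# ASSUMPTION: only the two tone marks seen in this section are listed here;
# the complete set is defined in Table 4 [Tones].
TONE_MARKS = "-\\"

# A word: optional written tone (assumption), one or more C's, one or more V's.
WORD_RE = re.compile(
    "([" + re.escape(TONE_MARKS) + "]?)"
    "([" + re.escape(C_LETTERS) + "]+[" + re.escape(V_LETTERS) + "]+)"
)


def split_words(text):
    """Split text into (tone, letters) pairs; raise ValueError on leftovers."""
    # Written blanks are silent and upper/lower case is not distinguished.
    text = text.replace(" ", "").lower()
    words = []
    pos = 0
    while pos < len(text):
        match = WORD_RE.match(text, pos)
        if match is None:
            raise ValueError(f"not a word at position {pos}: {text[pos:]!r}")
        words.append((match.group(1), match.group(2)))
        pos = match.end()
    return words
```

For instance, `split_words("gua\\spi")` yields the two words "gua" and "spi", the second carrying the tone mark `\`. A string that begins with a V, or that ends in C's with no V after them, is rejected.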
The Phonemes
Phonemes can be distinguished by where the tongue is placed to make them, whether they are sudden (plosive) or continuous (spirant), and whether their sound comes from the vocal cords (voiced) or the rush of air (unvoiced). Particular ranges of tongue position produce each phoneme, much like states on a map. But each listener has unique map boundaries for recognizing phonemes, especially for the vowels, so the speaker should try to hit the center of the phoneme region so as to maximize the likelihood that any particular listener will be able to recognize his speech. Nonetheless, the more difficult phoneme distinctions have been removed from gua\spi and so speakers of any natural language should find most phonemes easy to say and to hear.
Table 1 [Phonemes] shows the phonemes, categorized by tongue position and sound source. Some phonemes are represented confusingly in European languages, e.g. 'sh', which sounds like neither 's' nor 'h'. So in gua\spi they are assigned individual letters that differ from European usage, for example 'q' for 'sh'. Table 2 [Pronunciation] gives examples of these, and all the vowels. Written blanks have no sound, and are optional. There is no distinction between upper and lower case.
Table 1 [Phonemes]. Gua\spi phonemes, arranged by tongue position front to back (reading across) and sound type (reading down). Letters marked '*' differ from European standard usage.
C/V | Stop Class | Sound | Labial | Dental | Palatal | Velar | Glottal |
---|---|---|---|---|---|---|---|
C | Plosive | unvoiced | p | t | c* | k | --- |
C | Plosive | voiced | b | d | j | g | :* |
C | Spirant | unvoiced | f | s | q* | --- | --- |
C | Spirant | voiced | v | z | x* | --- | #* |
V | Vowels | | u | o | y | i,e | a |
V | Nasal etc. | | m | n | l | w* | r |
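The consonant rows of Table 1 can also be held as data, which makes the regularity of the layout easy to see. The following is a minimal sketch; the names `PLACES`, `ROWS`, `FEATURES` and `voicing_partner` are hypothetical and chosen only for this illustration.

```python
# Consonant rows of Table 1 [Phonemes]; None marks an empty cell ('---').
PLACES = ("labial", "dental", "palatal", "velar", "glottal")
ROWS = {
    ("plosive", "unvoiced"): ("p", "t", "c", "k", None),
    ("plosive", "voiced"): ("b", "d", "j", "g", ":"),
    ("spirant", "unvoiced"): ("f", "s", "q", None, None),
    ("spirant", "voiced"): ("v", "z", "x", None, "#"),
}

# Map each consonant letter to its (place, stop class, sound) triple.
FEATURES = {
    letter: (place, stop_class, sound)
    for (stop_class, sound), letters in ROWS.items()
    for place, letter in zip(PLACES, letters)
    if letter is not None
}


def voicing_partner(letter):
    """Return the consonant differing from `letter` only in voicing, or None."""
    place, stop_class, sound = FEATURES[letter]
    other = "voiced" if sound == "unvoiced" else "unvoiced"
    return ROWS[(stop_class, other)][PLACES.index(place)]
```

Here `voicing_partner("q")` gives `"x"`, the voiced counterpart of 'sh', while the glottal stop `":"` has no unvoiced partner and gives `None`.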
Gua\spi | English | Examples of Pronunciation | IPA |
---|---|---|---|
c | ch | CHew, Ciao (Italian) | t͡ʃ |
q | sh | SHoe | ʃ |
x | zh | aZure, breZHnev (Russian) | ʒ |
: | (pause) | the:apple, hawai:i (glottal stop) | ʔ |
# | uh | thE, Among (schwa) | ə |
u | u, oo | flUte, bOOt | ɪ̯u/ʊ̯u/ɨ̯u |
o | o, oa | bOne, bOAt | oʊ |
y | i | knIt | ɪ |
i | i, ee | grEEn machIne (not eye) | iː |
e | e | bEd | ɛ |
a | a | fAther (not cAt) | ɑː |
m,n,l,r | m,n,l,r | LeMoN RiNd (no silent R) | m n l r |
w | ng | stroNG | ŋ |
The sound '#' or 'uh' is common in English; all vowel letters are sometimes pronounced '#'. The 'a' of "among" is a good example. This sound is called "schwa"; that German name is pronounced (with gua\spi letters) "sqv#". '#' is not used in regular words; its purpose is to break up CC pairs that a particular speaker finds hard to pronounce, since virtually all speakers will be able to handle C#C. It is to be ignored and it is only written in explanations like this one. Though normally considered a vowel, it is in the C class because it occurs among C's, and a word is defined as some C's followed by some V's.
The glottal stop ':' pronounced alone is a sudden (plosive) '#', but it is normally followed by a V so that it sounds like a brief pause after which the V comes on. In many English dialects, as in gua\spi, it is found between a vowel-final and vowel-initial word, like "the:apple", while the Cockney dialect uses it much more extensively. The glottal stop is not used in regular words; its place is at the beginning of each sentence start word, and in vowel-initial foreign words.
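The role of '#' as an ignorable separator can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and `hard_pairs` stands for whatever CC pairs a particular speaker finds difficult, which the text does not enumerate.

```python
def strip_schwa(text):
    """Remove every '#', since the sound is to be ignored."""
    return text.replace("#", "")


def add_schwa(text, hard_pairs):
    """Insert '#' inside each adjacent letter pair listed in hard_pairs.

    hard_pairs is a hypothetical, speaker-specific set of two-letter strings.
    """
    out = []
    for ch in text:
        if out and out[-1] + ch in hard_pairs:
            out.append("#")
        out.append(ch)
    return "".join(out)
```

A speaker who finds 'sp' hard would say `add_schwa("spi", {"sp"})`, that is "s#pi", and a listener recovers the regular word with `strip_schwa`.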
English has thirteen subtly different vowels plus four official diphthongs but only five letters to represent them. Gua\spi uses only six easily distinguished vowel sounds, recruiting Y for one of them, and adds some vowel-like sounds which are considered consonants in English. Unfortunately, many regional accents of English turn simple vowels into diphthongs, invalidating the example words given in Table 2 [Pronunciation]. Other accents transform sounds beyond the bounds that a gua\spi speaker can recognize. If you speak with a regional accent, please use the vowel sounds that you can hear on television or radio (either American or British will work). Particularly troublesome examples, rendered with gua\spi letters, are shown in this table:
Standard | Accented English | Notes |
---|---|---|
flUte | flIUte | |
bOAt | bAt | (a very closed 'o') |
machIne | machI#n | |
bEd | bAI#d | |
fAtheR | f%thA | (% represents 'a' in "cat") |
Japanese speakers are famous for producing 'l' and 'r' that Europeans cannot distinguish. Chinese has distinct 'l' and 'r' but uses phoneme boundaries different from the European norm, so its speakers also have some trouble being understood. The gua\spi 'l' and 'r' are biased to European norms, and Asian speakers should take special care with these phonemes.
Preliminary experience shows that the errors English-speaking beginners make most often are to interchange 'q' with 'c', 'x' with 'j', and 'i' with 'y'; and to pronounce 'w' as 'oo' (should be 'ng').
Written blanks have no sound, and are optional. In this document a blank usually comes before each word (except in the phrase "gua\spi"), although in running text it looks nicer to omit blanks before the tone '-'. The tones (described next) make punctuation unnecessary. There are no periods at the ends of sentences; however, each sentence start word begins with a glottal stop, written as a colon. This colon is a letter, not a punctuation mark.
A feature of gua\spi (like Loglan before it, and unlike English) is that writing and speech are isomorphic, that is, each letter has a single phoneme (sound) and each phoneme has a single letter (with trivial exceptions), so that each spoken text can be spelled easily and without ambiguity, and each written text can be read off equally easily.
Grammar by Tones
The job of grammar is to stick words together into phrases. The grammar does not support meaning of any kind --- no tenses, no possessives, no nouns, no verbs. These ideas are handled at the organizational and semantic levels, using the grammar as a foundation. Like its morphology, the grammar of gua\spi is nearly minimal.
Parse Tree
The grammar is stated in Backus-Naur form in Section [Backus]. For grammatical purposes there is only one kind of phrase (though distinctions are made at the organizational level), but words have five categories: the two words "fu" and "fi", sentence start words, other prefixes, and everything else. The main part of a phrase is a sequence of one or more words collectively called the "phrase predicate"; any prefixes in this must come first. After any of the prefixes or after the whole predicate the sub-phrases are interspersed. They, of course, have their own prefixes, predicates and sub-phrases.
Let us understand phrases with the help of the example in the following figure, showing the "parse tree" of a simple sentence. The root phrase is at the top; parse trees grow upside down. Sub-phrases with their own predicates come at the next lower level. These in turn may have their own sub-phrases. Each phrase is at a certain level and it attaches to the most recent phrase at the next higher level. The tones (see Table 4 [Tones]) show the level of each word relative to the one before it.
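The attachment rule, that each phrase joins the most recent phrase one level above it, can be sketched as follows. This is a simplified illustration under stated assumptions: the input is a hypothetical list of (phrase, level) pairs with absolute levels already worked out (level 0 is the root), whereas real gua\spi text gives levels relative to the previous word through the tones of Table 4 [Tones], which are not reproduced here. The labels "A" to "E" are placeholders, not gua\spi words.

```python
def build_tree(leveled_phrases):
    """Build a parse tree from (phrase, level) pairs; level 0 is the root."""
    root = None
    open_phrases = []  # open_phrases[k] is the most recent phrase at level k
    for phrase, level in leveled_phrases:
        node = {"word": phrase, "children": []}
        if level == 0:
            if root is not None:
                raise ValueError("only one root phrase is allowed")
            root = node
        else:
            if level > len(open_phrases):
                raise ValueError(f"no phrase at level {level - 1} for {phrase!r}")
            # Attach to the most recent phrase at the next higher level.
            open_phrases[level - 1]["children"].append(node)
        # Phrases at this level or deeper are closed by the new phrase.
        del open_phrases[level:]
        open_phrases.append(node)
    return root


def shape(node):
    """Return the tree as nested (word, children) tuples for easy inspection."""
    return (node["word"], [shape(child) for child in node["children"]])
```

Given `[("A", 0), ("B", 1), ("C", 2), ("D", 1), ("E", 2)]`, phrase C attaches to B and phrase E attaches to D, because D is the most recent level-1 phrase when E arrives.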