Grammar in Foreign Languages

Contrary to what many works of fiction suggest, real-world languages can work in wildly different ways, enough to make them sound like Starfish Language to a non-native speaker. In fact, for every property that has ever been proposed as a "universal" characteristic of human language, there is at least one known non-artificial human language that doesn't have it, or has its exact opposite.

Western audiences and authors generally find the Indo-European language family the most familiar in terms of grammar and vocabulary. This family includes most (but not all) of the languages spoken in modern Europe (already quite diverse; compare Russian to English to Italian), but also roughly half of the many languages spoken in India and what used to be called the "Near East" (Turkey, Persia, etc.). And Indo-European is only one of well over a hundred such families. Wikipedia has more details.

Real human languages very often differ from what Benjamin Lee Whorf called "Standard Average European" in that they can:

Lack articles such as a, an, or the, as Russian and Latin (IE) and Japanese and Chinese (non-IE) do.
- Have definite articles but no indefinite articles, such as Irish and Icelandic (both IE), and Esperanto (a Con Lang based mostly on IE languages).
- Have finicky rules about when things can be definite or indefinite (Literary Arabic: not "a leader of the community," but rather "one leader among the leaders of the community")
- Place articles after the word modified instead of before (Romanian and the Scandinavian languages have an enclitic definite article, while the Romanian indefinite article follows rules closer to English).
- Have many more articles than English. German articles change according to the gender, number, and case of the noun, giving 16 possible combinations, although the definite article expresses them through only six distinct forms (der, die, das, den, dem, des).
Have no direct or single equivalent of verbs like 'to be', 'to have', or 'to do', which are something of a defining feature of IE languages. And this is not limited to non-Indo-European languages: Irish and several Romance languages (such as Spanish and Portuguese) have two copulas ('be'), and the second Romance one usually derives from the Latin word for "to stand". Irish and Russian have no auxiliary verb "have".^[1]
Do not mark nouns for number (Japanese), or, alternatively, have more numbers than just singular and plural. Many languages have a separate dual ('two') or even trial ('three') number. There is even at least one language that has markers for zero (I have no cookies), fractional (I have half of a cookie), singular (I have one cookie), dual (I have two cookies), paucal (I have a few cookies), and large-scale plural (I have lots of cookies)! Most Indo-European languages have lost their duals: Sanskrit, Ancient Greek, and Old Church Slavonic had them, traces survive in some of the Slavic languages, and Slovene still makes full use of one. Latin preserves a trace in the irregular declension of the word duo ('two'), and English's use of the word both (rather than *all two) may be a remnant as well. Old English possessed the vestiges of a dual, but only in the pronouns; by Middle English, it was gone.
Have a more limited set of cardinal numbers: the so-called "one-two-many" phenomenon, although some languages may hit "many" at a point other than three. Note that this does not necessarily prevent accurate counting above "many"; it may just change the nomenclature. The Trolls of the Discworld, for instance, count in powers of 4: "one" (1), "two" (2), "three" (3), "many" (4) and "lots" (16), which can then be combined to express other quantities (as English does for numbers like "twenty-one" and "one hundred fifty-two"). Then again, a culture that is truly innumerate may not be able to distinguish between different quantities of "many".
- Relatedly, linguistic evidence suggests that many languages started out with "one-two-many" cardinals before gaining terms for numbers above two. One telling piece of evidence is that in many languages, including English, the first two ordinal numbers ("first" and "second") are not related to their corresponding cardinals ("one" and "two"), whereas the ordinals for three and above ("third", "fourth", etc.) are clearly built from their cardinals. An alien language might go further into the ordinals before the first one derived from a cardinal appears, suggesting a larger range of early numeracy than humanity demonstrated. Or it might never derive ordinals from cardinals at all, suggesting a race with an inherent grasp of mathematical concepts.
Have nouns with grammatical gender. French has two (masculine and feminine), German has three (masculine, feminine, neuter), and some languages assign "gender" according to whether the thing referred to is visible, known to be near, or far away. Some languages have a simple animate vs. inanimate distinction. Some confusingly combine these (e.g. Arabic, which arbitrarily divides non-human objects into masculine and feminine, then largely ignores that division by giving all non-human plurals feminine singular agreement; Unfortunate Implications aside, it's really confusing). Other languages classify nouns by their properties: Swahili has separate classes for people, animals, tools, liquids and so on. Alternatively, a language may be more gender-neutral than English, like the Uralic languages. Imagine having "he" and "she" be the same word, as well as "him" and "her".
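Mark verbs for categories that English doesn't, such as evidentiality (whether the speaker witnessed an event or only heard about it), or make finer distinctions of voice, aspect, and mood than English does. Or don't mark verbs for categories that English does: Mandarin Chinese has no tense, and conveys temporal information through aspect markers and context instead.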
Differentiate between the inclusive and exclusive 'we'. Compare "We will beat them" (which may include the listener) with "We do not like you" (which cannot): the inclusive includes the person being addressed, while the exclusive does not.
Have a different concept of "word" than you might expect. There is no agreement among linguists on what constitutes a "word", or even on whether there is a universal concept of "word" that applies to all languages. Again, Japanese provides an example: are the particles (wa, ga, o, etc.) part of the preceding word or separate words themselves? Most linguists say they're separate, but there's no shortage of romanizations that don't put a space there. (Japanese itself sidesteps the issue by not having spaces between words at all.)
Are ergative-absolutive instead of nominative-accusative. Take two similar sentences that differ in verb transitivity (such as 'He slept.' and 'She ate them.'). A nominative-accusative language (like English) case-marks the subjects 'he' and 'she' the same way in both sentences (as 'he'/'she', the nominative case, instead of as 'him'/'her', the accusative case) and case-marks the object 'them' (perhaps some apples?) in the accusative (as opposed to the nominative 'they'). In an ergative-absolutive language, the subject of the intransitive sentence, 'he', would be case-marked the same way as the object of the transitive sentence, 'them': both are in the absolutive case. The ergative case only shows up on the subject of the transitive sentence, 'she'. Several Indo-Iranian languages, such as Kurdish and Hindi, are split-ergative (ergative in the past or perfective, accusative elsewhere). They appear to have borrowed ergativity from neighbouring languages like the Dravidian languages, the Caucasian languages, etc.
- There are a lot of different kinds of morphosyntactic alignment, besides nominative-accusative and ergative-absolutive. Some languages are transitive, marking both the subject and object of a transitive sentence the same, but the subject of an intransitive sentence differently. Some are tripartite (marking the subject of a transitive sentence, the subject of an intransitive sentence, and the object of a transitive sentence all differently). Some are various kinds of active-stative (marking subject case based on whether or not the subject actively does something, so case marking is dependent on the meaning of the verb rather than grammar), and then there's "Austronesian alignment", which is, well, very confusing.
Have wildly different word order. English generally places the subject of a sentence first, the verb second, and the object last, a very common word order. However, in just as many languages, the subject comes first, the object second, and the verb last. A minority of languages even place the verb or the object first and the subject last, or use any other possible combination. Some languages, usually highly inflected ones, have no hard and fast word order at all. Latin, for instance, generally prefers SOV outside of poetry, but is so inflected that the word order can be changed without changing the basic meaning of the sentence. The older forms of Semitic languages (like Classical Arabic and Biblical Hebrew) preferred VSO, but their inflection allowed SVO as an option, and SVO has since become dominant in the modern colloquial forms.
Then there's the question of whether to put adjectives before or after the words they modify, where to put determiners, which types of clauses or sentences change the word order, how to construct relative clauses, etc.
Are not nearly-isolating languages like English, where a word's function is largely determined by its position and there are lots of particles: small words with purely grammatical functions (like English prepositions). Some languages, like Japanese and Turkish, are agglutinative: grammatical markers are affixes that combine in a string, each carrying one meaning. Others, like Latin and its descendants, are fusional: a single affix encodes several grammatical meanings at once. So Latin might have one ending where Turkish has a string of three or four, but a small change in meaning needs a completely different ending in Latin, while Turkish can just swap out one of its affixes. Agglutinative languages are rather famous for their ability to cram large amounts of information into single words. For example, the common Hungarian toast "Egészségünkre!" literally means "To our health!": a phrase that takes three words in English takes just one in Hungarian. Some languages really take the ball and run with it: in Inuit, "he said he wouldn't be able to arrive first" is "tikitqaagminaitnigaa," while in Yaghan, "the look shared by two people too shy to do anything about it" is "mamihlapinatapai." It gets even more extreme in polysynthetic languages, which combine what would elsewhere be several separate words into one: archaic Ainu "usaopuspe aejajkotujmasiramsujpa" means "I keep swaying my heart afar and toward myself over various rumors."
Or perhaps they're more isolating than English is. Plurals and past tense forms may be expressed using separate words that can sometimes stand alone: "did walk" instead of "walked", with "did" alone as a possible answer to a question. Chinese, for instance, has roughly one morpheme per syllable, and Classical Chinese had close to one morpheme per word.
Have adjectives that act like verbs instead of or along with acting like nouns (kind of). For example, some Japanese adjectives can be conjugated just like verbs -- shirokunakatta ie = the house that was not white (white-NEG.PAST house). Sometimes this situation is described as "the language has no adjectives," which confuses the uninitiated -- what is meant is not that the language doesn't have words like "red" or "large," but rather that words like that follow the same rules as verbs.
- The Wolof language of Senegal conjugates pronouns. Maa ngi dem means "I am going" or "I go." Dinaa dem means "I will go [soon]." In this case, dem is the verb (go), and cannot be changed. Maa ngi and dinaa are both pronouns.
Have prepositions that can be used independently as verbs, or rather, have verbal grammar such that subordinate verb phrases are used when English would use prepositional phrases. In a language with such coverbs, one word may serve as the verb "go" and the preposition "toward".
Use noun cases to convey the same meaning as English prepositions. In Finnish, for instance, there are fifteen distinct noun cases (kind of makes the three in English look simple, doesn't it?) to express various different meanings, but the use of prepositions is severely limited. For example, "talo" means "house," but "talossa" means "in the house," "talolla" means "at the house," "taloksi" means "(transform) into a house," etc.
Differentiate between alienable and inalienable possession: "my wrist" might be expressed as "wrist of me", but "my watch" as "watch on me".
Have something other than two degrees of demonstratives. English has just this and that (it used to have yon or yonder as a third, and yonder still survives in some dialects, though it is decidedly non-standard). Japanese has three (kore, sore, are), some languages have one, and others have five or more. Alaskan Yup'ik has thirty, sorted by five layers of location, three layers of visibility and two layers of accessibility. So, for example, one demonstrative means "partially visible 'that', near and accessible to the listener but not necessarily to the speaker." Another means "completely visible 'that', which is above the speaker and inaccessible to them."
- German, by contrast, has only one used in common speech, dies-. Technically there is a second, jen-, cognate with English yon--and used just about as frequently.
Mark the relationship between speaker and audience (register), and occasionally also between speaker and subject, whether through pronouns, verb forms, or sentence markers. Most Indo-European languages actually have this; French, for example, has 'tu' (informal) and 'vous' (formal). English is one of the few IE languages that doesn't, although it used to (thou vs. you) and a few dialects still do. Some languages get very elaborate: Japanese marks formal/informal, plain/polite, and humble/honorific, in any combination of the three (though the formal/informal and plain/polite distinctions overlap heavily). Korean has about seven degrees of politeness and formality, each of which also has a humble and an honorific form, though a few of them aren't used much anymore.
Have words that don't translate directly and perfectly into English. Sure, some of this is the whole "showing culture through vocabulary" thing, but there are more mundane instances too: for example, English divides temperature into cold, cool, warm and hot, but other languages may have only two or three such terms, or more than four.
- Similarly, many non-English languages divide up colors differently from the Western standard "ROY G. BIV", with some having as few as just two basic colors (black and white). Quite a few make no distinction at all between blue and green. On the other hand, some Asian languages have dozens if not hundreds of distinct color names. An author writing a race with a different visual range from humans (such as demihumans from D&D, who frequently possess vision in the infrared range) may forget to create terms for colors humans can't see at all, not even "squant" or "octarine".
- Other languages may also have fundamentally different conceptual metaphors. For example, while in most languages the past is "behind" us and the future lies "in front" of us, in Quechua and Aymara it is the other way round.^[2] Rather than likening the passage of time to the ego's journey from the past toward the future, these languages liken it to a movement of events in a queue: the events of the future are lined up behind the events that have already occurred. (This metaphor is also present in English and other languages, in words like "before" and "after", but there it is only used to relate events to other events, when the ego is not involved.)
Lack relative constructions ("the one that does X", etc.), and have to substitute adjective phrases ("the X-doing one") or use other strategies, such as correlatives ("Which man came, that man is my brother") or resumptive pronouns ("This is the man who my wife has been sleeping with him!").
- In Romance languages, the opposite is true: there are no adjective phrases built from verbs. To say "the talking dog" in French, one must say "the dog that talks" (le chien qui parle).
  - Actually, even Romance languages have such adjective phrases: in French, for example, one could just as well say "le chien parlant" ("the talking dog"). It's true, though, that English uses them far more often, and many English examples can only be translated with relative constructions.
Treat relative clauses like adjectives. For example, in Mandarin Chinese, using the attributive particle de, one can just as easily say "red de car" as "drives down the street de car," using actual Chinese words of course. The former would simply be "red car," but the latter would have to be translated as "the car driving down the street."
Are topic-prominent instead of subject-prominent (Japanese). In English, the subject is normally understood to be the topic of the sentence (which is one thing the passive voice is for: it lets a non-agent become the subject, and thus the topic). In Japanese, topic and subject do not have to be the same.
Have no element in a sentence that corresponds straightforwardly to what Europeans would call the "subject." The Japanese topic marker wa is a good example, as are dozens of academic papers in linguistics debating whether sentences in Tagalog (the most common language of the Philippines) can properly be said to have subjects at all. (Short version: the properties that a subject has in English are often split between two noun phrases, the "topic" and the "agent", in other languages.)
Are written using logograms (Chinese)^[3], abjads (Arabic, Hebrew)^[4], syllabaries (Inuktitut)^[5], abugidas (many languages of India, and those of Ethiopia)^[6], or a hodgepodge of everything (ancient Egyptian and modern Japanese), instead of an alphabet. And not all writing systems include the concepts of upper and lower case, cursive writing, or punctuation.
Use methods other than spaces to divide words. Many, such as Japanese and Chinese, have no divisions at all. Other options include interpuncts (Classical Latin), special letter forms at the ends of words (Hebrew), or even elevating the first character in each new word (Persian). German is also famous for not putting spaces in its noun compounds, though in reality these compounds are grammatically more or less the same as English phrases like magical girl anime fan; the main difference is orthography (where you put spaces in writing), not grammar proper.
Have writing directions different from the most common left-to-right and top-to-bottom, such as right-to-left and top-to-bottom (Arabic, Hebrew), vertical lines running top to bottom and ordered left to right (traditional Mongolian, Manchu), or vertical lines ordered right to left (Chinese and Japanese, traditionally). Beyond that is boustrophedon (changing direction with each line), which, while common in antiquity, is used by no natural modern language. Then there are languages that can be written in multiple directions, or that are leaning toward left-to-right and top-to-bottom as a result of Western influence.
Follow a different syllabic stress pattern than English. A case in point: when faced with an unfamiliar word of more than two syllables, English speakers tend to stress the next-to-last syllable, with a secondary stress two syllables before that if the word is long enough. Other languages may prefer other stress patterns. Japanese, for example, often stresses the second syllable in a three-syllable word but nearly as often the first. The family name "Tanaka" is usually pronounced by English speakers unfamiliar with it as "tah-NAH-kah", by analogy with some other Japanese words stressed on the second syllable, but in the original Japanese it is "TAH-nah-kah". Word stress patterns are particularly ingrained habits, and it can be quite difficult to adapt to a different language's "defaults"; writers creating a language will rarely choose stress patterns they find difficult or "unnatural".
Use pitch and changes thereof as elements of meaning in words. While Mandarin Chinese is the most famous example, numerous African languages also possess this property, where changing the pitch at which you pronounce a set of phonemes can completely change the meaning of those phonemes.
Form compound nouns differently. Most languages put the head noun at the end, but some put it at the front. For example, "control CENTER" would be "PUSAT kawalan" in Malay.
Have idioms and allusions that make no sense to a non-native speaker. Even languages that are closely related to English have turns of phrase that are completely incomprehensible without a native to explain their use, such as the French avoir les dents longues ("to have long teeth", meaning "to be ambitious") or the German Ich werde dir die Daumen drücken ("I'll squeeze my thumbs for you", meaning "I wish you luck"). Languages of vastly different derivation, evolving in a wildly foreign cultural matrix, can (and do!) have idioms that make even less sense to the outsider -- and nonhuman/alien idioms may be utterly impenetrable even with native help.
Similarly, have a different concept of what constitutes "blasphemous", "obscene" or "offensive" language. Different body parts, functions or gestures (or none at all) may be offensive to native speakers; other obscenities will be culturally based, derived from the religious, social and/or political matrix in which the language evolved. This can be seen even between English-speaking cultures: it has been observed that Catholics tended toward religious oaths, while Protestants swore by bodily functions. And Americans generally have no idea why some Brits consider "bloody" such an offensive adjective that in the Victorian era it was frequently replaced with "ruddy", and its use still draws reprimands in some quarters today. Further, a dialect may encode a language's obscenities beyond recognition; see the "Cockney Rhyming Slang" section of the British English page. And some obscenities may well be fossils: words or usages that carry offense only because "everybody knows they're dirty", though the reason for this common knowledge has long been forgotten. In more extreme cases, entire tenses, moods or categories may be offensive, perhaps under complex rules governing time, place and speaker.
And above all, do not have exactly the same set of sounds as English. The pronunciation of even closely related languages like French and German can only be approximated with English sounds, let alone more distant languages, and vice versa: this is of course where foreign accents come from. Even a lot of conlangs still use English's horribly complicated tense/lax vowel system (yet many claim to have five vowels, while English generally has 12 or more), and some of the worse-done relexes employ English spelling conventions as well, writing reed or rede when the speaker says /ɹiːd/. And few if any conlangs use consonants beyond those English possesses, although such consonants certainly exist: Xhosa and related African languages, for instance, have three entire groups of click consonants that have no counterparts in Indo-European tongues, and the glottal stop (which, while present in English, generally isn't noticed as a separate "sound") is a common element in many others.

[1] have as in "Have you seen my new boots?", not as in "I have a new pair of boots."
[2] Rather like the Discworld Trolls.
[3] Each symbol stands for a word or a morpheme, as in mean-ing-ful.
[4] Vowels are not written.
[5] Each symbol represents a syllable.
[6] Vowels are written as attachments to consonants.
