Mathematics, psychology and sociology, philosophy.

Saturday, January 22, 2011

I Think I'm Learning Japanese

(I Only Think So)

2011 Jan 21st

"Munafo" = "Intellectual Prince"?

(from usokomaker.com/yoji, a novelty yojijukugo generator)

For various reasons, Japanese language and culture have interested me throughout my life. (A few of the reasons, like トトロ, are commonly known and will be familiar to readers. Others are perhaps less so, and may involve NSFW themes and language.)

For many years I only knew a few basic facts: there are three alphabets, all derived originally from Chinese, plus the Hindu-Arabic numerals and our own Latin alphabet, and you pretty much have to learn all of them to get along in daily life. The pronunciation of two of the alphabets is simple and logical, but the third (kanji, the Chinese characters) includes thousands of special cases with little structure or pattern.

The depth and complexity of Japanese, for which it is justly famous, kept me from doing much more -- until just recently, when I decided to adopt a religion [1] and consequently to travel to Japan to visit the head temple.

When I travel to a place that speaks a different language, I want to be able to read and write certain basic things: numbers, times and dates, the address(es) where I will be, etc. Apart from being prepared (the State Department emphasizes the importance of such things), it is fun, and seems to me an act of basic courtesy to try to learn at least a little bit of the local culture.

My ordinary way of learning involves a lot of computer and Internet use. Methods of entering Japanese katakana and hiragana are relatively direct and obvious. You can type totoro to get トトロ (in katakana, as this is a nontraditional proper name), or fujisann to get ふじさん or 富士山 (the native pronunciation and spelling, respectively, of the famous mountain overlooking the temple I will be visiting [2]).

(Figures: entering katakana; entering hiragana)
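The typing method described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any real input method is implemented: the two lookup tables are hypothetical and cover just the syllables needed for the two examples in this post, and the function name romaji_to_kana is made up for the sketch.

```python
# Tiny romaji-to-kana tables (illustrative assumption: only the syllables
# used by the examples "totoro" and "fujisann" are included).
HIRAGANA = {"fu": "ふ", "ji": "じ", "sa": "さ", "nn": "ん", "to": "と", "ro": "ろ"}
KATAKANA = {"fu": "フ", "ji": "ジ", "sa": "サ", "nn": "ン", "to": "ト", "ro": "ロ"}


def romaji_to_kana(text, table):
    """Greedy left-to-right conversion, trying the longest table key first."""
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        for size in range(longest, 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += len(chunk)
                break
        else:
            # No table entry matched: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(romaji_to_kana("totoro", KATAKANA))
print(romaji_to_kana("fujisann", HIRAGANA))
```

Getting from ふじさん to 富士山 is a separate dictionary lookup step that this sketch does not attempt.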

However, typing in katakana and hiragana is only good enough for looking up proper names. The majority of Japanese writing uses the kanji extensively. Most kanji have at least two pronunciations and multiple meanings. In addition, the partial redundancy [3] of hiragana and kanji means that there are usually two or more ways to write any given word or phrase.

When learning any language there are four things to learn: listening, speaking, reading and writing. In most languages these correspond to four skills, which (for the adult learning a second language) can initially be approached through transliteration and translation. The Japanese situation, with three alphabets, two or more pronunciations and two or more spellings, makes it more like 10 or 12 skills.

The learning curve is a seven-dimensional manifold.

These 10 or 12 skills are inter-related and interdependent. You don't know which way something will be spelled or pronounced, so it is important to learn both (all) of the alternatives.

Let's Just Learn the Writing

At this point I thought -- what if I put the spoken skills (speaking and listening) aside for the moment and focus just on reading and writing? I should still be able to look up Japanese kanji in a dictionary or on Google to find a definition. But that presents another problem, which may strike computer-savvy readers as uniquely odd or even impossible:

In order to be able to type in a kanji character, one must know either how to say it or how to write it.

And by "write", I mean nothing less than the ancient traditional art of brush-and-ink calligraphy.

As it turns out, thousands of years of experience have led to a greatly standardized stroke order for each of the thousands of commonly occurring characters, and the system is so useful that it is identical in all of the cultures that use the characters (primarily those who speak some form of Chinese, Japanese or Korean). The stroke order gives rise to a convenient and efficient software optimization for handwriting recognition, which can be adopted and used by all native speakers because no one writes any of the kanji any other way.

So in order to do the most basic and modern task (say, looking up "富士山" on Google Images) I need to learn one of the most ancient and decidedly non-modern tasks (how to paint "富士山" with a brush!).

Entering a Kanji without knowing its pronunciation

(showing a partly-entered "億" (おく, "100,000,000"))

This need to know stroke-order in order to look things up in Google is both a curse and a blessing. Simply being able to produce an accurate drawing of the character is not good enough. You have to draw each line in the correct order. This can be very frustrating for beginners, but that is far outweighed by the benefit: one can practice and learn Kanji writing just by trial-and-error in the computer interface.

An Epiphany

It was sometime in the afternoon on January 10th, stumbling through a few of the dozens of equally unlikely ways of writing "蓮" (れん, "lotus"), that I had the stunning insight: Each part of each kanji has its own specific writing order, and is always drawn the same way each time it appears. In this particular case, for example, I can start by learning how to write the "車" (くるま, "car") and the "辶" (チャク, "walk" [4] simplified as radical 162), and the first of these uses "日" (にち, "sun" or "day"). These smaller building blocks each have far fewer possibilities to try, and once I learn them I not only have a fighting chance at drawing "蓮", but am also much more prepared to use any other character that uses any of the same parts (such as "億", shown above, which also uses "日").

This is of course only the second or third thing anyone is taught if they study Chinese writing the "proper" way (like, say, from a book or a teacher). In fact, a friend told me about this in 1982, when I first got curious about Chinese writing. But it kind of slipped my mind somewhere along the line, and it was really cool to figure it out (again) on my own. These kanji building blocks (many of which are "radicals", but many are not) are like little graphical subroutines -- very appealing to my computer programmer aspect.
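The "graphical subroutines" idea can be sketched as a small recursive lookup. The component table below is a hand-made illustration, not an authoritative dataset: it encodes the decompositions mentioned in this post (including my own description of 車 as using 日) plus a few common ones for 億, and the names COMPONENTS, building_blocks and shared_blocks are invented for this example.

```python
# Hypothetical component table: each kanji maps to the parts it is built from.
COMPONENTS = {
    "蓮": ["艹", "連"],
    "連": ["車", "辶"],
    "車": ["日"],  # follows the post's description that 車 uses 日
    "億": ["亻", "意"],
    "意": ["音", "心"],
    "音": ["立", "日"],
}


def building_blocks(kanji, table=COMPONENTS):
    """Return every component reachable from `kanji`, following parts recursively."""
    found = set()
    for part in table.get(kanji, []):
        found.add(part)
        found |= building_blocks(part, table)
    return found


def shared_blocks(a, b, table=COMPONENTS):
    """Return the components that two kanji have in common."""
    return building_blocks(a, table) & building_blocks(b, table)


print(sorted(building_blocks("蓮")))
print(shared_blocks("蓮", "億"))
```

Learning the stroke order of one shared block (here 日) pays off in every character whose decomposition reaches it, which is the point of the epiphany.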

Footnotes

[1] adopt a religion : I do not proselytize, but if you are curious it is Nichiren Shōshū. The reasons it appeals to me are the relative peacefulness (and lack of political dictation) of Buddhism in general combined with the prominent role of large numbers [5] in the most important source text, the 16th chapter of the Lotus Sutra. (There is a widely available translation by Burton Watson.)

[2] fujisan : Note that fujiyama (ふじやま) is a common Western mistake: 山 is usually やま but not in this case.

[3] redundancy : The written alphabets, including the kanji, came to Japan after there was already a distinct Japanese spoken language. The kanji were used wherever a Chinese character (or combination) was directly suited to represent a word. Often, but not always, the Chinese pronunciation was used for the Chinese character. Anything for which there was no word in China, including Japanese conjugations and declensions, etc., had to be represented with extra letters representing their sound (phonetic value). Since the kanji have a pronunciation (and usually two: original Chinese and Japanese), they too have a phonetic value, and one could just use a phonetic alphabet. Often you see both: little kana written above or next to the kanji. There are many situations in which that is preferred (writing by or for young or uneducated readers; texts that would otherwise use rare or obscure kanji; instant messaging; etc.) but the reader cannot count on it.

[4] : This is a derivative of "辵", which my Kanji dictionary does not know about (probably because it is not taught in schools). I like it a lot better than the modern alternative, which seems to be "歩" (taught in grade 2), because it puts the "steps" (彳, "walk" in an idiosyncratic form 彡) before the "stopping" (止).

This character reveals some of the flaws in modern integration of computer technology in our culture. Your computer might show "辶" with three strokes or four:

Compare the page title (top) with the article title.

Wikipedia's article on Hyōgaiji notes:

A related weakness (though less relevant to modern language use) is the inability of most commercially-available Japanese fonts to show the traditional forms of many Jōyō kanji, particularly those whose component radicals have been comprehensively altered (such as [...], and 辵 in 運 or 連, rather than [the traditional form used in 迴]). This is mostly an issue in the verbatim reproduction of old texts, and for academic purposes.

These old and/or rare characters (hyōgaiji) are of great interest to me, as the primary application of my Chinese and Japanese learning will be research on large numbers [5].

[5] large numbers : The Chinese Buddhist quantity "阿僧祇" (Sanskrit asaṃkhyeya, Japanese asōgi) means "incalculable" or "innumerable". As a number it can mean anything from 10^56 (in common modern usage, see Wikipedia "Chinese numerals") to 10^140 (see asaṃkhyeya) to 10^(7×2^103) = 10^70988433612780846483815379501056 (as seen in the Chinese Wikipedia article on Chinese numerals, item 103 of the long list in the "大數系統" section).
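The exponent in the largest of these values is easy to check with arbitrary-precision integers. A minimal sketch in Python (the function name asogi_exponent is just a label for this example):

```python
def asogi_exponent():
    """Return 7 * 2**103, the exponent of 10 in the largest value quoted above."""
    return 7 * 2**103


exponent = asogi_exponent()
print(exponent)            # the exponent itself, as an exact integer
print(len(str(exponent)))  # how many decimal digits the exponent has
```

The exponent alone has 32 digits, so the number itself, written out in full, would have about 7×10^31 digits.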

But there are far larger numbers, on the order of 10↑↑(10^(5×2^120)) where ↑↑ represents the hyper4 operator or iterated exponential function. That is a "power-tower" of 10's a googolplex of 10's high. See novoloka's article on Avatamsaka numbers and go down to the note at the bottom. Note the description that begins with "The first four verses of this poem are most challenging. They apply a superexponential iteration over an exponential one." For more detail see their article measuring the asamkhyeya.
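For readers unfamiliar with the ↑↑ notation, here is a minimal sketch of the iterated exponential in Python. It is only usable for very small inputs (the function name tetration is my own choice); the numbers discussed above are hopelessly out of reach of direct computation.

```python
def tetration(base, height):
    """Compute base↑↑height: a power tower containing `height` copies of `base`."""
    result = 1
    for _ in range(height):
        # Each pass adds one more level to the bottom of the tower.
        result = base ** result
    return result


print(tetration(2, 4))   # 2^(2^(2^2))
print(tetration(3, 3))   # 3^(3^3)
print(tetration(10, 2))  # 10^10
```

Already 10↑↑3 = 10^(10^10) has ten billion digits, so a tower whose height is itself an enormous power of 10 can only be described, never evaluated.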