Sunday 5 February 2012
Saturday 4 February 2012
Originally published at http://indopersica.blogspot.com
“An Introduction To Tamil Script – Reading & Writing” was the book that helped me learn to read Tamil. I first picked it up more than 8 years ago and, thanks to its rather novel approach, made good strides: I was able to read Tamil fairly well within a month.
This book, authored by two of the most respected names in Indian linguistics – Debi Prasanna Pattanayak and MS Thirumalai – and originally published by the Central Institute of Indian Languages (CIIL) in 1980, uses a method we could call the ‘Shape Similarity Method’. Unlike the traditional method of teaching Indic alphabets – which involves teaching vowel letters followed by consonant letters, and then followed by the special forms of vowels when attached to consonants (vowel signs or mātrā) – this book starts off with letters that have the simplest shape, and slowly moves on to more complicated ones.
Also deviating from tradition, vowel signs / mātrā are taught before the individual vowel letters themselves. Here again, the book begins with the vowel signs that have the simplest shape and then goes on to the more complicated ones (such as signs that come before the consonant).
Renu Gupta in her paper “Initial literacy in Devanagari: What Matters to Learners” mentions that for members of the South Asian diaspora wishing to learn their heritage language, learning the script presents an additional hurdle. This is also true of Indians in India who would want (or need) to learn a script, for instance, the script used in the state or region they currently live in. On a related note, I constantly encounter a number of young people – almost all from urban areas, I should add – who are able to speak their mother tongue to a reasonable extent but are unable to read it well or at all.
In such a context, this shape-based learning methodology seems to be an extremely useful tool for adult learners of new scripts, for whom time is limited and for whom similarities in shape can prove confusing if the script is taught the traditional way.
Though Gupta in her paper mentions that “[t]he effectiveness of the shape similarity method is not clear because the results have not been documented”, she goes on to say that “there may be some advantages in using this approach… [f]or literate adults”. I agree wholeheartedly with the latter statement, as I myself have been through this experience and can vouch for its effectiveness.
Unfortunately, it seems that CIIL hasn’t given Google Books permission to publish (even a part of) their wonderful publications on script learning, which is a pity, because the method used in these books is a highly practical and innovative one and has potential to help adults in the subcontinent become multi-scriptal instead of just multilingual – a huge practical benefit.
Here is a picture of the first page of the Tamil script book (I hope this isn’t a violation of copyright; it’s just one page!) –
Thankfully, CIIL has taken at least some of their script-teaching material online. Their Kannada script page, which also uses the Shape Similarity Method, is available at http://www.ciil-learnkannada.net/ccck/webpages/contents/script.htm (you might have to create a user account and log in, and also download special fonts – it’s really unfortunate that a body like the CIIL has still not moved to Unicode).
Monday 11 July 2011
A few days ago, I’d blogged about the romanisation of Brahui and Balochi, which invariably brought up the topic of romanisation of South Asian languages in general. With such romanisation affecting our daily lives – at least in India – in a considerable way, it seemed like a good excuse to delve into the matter of why there seems to be so much uncertainty about how to write names and words of Indian or South Asian origin in the Roman/Latin script.
Note: The term ‘transliteration’ normally refers to a one-to-one replacement of characters of one script with those of another script, while the term ‘romanisation’ can mean (i) transliteration only into the Roman script, or (ii) a sound-based transcription into the Roman script, with no regard to the characters used in the original script. In this article, I have used ‘romanisation’ to mean point (i) above. I have used the word ‘transcription’ to describe any occurrences of point (ii) above.
The origins of romanisation in South Asia are of course traceable to the British Raj, and a system of romanising and/or transcribing local place names for surveying purposes was developed by one William Wilson Hunter. The resulting system therefore came to be known as the Hunterian system of romanisation.
While the Hunterian system was reasonably suitable for romanising Hindi and related Indo-Aryan languages – albeit with some uncertainties – it did not provide the means to unambiguously romanise certain characters belonging to scripts of Dravidian or Tibeto-Burman languages, or even of other Indo-Aryan languages like Bengali.
In 1894, the International Alphabet for Sanskrit Transliteration (IAST) was established, but was geared towards a lossless romanisation of Sanskrit – an extremely phonetic language, where the script matched the pronunciation almost completely.
This system, as its name suggested, was aimed at romanising only Sanskrit, and therefore failed to address the romanisation of characters not occurring in Sanskrit, such as some characters occurring only in the scripts of Sinhalese, Dravidian and Tibeto-Burman languages etc.
Also, since many modern South Asian languages that used Brahmi-derived scripts (also called Indic scripts) no longer had a one-to-one character-sound mapping, or in other words were no longer truly phonetic—due to inevitable evolution—their romanisation according to IAST threw up its fair share of issues, as it did with the Hunterian system.
The National Library at Calcutta romanisation (NLC) issued in 1988 expanded on the IAST to include missing romanisation for certain characters in the scripts of Dravidian and Eastern Indo-Aryan languages. However, it too did not provide romanisations for certain characters in Tibeto-Burman languages (Ladakhi, Dzongkha, Tibetan proper, Lepcha etc.).
In the meantime, there also appeared the UNGEGN (1972, focussing only on romanising place names), ALA-LC (1997 latest) and ISCII (1991) romanisation standards for various Indic scripts. Out of these, ISCII, the Indian Script Code for Information Interchange, was mainly designed as a system for representing Indic scripts on computers, but also addressed the issue of romanisation to quite an extent. It played a major part in the correlation and logical organisation of Brahmic script code points, on which the Unicode blocks for Indic scripts were later based, but it too had certain shortcomings.
More than a century after the IAST was invented, the ISO 15919 standard was introduced in 2001. It essentially tried to plug the gaps in the existing romanisation systems and, by their standards, was very comprehensive. It even included proposed romanisations for characters derived from Perso-Arabic in Brahmi-based scripts.
For some reason though, the ISO 15919 system still fell short in terms of the aforementioned romanisations for Tibeto-Burman languages. Also, since the Devanagari version of Kashmiri was not codified till later that decade, romanisation for Kashmiri too did not find mention in this system.
It may be argued that most of the above systems addressed only romanisations for Indic scripts, i.e., scripts derived from the Brahmi script, and therefore, are not intended to address the romanisations of Perso-Arabic-derived and Tibetan scripts.
However, the Tibetan script is Brahmi-derived, and therefore, would logically have formed part of any such system (a possible reason I could think of as to why Tibetan has been left uncovered is the presence of alternate romanisation systems for it such as Wylie and THL).
None of the above systems make any provisions for scripts not based on Brahmi or Perso-Arabic, such as Ol Chiki.
Add to all this the fact that, beyond a character-for-character transliteration, there are other phonetic considerations (e.g. unpronounced or multiple-sound Indic characters) to be taken into account when coming up with a romanisation system. Some of these systems address such considerations incompletely or not at all.
These doubts essentially mean (i) these romanisation systems are employable only in certain restricted contexts, either academic, or when referring only to a select few languages and scripts, and (ii) there exists no consistent romanisation scheme usable for all South Asian languages in general, irrespective of origin or script.
A comparison of the Hunterian and ISO 15919 systems, along with UN & ALA-LC romanisation schemes as applicable for certain Devanagari-based languages can be found here.
1) Hunterian System :: accepted in Kurrachee, not in Cawnpore
Examining the positive and not-so-positive points of each of the romanisation systems mentioned previously, let’s have a look first at the Hunterian system. It –
a) tried to ensure a character-for-character romanisation (except for diphthongs, aspirate consonants and some other consonants, whose romanisations used digraphs)
आ /aː/ = á
का /kaː/ = ká
c) took practicality into consideration (e.g. represented long vowels at the end of a word without an acute accent/macron, since word-ending vowels in many Indo-Aryan languages are pronounced long irrespective of whether they are written as long or short)
d) drew inspiration from existing English character-to-sound mappings:
श् / ष् /ɕ ~ ʃ/ = sh
च् /t͡ɕ/ = ch
ज् /d͡ʑ/ = j
य् /j/ = y
It also –
e) did not (initially) distinguish between retroflex and non-retroflex consonants:
ट् /ʈ/ and त् /t̪/ = t
ड् /ɖ/ and द् /d̪/ = d
ळ् /ɭ/ (Marathi et al) and ल् /l̪ ~ l/ = l
ड़् /ɽ/ and र् /r/ = r
The Hunterian system seemed to have been supplemented later by underdots for retroflex characters, as can be found in some dictionaries published in British times, although I could not find any info on when exactly this happened and who was the first one to do so.
g) did not distinguish between multiple pronunciations of a particular character in the same language:
Marathi and Nepali च्, representing the sounds /t͡ɕ/ as well as /t͡sʰ/, was romanised ch for both sounds, presumably since the two different sounds are unmarked in their native scripts as well
h) did not distinguish between multiple pronunciations of a particular character in different languages:
ज्ञ = Hindi /ɡjə/ vs. Marathi /dnʲə/ (cf. Eastern Nagari equivalent character জ্ঞ with Bengali pronunciation /ɡɡɔ/)
i) did not provide for clear-cut transliteration of certain sounds occurring in Dravidian languages (and scripts):
எ ఎ ಎ എ – short /e/ (as opposed to long /eː/)
ஒ ఒ ಒ ഒ – short /o/ (as opposed to long /oː/)
ழ் ഴ് – /ɻ/
In spite of these deficiencies, the Hunterian model, seemingly the first attempt at conjuring a logical romanisation system for Indian names and words, did usher in some sanity among all the orthographical madness. Most importantly, it provided the base for most other romanisation schemes that followed.
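The loss of information described in point e) can be made concrete with a small sketch. The table below is only an illustrative subset built from the examples listed above, not a complete Hunterian table, and the names `HUNTERIAN_EARLY`, `WITH_UNDERDOTS` and `collisions` are made up for this illustration. The underdot values for ट and ड are the ones quoted for IAST later in this article.

```python
# Illustrative subset of the early Hunterian mappings listed in point e).
HUNTERIAN_EARLY = {
    "ट": "t", "त": "t",
    "ड": "d", "द": "d",
    "ळ": "l", "ल": "l",
    "ड़": "r", "र": "r",
}

# Same table, but with underdots on two of the retroflex letters.
WITH_UNDERDOTS = {**HUNTERIAN_EARLY, "ट": "ṭ", "ड": "ḍ"}


def collisions(table):
    """Return the romanisations that stand for more than one source letter."""
    groups = {}
    for letter, roman in table.items():
        groups.setdefault(roman, []).append(letter)
    return {roman: letters for roman, letters in groups.items() if len(letters) > 1}


print(collisions(HUNTERIAN_EARLY))  # four ambiguous romanisations: t, d, l, r
print(collisions(WITH_UNDERDOTS))   # only l and r remain ambiguous
```

Any romanisation that appears in the result cannot be converted back to the original script without extra knowledge of the word, which is exactly what a lossless scheme tries to avoid.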
2) International Alphabet for Sanskrit Transliteration :: takṣaśilā gets the nod, /koːɻikkoːɽ/ doesn’t
The International Alphabet for Sanskrit Transliteration (IAST) intends to be a lossless romanisation for Sanskrit, and according to Wikipedia, for Pali as well.
This romanisation system, while obviously building on the Hunterian system, makes a few changes –
a) removes redundant or ambiguity-causing characters from digraphs in the Hunterian system:
ङ् /ŋ/ = Hunterian ng, IAST ṅ
च् /t͡ɕ/ = Hunterian ch, IAST c
ञ् /ɲ/ = Hunterian ny, IAST ñ
श् /ɕ ~ ʃ/ = Hunterian sh, IAST ś
b) specifies an underdot for retroflex character romanisations to distinguish them from non-retroflex ones:
ट् /ʈ/ = Hunterian t, IAST ṭ
ड् /ɖ/ = Hunterian d, IAST ḍ
ष् /ʂ/ = Hunterian sh, IAST ṣ
c) provides a unique romanisation for ऋ /r̩/ = ṛ
ं /◌̃m/ = IAST ṃ
ः /ɦ/ = IAST ḥ
ह /ɦ/ = Hunterian & IAST h
ख /kʰ/ = Hunterian & IAST kh
भ /bʱ/ = Hunterian & IAST bh
However, since this system was intended purely for the romanisation of Sanskrit (and the derived Pali), characters not present in Sanskrit are not covered by this system. Hence the romanisation of any non-Sanskrit characters, such as:
– Tamil ழ் and Malayalam ഴ് – pronounced /ɻ/
– characters for the Dravidian short vowels /e/ and /o/
– Tibetan ཚ /t͡sʰ/ and ཞ /ʑ/
– characters in various scripts for the nasaliser (chandrabindu) ँ /◌̃/, and
– new invented characters such as ऍ and ऑ, used to represent English /æ/ and /ɔ/
are out of the scope of this system.
Also, the IAST uses the romanisation ḷ for the character ऌ /l̩/ and its equivalents. This conflicts with the romanisation for ळ /ɭ/ — also ḷ — used in Pali and Vedic Sanskrit. I couldn’t find any info on whether there was an alternate non-conflicting romanisation provided for the latter.
In addition, if ever considered as a daily-life romanisation system for South Asian scripts, some may raise the following issues:
– how would the romanisation of scripts used for Indo-Aryan languages be affected by schwa deletion—a feature that is quite predictable in northern languages like Hindi, Punjabi et al, but not so much in southern ones like Marathi
– whether people can ‘adjust’ to the representation of च् /t͡ɕ/ = c, श् /ɕ ~ ʃ/ = ś and so on, since we are all ‘so used to’ च् /t͡ɕ/ = ch and श् /ɕ ~ ʃ/ = sh, due to English’s influence on our daily lives.
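To show what a character-for-character IAST romanisation involves in practice, here is a minimal sketch for a small subset of Devanagari. The tables are deliberately incomplete, the function name `to_iast` is my own, and the handling of the inherent vowel is the simplest approach I could think of rather than a reference implementation. Note that it performs no schwa deletion at all, which is precisely the first issue raised above.

```python
import unicodedata

VIRAMA = "\u094d"  # suppresses the inherent vowel of the preceding consonant

# Small illustrative subset; values follow the IAST examples quoted above.
CONSONANTS = {
    "क": "k", "ख": "kh", "ग": "g", "च": "c", "ज": "j",
    "ट": "ṭ", "ड": "ḍ", "त": "t", "द": "d", "ध": "dh", "न": "n",
    "प": "p", "भ": "bh", "म": "m", "य": "y", "र": "r",
    "ल": "l", "व": "v", "श": "ś", "ष": "ṣ", "स": "s", "ह": "h",
}
VOWEL_SIGNS = {
    "\u093e": "ā", "\u093f": "i", "\u0940": "ī", "\u0941": "u",
    "\u0942": "ū", "\u0947": "e", "\u0948": "ai", "\u094b": "o", "\u094c": "au",
}
OTHER = {
    "अ": "a", "आ": "ā", "इ": "i", "उ": "u", "ए": "e", "ओ": "o",
    "\u0902": "ṃ",  # anusvara
    "\u0903": "ḥ",  # visarga
}


def to_iast(text):
    """Romanise Devanagari letter by letter, writing out every inherent 'a'."""
    out = []
    pending_a = False  # True while the last consonant still carries its inherent vowel
    for ch in text:
        if ch in CONSONANTS:
            if pending_a:
                out.append("a")
            out.append(CONSONANTS[ch])
            pending_a = True
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])  # the vowel sign replaces the inherent vowel
            pending_a = False
        elif ch == VIRAMA:
            pending_a = False
        else:
            if pending_a:
                out.append("a")
                pending_a = False
            out.append(OTHER.get(ch, ch))  # unknown characters pass through unchanged
    if pending_a:
        out.append("a")
    return unicodedata.normalize("NFC", "".join(out))


print(to_iast("तक्षशिला"))
print(to_iast("नास्ति"))
```

Applied to a Hindi word, the same function would keep every inherent vowel that Hindi speakers drop, so the output would be faithful to the spelling but not to the pronunciation.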
3) National Library at Calcutta System :: Truly national
The NLC romanisation system, issued in 1988, extends the IAST by the following characters:
a) எ ఎ ಎ എ – short /e/ (as opposed to long /eː/) = e
b) ஒ ఒ ಒ ഒ – short /o/ (as opposed to long /oː/) = o
c) ळ् ળ્ (ਲ਼੍) ଳ୍ ள் ళ్ ಳ್ ള് – /ɭ/ – ḷ
d) ற் ఱ్ ಱ್ റ് – alveolar /r, ɾ/ – ṟ
e) ன் – alveolar /n/ (as opposed to Tamil dental /n̪/) – ṉ
f) ழ் ഴ് – /ɻ/ – ḻ
The following modifications were made to existing characters in the IAST:
g) ए and its equivalents – /eː/ = ē
h) ओ and its equivalents – /oː/ = ō
that is, e and o without macrons represented their short versions only.
I have not been able to find clear-cut specifications on the romanisation of the following characters according to the NLC system:
i) ड़् /ɽ/ and ढ़् /ɽʱ/
An article on French Wikipedia says that the NLC romanisations for these characters are d̂ and d̂h. However, it also provides ṛ and ṛh as possible NLC romanisations, which conflicts with ऋ = ṛ. Specific sources for this info are not provided in the article.
I did find this link to the ISCII romanisation scheme, though. There is a mention of d̂ and d̂h in it, but no mention of whether it forms a part of the NLC system or not.
In addition, I remember seeing a reference to ऋ = ṛ and ड़् = ṙ a long time (almost 10 years) ago, but was unable to find it again on the internet.
j) Chandrabindu ँ /◌̃/ = ◌̃, m̐, n̐?
k) No mention of transliteration of scripts of Tibeto-Burman languages. Sinhalese also does not find a mention, although I would imagine that it was excluded as the NLC system was designed with ‘Indian’ languages in mind.
NLC chart at IIT Madras ‘Acharya’ project
4) ISCII :: Information and script interchange
The Indian Script Code for Information Interchange (ISCII, also known as IS 13194) released in 1991 was mainly a coding scheme for computers and related devices, which had a system of ‘code points’ onto which equivalent letters from various Indic scripts were mapped. Thus, the following characters – क ক ਕ ક க క ಕ ക – were all mapped onto the same code point, as they were ‘equivalent’ characters, all representing the sound /k/.
By changing the script specification, say from Devanagari to Gurmukhi, the Devanagari character would be ‘transliterated’ into the equivalent Gurmukhi character. What actually would happen is that the code point would remain the same; only the ‘rendering’ would change as per the script specified.
In other words, the code points were the deep layer, and the script itself was the surface layer. Changing the surface layer would provide a ‘transliteration’ of a particular character or string of characters.
To make up for the lack of characters in Devanagari that would be equivalent to certain Dravidian-script characters, such as Tamil எ /e/, ஒ /o/, ன் /n/, ற் /r, ɾ/ and ழ் /ɻ/, ISCII introduced ‘invented’ Devanagari equivalents for these characters, namely ऎ, ऒ, ऩ्, ऱ् and ऴ् respectively.
According to the Wikipedia ISCII article (retrieved 2011-07-09):
“One motivation for the use of a single encoding is the idea that it will allow easy transliteration from one writing system to another. However, there are enough incompatibilities [to prove] that this is not really a practical idea.”
According to the same article:
“ISCII has not been widely used outside of certain government institutions and has now been rendered largely obsolete by Unicode.”
The article also states that Unicode “largely preserves the ISCII layout within each block”, seemingly a useful legacy of ISCII.
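Because the Unicode blocks keep a largely parallel layout, the ‘deep layer / surface layer’ idea can be imitated by shifting a code point from one block to another. The sketch below is a naive illustration of that idea only: the block start values are the real Unicode ones, but the function `shift_script` is my own, and it blindly shifts every code point in the source block, including positions that are unassigned in the target script and shared punctuation such as the danda. Those are the kinds of incompatibilities the quote above refers to.

```python
# Start of each 128-code-point Unicode block for some ISCII-derived scripts.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def shift_script(text, source, target):
    """Move each character of the source block to the same offset in the target block."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            out.append(chr(cp - src + dst))  # same offset, different block
        else:
            out.append(ch)  # characters outside the source block are left alone
    return "".join(out)


for script in ("bengali", "gurmukhi", "gujarati", "tamil", "telugu", "kannada", "malayalam"):
    print(script, shift_script("क", "devanagari", script))
```

For the letter /k/ this reproduces the row of equivalent characters listed earlier in this section. For a Devanagari aspirate shifted into Tamil, it would land on an unassigned code point, since Tamil has no such letter.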
Speaking of transliteration, ISCII did provide a romanisation scheme as well (see ‘Other links’ below), making use of diacritical characters and based on the NLC system. Some of the main points were –
a) included romanisations for some obscure, Sanskrit-only characters such as ऌ /l̩/ and its equivalents = ḻ
b) transliterated Tamil ழ் and Malayalam ഴ് – /ɻ/ = ẕ
This obviously contradicts the NLC system, which romanises ழ் and ഴ് as ḻ
Instead, ISCII uses ḻ as a romanisation for ऌ /l̩/ (see point a) above). I wasn’t able to find enough references to throw light on this conflict.
c) transliterated ड़् /ɽ/ and ढ़् /ɽʱ/ (and its equivalents) as d̂ and d̂h respectively.
As mentioned in the previous section on the NLC system, I wasn’t able to find sufficient resources to verify whether this was a specification of the NLC system or an invention of the ISCII.
d) provided transliterations for certain Brahmic characters used to represent certain Perso-Arabic sounds such as /z/, /f/, /x/ and /ɣ/
e) did not include transliterations/romanisations for Brahmic scripts used for Tibeto-Burman languages
5) ISO 15919 :: From Pondicherry to Gangtok
The ISO 15919 standard, issued in 2001, is by far the most comprehensive romanisation standard for Brahmic scripts that has been drawn up to date. The ISO 15919 provides extensive information on not just romanisation, but cross-transliteration from one Indic script into another.
It also covers the transliteration of certain Perso-Arabic characters into their equivalent Brahmic ones, and in doing so, makes a mention of their recommended romanisations as well, albeit with some restrictions.
The ISO 15919 is extremely detailed, and a site dedicated to explaining it can be found here.
A few salient points about the system are:
a) builds on the ISCII romanisation system
b) Clarifies some conflicts in the IAST:
ड़् /ɽ/ and its equivalents = ṛ
ऋ /r̩/ and its equivalents = r̥
ळ् /ɭ/ and its equivalents = ḷ
ऌ /l̩/ and its equivalents = l̥
c) Changes ISCII Tamil ழ் and Malayalam ഴ് – /ɻ/ from ẕ to ḻ
d) describes precisely how chandrabindu ँ /◌̃/ and its equivalents are to be romanised (including Gurmukhi bindi and tippi)
e) provides a romanisation for the Sinhalese script
f) deals with the romanisation of rarely used characters such as avagraha
g) provides (rather strangely, in my opinion) guidelines for romanisation of Indic characters transliterated from Perso-Arabic-based scripts; not romanisation of the Perso-Arabic characters themselves.
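Point b) can be sketched as a tiny conversion from IAST to ISO 15919 for the two vocalic letters. This assumes the input is Sanskrit in IAST, where ṛ and ḷ stand for the vocalic ऋ and ऌ; in a Pali or Vedic text where ḷ stands for ळ, the conversion of ḷ would be wrong. The names `IAST_TO_ISO_VOCALICS` and `iast_vocalics_to_iso` are made up for this illustration, and nothing else in the text is touched.

```python
import unicodedata

# IAST letter -> ISO 15919 letter, for the two vocalic letters named in point b).
IAST_TO_ISO_VOCALICS = {
    "\u1e5b": "r\u0325",  # r with dot below -> r + combining ring below
    "\u1e37": "l\u0325",  # l with dot below -> l + combining ring below
}


def iast_vocalics_to_iso(text):
    """Replace IAST vocalic r and l with their ISO 15919 ring-below forms."""
    text = unicodedata.normalize("NFC", text)  # so decomposed input is matched too
    return "".join(IAST_TO_ISO_VOCALICS.get(ch, ch) for ch in text)


print(iast_vocalics_to_iso("ṛṣi"))
```

Once the vocalics have moved to the ring below, the dotted ṛ and ḷ are free to stand for ड़ and ळ without any clash.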
A few things that seemed confusing to me are:
h) Sinhalese script ඇ /æ/ and ඈ /æː/ are romanised æ and ǣ respectively, where æ is a ligature of a and e, and ǣ is the same character with a macron above. However, Devanagari ऍ /æ ~ æː/ (also written अॅ) is romanised ê. If these characters have the same sound, then maybe they could have been romanised the same way?
i) On the same lines, Bengali script অ্যা /æ ~ æː/ is romanised as a:yā, and not æ or ǣ
It’s possible that the logic behind this was to consider the characters/ligatures as historically different, and therefore to provide differing romanisations for them, irrespective of the fact that they have the same sound. After all, there are characters in different Brahmic scripts that have the same sound in modern times, but are romanised differently as their historical origins are different, such as Devanagari Hindi ज् and Eastern Nagari Bengali য্ (both pronounced /ʥ/).
This, however, means that ISO 15919, due to its emphasis on clarity and retraceability, loses out sometimes on aesthetics. For example, romanising অ্যাক্সিস ব্যাঙ্ক ‘Axis Bank’ as a:yāksisa byāṅka somehow seems ‘readable’ only in the Bengali script and not in the romanised form.
Of course, this is an extreme case where we’re considering the romanisation of words in the Bengali script which themselves are transcriptions of English words (axis & bank).
And for reasons of clarity and retraceability, there is of course no provision for schwa deletion in ISO 15919, which again means that the usage of ISO 15919 in its current form as a ‘daily-life’ romanisation seems unfeasible, due to aesthetic considerations (readability) and the general prevailing trend of pronunciation-based loose romanisation.
Recall the renaming—if you can call it that—of Pondicherry into ‘Puducherry’ – a hybrid, ad-hoc romanisation seemingly trying to incorporate phoneticity as well as readability. According to ISO 15919, it would have been spelt putuccēri, after Tamil புதுச்சேரி /pud̪ɯʨʨeːri/.
While we debate Puducherry versus putuccēri, the verdict is still out on སྒང་ཐོག་ /ɡàŋtʰók/, since ISO 15919 too does not include the Tibetan script in its scope. For now, we’ll stick with romanising སྒང་ཐོག་ as ‘Gangtok’.
Existing Roman-script writing systems for South Asian languages
A few South Asian languages, such as Mizo, Konkani and Divehi, are already written, officially or unofficially, in the Roman script. Mizo is probably the only language having some level of recognition in India (official language of Mizoram) whose only script is the Roman script. Konkani is officially written in Devanagari, as decreed by the Government of Goa, but the Roman script is widely used for it and campaigns are ongoing for it to be recognised as an official script of Konkani (see my earlier blog post on Goan Place Names).
As Mizo was previously unwritten, its Roman script is ‘original’ and therefore cannot be called a transliteration. As regards Konkani, its Roman script system tends towards being a transcription, with there not really being a 100% one-to-one correspondence between it and Devanagari Konkani, and therefore cannot be called a transliteration either.
Divehi—officially written in the Tana or Thaana script—has an official Roman transliteration system, with a one-to-one correspondence between particular Tana and Roman letters. Its aesthetics may be debated, but due to the fact that it uses only the standard 26 letters of the Roman alphabet and the apostrophe as the only diacritical mark, it is very easily reproducible, which incidentally was the intention behind its invention – for it to be used on Telex machines in the 1970s, which did not support the Tana script.
However, all these Roman script versions use varying sound-letter mappings for various languages, à la European languages written in the Roman script, and therefore have to be learnt individually.
No comprehensive and clear-cut romanisation system—either transliteration or transcription—for South Asian languages and scripts that is used as an academic as well as a daily-life standard—on the lines of Hanyu Pinyin for Mandarin or RR for Korean—seems to have yet emerged. It would, in my opinion, be highly useful to have such a standard for obvious purposes of convenience and clarity in information exchange, and also for (potentially) furthering literacy.
However, it’s inevitable that the makers of such a pan-South Asian standard will have an uphill task for the following reasons (among others) –
a) preservation of orthographic as well as phonetic fidelity in the romanisation is often conflictive, i.e., preservation of one often means loss of the other.
b) as an extension of the previous point, there is the tricky problem of how to consistently deal with equivalent (strings of) characters that are pronounced differently in different languages.
Devanagari (as applicable to Hindi) अरविन्द /ərʋɪn̪d̪/
Devanagari (as applicable to Marathi) अरविंद /ɤ̞rᵊwin̪d̪ᵊ/
Eastern Nagari (as applicable to Bengali) অরবিন্দ /ɔrobin̪d̪o/
Eastern Nagari (as applicable to Assamese) অৰবিন্দ /ɔrɔbindɔ/
c) choice of particular characters or signs might aid legibility for one script or language, and hinder it for another.
e.g. the choice of ē and ō for long /eː/ and long /oː/ respectively is a logical choice for Dravidian scripts and languages, to differentiate them from short e /e/ and short o /o/, but is mostly redundant for scripts for Indo-Aryan languages, which usually do not have short /e/ and /o/.
Maintaining the macron above these letters for scripts of Indo-Aryan languages will result in needless orthographic clutter, while eliminating the macron will lead to orthographic inconsistency with romanisations for other scripts, such as those for Dravidian languages.
d) the very choice of which scripts (and languages) are to be covered.
A Sanskrit quote which according to me sums up the situation very succinctly (romanisation in IAST, ‘standard’ pronunciation in IPA):
अमन्त्रमक्षरं नास्ति नास्ति मूलमनौषधम् ।
अयोग्यः पुरुषो नास्ति योजकस्तत्र दुर्लभः॥
amantramakṣaraṃ nāsti nāsti mūlamanauṣadham
ayogyaḥ puruṣo nāsti yojakastatra durlabhaḥ
/əman̪t̪rəməkʂərə̃m naːs̪t̪i naːs̪t̪i muːləmənəwʂəd̪ʱə̃m
ajoːɡjəɦə puruʂoː naːs̪t̪i joːʥəkəs̪t̪ət̪rə d̪urləbʱəɦə/
“There is no syllable not a mantra, no plant not medicinal,
there is no person unworthy; what is lacking is an ‘enabler’”
P.S.: The words “khayaal aapka” in the title of this post are the tagline of the current ICICI Bank ad campaigns. Their ‘correct’ pronunciation is /xjaːl aːpkaˑ/, and they roughly mean “thinking of you” or “caring for you” in Hindi/Urdu. The words have been romanised in an ad-hoc manner from Devanagari Hindi ख़्याल आपका and Urdu خیال آپ کا.
This article is by no means supposed to be a comprehensive or scholarly work on the topic of South Asian transliteration and romanisation. There may be errors, and also many related areas and topics which have been left uncovered, either unintentionally or intentionally. This article has been written purely out of a personal interest in the topic, and as such I welcome any corrections/additions/criticisms regarding it.
Friday 1 July 2011
While surfing Wikipedia, I came across the article on the Brahui language. This article stated that the Brahui Language Board (BLB) has approved a new Roman orthography for the language.
At first glance, the orthography seems to suit the sound system of the language quite well. That is presumably because it is a constructed orthography, unlike English orthography, which has turned out the way it has after having had too many cooks over the years.
However, what is particularly striking is the use of the accented or diacritical characters in the orthography. All such characters are either from the Latin-1 Supplement or Latin Extended-A subranges of Unicode.
This seems a rather pragmatic choice, as the characters in these subranges are used by a number of European languages and therefore, are present in many fonts available.
However, this also means that the orthography varies markedly from the general systems of romanisation used for South Asian languages (Hunterian, IAST, National Library at Calcutta romanisation (NLC), ISO 15919) in its use of diacritical characters.
Typically, these romanisation schemes feature a number of characters either from the Latin Extended Additional subrange, or that are not encoded separately in Unicode at all and need to be entered as a base letter + diacritic combination (see this link on ‘precomposed’ and ‘decomposed’ characters in Unicode).
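The difference between a letter that has its own code point and one that must be built from a base letter plus a combining diacritic can be checked with Python’s standard `unicodedata` module. The sketch below is only an illustration; the helper name `describe` is my own. It compares r with a dot below, which has a precomposed form in Latin Extended Additional, with r with a ring below (used by ISO 15919), which does not.

```python
import unicodedata


def describe(s):
    """Return (NFC form, code points in NFC form, code points in NFD form)."""
    nfc = unicodedata.normalize("NFC", s)
    nfd = unicodedata.normalize("NFD", s)
    return nfc, len(nfc), len(nfd)


dot_below = describe("r\u0323")   # r + combining dot below
ring_below = describe("r\u0325")  # r + combining ring below

print(dot_below)   # composes into the single code point U+1E5B
print(ring_below)  # stays as two code points even after composition
```

A letter that stays as two code points depends on the font and the rendering engine positioning the diacritic correctly, which is one practical reason such letters display less reliably.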
The table below shows some of the variations present in the Brahui orthography as compared to one of the ‘standard’ transliterations for a particular letter/sound –
|Roman Brahui|NLC, ISO 15919|IPA|
|ð|ḍ|ɖ|
|ŧ|ṭ|ʈ|
|ļ|ḷ|ɭ|
|ŕ|ṛ|ɽ|
|ş|ś|ɕ ~ ʃ|
If the letters in the first column above show up properly on your computer/device, and the ones in the second column don’t, then this probably vindicates the BLB’s decision to choose letters that would show up correctly on as many existing devices as possible.
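Which Unicode subrange each letter in the table falls into can be checked with a few lines of code. The ranges below are the published Unicode block boundaries; the function `block_of` and the two strings of letters are just my own transcription of the table, written with escapes so that the code points are unambiguous.

```python
# (name, first code point, last code point) for the relevant Unicode blocks.
BLOCKS = [
    ("Basic Latin", 0x0000, 0x007F),
    ("Latin-1 Supplement", 0x0080, 0x00FF),
    ("Latin Extended-A", 0x0100, 0x017F),
    ("Latin Extended Additional", 0x1E00, 0x1EFF),
]


def block_of(ch):
    """Return the name of the listed block containing ch, or 'other'."""
    cp = ord(ch)
    for name, first, last in BLOCKS:
        if first <= cp <= last:
            return name
    return "other"


brahui = "\u00f0\u0167\u013c\u0155\u015f"    # first column of the table
standard = "\u1e0d\u1e6d\u1e37\u1e5b\u015b"  # second column of the table

for ch in brahui + standard:
    print(ch, f"U+{ord(ch):04X}", block_of(ch))
```

All five Brahui letters fall in Latin-1 Supplement or Latin Extended-A. Four of the five ‘standard’ letters fall in Latin Extended Additional; the exception is ś, which sits in Latin Extended-A.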
Here’s where the spanner gets thrown into the works –
Apparently a system for romanising the Balochi language – a language spoken in the same region as Brahui, and with a very similar sound system – has also been decided upon (see this link). Curiously, all the diacritical letters chosen for Balochi romanisation are also from the Unicode subranges used for Brahui, but different from the letters used for Brahui (and of course from any existing Indic romanisation system).
This scenario throws up two questions –
– Considering that Brahui and Balochi are spoken in the same region (Balochistan, Pakistan), have a large number of speakers bilingual in both languages and most importantly, share a very similar phonology, why couldn’t there have been more cooperation in choosing Roman orthographies for these languages? The result would most likely have been a single romanisation system suitable for both languages.
– What is the use of the various existing South Asian language romanisation systems, if they are being bypassed for individually tailored romanisations?
Brahui and Balochi aren’t alone in having faced romanisation woes. The various Turkic languages of Central Asia have had a similar story, and for a much longer time (see this Wikipedia article on how their orthographies have been tinkered with over the years).
However, most of these languages (Turkish, Azeri, Tatar) seem to have settled on more-or-less similar Roman orthographies, with the rebels being Uzbek and Turkmen.
Brahui Roman Orthography
Brahui Language Board
Saturday 13 February 2010
“/kʷʌnit̪əri aˑsəl/”, said the office boy to whoever he was speaking to on the phone – Marathi for “Someone will be there”.
Strangely, I’d heard the same two words being spoken by a visitor to our office a couple of days earlier. The only difference was, the visitor’s words sounded more like /koˑɳiˑt̪əriˑ əseˑl/.
The reason these two people pronounced the Marathi words ‘कोणीतरी असेल’ differently could be traced back to their social backgrounds. It so happened that the office boy belonged to the (historically) ‘lower caste’, and the visitor to the (historically) ‘higher caste’.
Marathi is pronounced in a variety of ways, depending on the speaker’s home region and social status. I’m somewhat familiar only with Puneri Marathi, as it’s not my mother tongue. But even within Puneri Marathi, there’s a noticeable difference between the pronunciation and vocabulary of speakers of different social strata.
Look at the following differences in the pronunciation of the same words between speakers of different social strata –
|Standard Written Marathi|IPA - |IPA - |
While the speech of the ‘higher caste’ is better reflected by standard Marathi spelling, we can also establish a few patterns between the speech of the ‘higher’ and ‘lower’ castes –
eˑ –> jə
oˑ –> wʌ
æˑ –> jaˑ
ɔˑ –> waˑ
f –> pʰ
ə at the beginning of a word –> aˑ
p at the end of a word –> f
An interesting one is the pattern
CVCʱV –> CʱVCV
and its derivations
CVɦV –> CʱVV
VɦV –> ɦVV
Consider the pronunciation of बाहेर ‘outside’.
If C = consonant, and V = vowel, then
baˑɦeˑr (CVɦV) –> bʱaˑeˑr (CʱVV)
Here’s another example. Consider the word पुढे ‘ahead, forward’ –
puɽʱeˑ (CVCʱV) –> pʰuɽeˑ (CʱVCV)
And another one. Here’s the word आहे ‘is’ –
aˑɦeˑ (VɦV) –> ɦaˑeˑ (ɦVV)
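The three aspiration-shift patterns above can be written as a small rule over a list of pre-segmented IPA symbols. This is only a sketch of the patterns as I have stated them: it assumes the shift applies to the first syllable of the word (as in all three examples), and the vowel set and the list of voiceless consonants (which receive ʰ rather than ʱ) are my own simplifications.

```python
VOWELS = set("aeiouə")
VOICELESS = {"p", "t", "ʈ", "k"}  # these take ʰ; everything else takes ʱ


def is_vowel(seg):
    """A segment counts as a vowel if its first symbol is a vowel letter."""
    return seg[0] in VOWELS


def shift_aspiration(segments):
    """Apply VɦV -> ɦVV, CVɦV -> CʱVV and CVCʱV -> CʱVCV at the start of a word."""
    segs = list(segments)
    # VɦV -> ɦVV: the ɦ moves to the front of the word.
    if len(segs) >= 3 and is_vowel(segs[0]) and segs[1] == "ɦ" and is_vowel(segs[2]):
        return ["ɦ", segs[0]] + segs[2:]
    # CVɦV -> CʱVV and CVCʱV -> CʱVCV: aspiration moves onto the first consonant.
    if len(segs) >= 4 and not is_vowel(segs[0]) and is_vowel(segs[1]) and is_vowel(segs[3]):
        onset, medial = segs[0], segs[2]
        onset_is_plain = onset != "ɦ" and onset[-1] not in "ʱʰ"
        if onset_is_plain and (medial == "ɦ" or medial[-1] in "ʱʰ"):
            mark = "ʰ" if onset in VOICELESS else "ʱ"
            rest = [] if medial == "ɦ" else [medial[:-1]]  # ɦ disappears, Cʱ loses ʱ
            return [onset + mark, segs[1]] + rest + segs[3:]
    return segs


for word in (["b", "aˑ", "ɦ", "eˑ", "r"], ["p", "u", "ɽʱ", "eˑ"], ["aˑ", "ɦ", "eˑ"]):
    print("".join(word), "->", "".join(shift_aspiration(word)))
```

The three words in the loop are बाहेर, पुढे and आहे from the examples above; words that match none of the patterns are returned unchanged.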
Vocabulary, too, seems to vary depending on speaker background, as can be seen here –
Commonly used ‘higher caste’ word
Commonly used ‘lower caste’ word
to warm (food)
Note to readers: It is not my implicit or explicit intention to categorise any social or demographic group as of a ‘high’ or ‘low’ status. These words (and their derivations) have deliberately been kept within quotes in the above text, where they refer only to a historical classification of social groups in Marathi-speaking areas and the rest of India. Follow the Wikipedia links to learn more about the Indian caste system.