JSON and Chinese characters
Today I deployed my first app on Google App Engine.
I wrote this little application to learn some Google App Engine and to practice my lately rusting Python skills. I was inspired by the work of Simon Willison with his json-time and json-head. While studying (well, actually trying to study) Mandarin, I found it cumbersome to look up the reading of unknown characters, so this app was made to scratch my own itch and as an excuse to play with Unicode, JSON and Ajax.
While it’s relatively easy to find the correct meaning of a phrase using the amazing GTalk Translation Bots while chatting in Chinese, outside of webpages (and definitely not in Safari) the pinyin reading is out of reach.
Sure, on websites you can easily use Chinese Pera-Kun, and Google Translate offers good translation APIs, but the pinyin is still inaccessible.
I wrote a Python script that parses the Unihan database to get the pinyin readings. I ran it locally on my iBook (I basically took all the characters that have a kMandarin field) and populated an App Engine Datastore with them.
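The extraction step could look roughly like the sketch below. It is not the actual script: it assumes the tab-separated Unihan line layout (code point, field name, value) with `#` comment lines, and the function name `parse_kmandarin` is made up for illustration. The exact notation of the readings varies between Unihan releases, so the values are kept as-is.

```python
def parse_kmandarin(lines):
    """Collect kMandarin readings from Unihan-style lines.

    Assumes each data line looks like 'U+5927<TAB>kMandarin<TAB>DA4 DAI4'.
    Returns a dict mapping each character to its list of readings.
    """
    readings = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Keep only well-formed kMandarin entries.
        if len(fields) != 3 or fields[1] != "kMandarin":
            continue
        codepoint, _, value = fields
        # 'U+5927' -> the character with code point 0x5927.
        char = chr(int(codepoint[2:], 16))
        # Several readings, when present, are separated by spaces.
        readings[char] = value.split()
    return readings
```

Passing an open Unihan data file (or any iterable of lines) gives a table that can then be loaded into the datastore.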
The API is dead simple:
call http://json-zh.appspot.com/pinyin?hanzi=大猩猩 if you just want the pinyin reading in JSON format,
or append a callback parameter if you need the results as JSONP.
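From Python, building the request URL might look like this minimal sketch. The endpoint and the `hanzi` parameter come from the example above; the parameter name `callback` is my reading of "append callback", and both helper names are invented for illustration. The shape of the returned JSON is not described here, so the sketch only strips the JSONP padding and makes no assumption about the keys inside.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://json-zh.appspot.com/pinyin"


def build_pinyin_url(hanzi, callback=None):
    """Build the request URL, percent-encoding the Chinese characters."""
    params = {"hanzi": hanzi}
    if callback is not None:
        # Assumed parameter name for the JSONP variant.
        params["callback"] = callback
    return BASE_URL + "?" + urlencode(params)


def unwrap_jsonp(body):
    """Strip generic JSONP padding like name(...) and parse the JSON inside."""
    start = body.index("(")
    end = body.rindex(")")
    return json.loads(body[start + 1:end])
```

The resulting URL can be fetched with any HTTP client; in a browser, the JSONP variant would normally be loaded through a script tag instead.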
Obviously there is more than one reading for lots of Chinese characters, but to keep things simple I chose the first one. After all, it seems the majority of characters are not polyphonic. I plan to show the other readings if you ask for only one character, but I wanted to release it soon; you know how it goes otherwise…
Feel free to use it, and if you find it useful don’t hesitate to drop me a mail.
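The "first reading wins" rule is easy to sketch. This is only an illustration: the function name `first_readings`, the sample table and the choice to pass unknown characters through unchanged are all assumptions, not what the app necessarily does.

```python
def first_readings(text, readings):
    """Return one reading per character, always taking the first listed.

    `readings` maps a character to its list of readings. Characters that
    are missing from the table are passed through unchanged (an assumption).
    """
    return [readings[ch][0] if ch in readings else ch for ch in text]


# Hypothetical sample table: the first character here has two readings.
SAMPLE_READINGS = {
    "大": ["dà", "dài"],
    "猩": ["xīng"],
}

print(first_readings("大猩猩", SAMPLE_READINGS))
```

Showing every reading for a single-character query would then be a matter of returning the whole list instead of its first element.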