Google geocoding API and UTF-8 output
Sunday, October 7th, 2007
So, I was just fooling around with Google’s geocoding API, running tests with different target addresses. And suddenly, my test code barfed about malformed XML.
A quick look at the XML that was printed on the console didn’t show anything obviously wrong. The one clue to the source of the problem was that the “missing” XML tag came right after an accented character.
Wait, since when is my console UTF-8 aware? It isn’t, yet the accented character showed up just fine, and the XML declaration says this is UTF-8. So what happens if you put an accented character (not UTF-8 encoded) right before a < ? The parser takes that byte for the start of a multi-byte sequence and tries to combine it with the following one, eating the < and breaking the tag, which in turn breaks client code.
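To make that concrete, here is a minimal Python sketch. The sample bytes are made up (I’m assuming the stray character arrived as a single Latin-1 byte, since the actual response isn’t reproduced here), and naive_utf8_split is a hypothetical toy decoder that blindly trusts the lead byte. A strict decoder, like Python’s own, rejects the input outright instead of swallowing the tag, but either way the document is broken.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def naive_utf8_split(data: bytes) -> list:
    """Toy decoder (hypothetical): trust each lead byte's length, validate nothing."""
    chunks, i = [], 0
    while i < len(data):
        lead = data[i]
        if lead >= 0xF0:
            size = 4
        elif lead >= 0xE0:
            size = 3
        elif lead >= 0xC0:
            size = 2
        else:
            size = 1
        chunks.append(data[i:i + size])
        i += size
    return chunks


# Made-up fragment: "é" sent as one Latin-1 byte (0xE9) right before a tag.
latin1_fragment = "<name>Café</name>".encode("latin-1")
utf8_fragment = "<name>Café</name>".encode("utf-8")

print(is_valid_utf8(utf8_fragment))       # real UTF-8 passes
print(is_valid_utf8(latin1_fragment))     # the mislabeled bytes do not
print(naive_utf8_split(latin1_fragment))  # look for the chunk that ate "</"
```

In the last line of output, the 0xE9 byte announces a three-byte sequence, so the toy decoder glues the following < and / onto it and the closing tag is gone.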
Sure, as a quick search revealed, I can add oe=UTF-8 at the end of my query string (or I could convert the output on my side before parsing, but no, thanks!). BUT WHY THE HELL ARE YOU DECLARING UTF-8 IF YOU’RE NOT REALLY SENDING UTF-8?!@# Makes me wonder how their XML generator is written. A VB script using some kind of printf? Toh!
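For the record, both workarounds are only a few lines. This is a rough sketch, not the real thing: BASE_URL is a placeholder rather than the actual endpoint, the q parameter name is an assumption, and the Latin-1 fallback is my guess at what the server was really sending. Only oe=UTF-8 comes from the search above.

```python
from urllib.parse import urlencode

# Placeholder endpoint (hypothetical), not the real geocoding URL.
BASE_URL = "http://geocoder.example/geo"


def build_url(address: str) -> str:
    """Build a query URL that explicitly asks for UTF-8 output."""
    # "q" is an assumed parameter name; "oe" is the output-encoding switch.
    return BASE_URL + "?" + urlencode({"q": address, "oe": "UTF-8"})


def to_utf8(body: bytes, fallback: str = "latin-1") -> bytes:
    """Return body as UTF-8, transcoding from the fallback if it is not valid."""
    try:
        body.decode("utf-8")
        return body  # already valid UTF-8, leave it alone
    except UnicodeDecodeError:
        # Assumption: the mislabeled bytes are really in the fallback encoding.
        return body.decode(fallback).encode("utf-8")


print(build_url("Café de Flore, Paris"))
print(to_utf8("<name>Café</name>".encode("latin-1")))
```

The second function is the “convert the output on my side” option: it works, but it means guessing the real encoding, which is exactly the job the XML declaration was supposed to do.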
So currently, my dumbest geocode API bugs:
- the server lies when it says it is sending UTF-8
- the server can’t handle POSTs (Is it that hard to do both GET and POST? Don’t answer, please…)