Argh... Unicode in HTML problems

by iwz · May 9, 2005

We are trying to make our webapp more international, and currently support English, Spanish, Portuguese, German, French, and a bunch of other languages. ISO8859-1 seems to be a good enough code set for these languages.

We will soon have to support Greek. o_O It's a completely different code set, ISO8859-7.
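To check which code set can actually hold a given piece of text, the JDK can ask a charset encoder directly. This is a minimal sketch; the class name and sample words are made up for illustration. One caveat: ISO-8859-7 is not among the charsets every Java platform is required to support (US-ASCII, ISO-8859-1, UTF-8 and the UTF-16 variants), so `Charset.forName("ISO-8859-7")` could throw `UnsupportedCharsetException` on a stripped-down runtime. It also shows that ISO-8859-1 does not fully cover French: the oe ligature (and the euro sign) are missing; ISO-8859-15 adds them.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class CharsetCoverage {

    // True if every character of text has a mapping in the named charset.
    // A fresh encoder per call, because CharsetEncoder instances are not thread-safe.
    public static boolean canEncode(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Illustrative sample words, written with unicode escapes so the source stays pure ASCII.
        Map<String, String> samples = new LinkedHashMap<>();
        samples.put("Spanish", "a\u00F1o");                  // n with tilde
        samples.put("Portuguese", "cora\u00E7\u00E3o");      // c cedilla, a tilde
        samples.put("German", "Gr\u00F6\u00DFe");            // o umlaut, sharp s
        samples.put("French", "\u0153uvre");                 // U+0153 oe ligature, not in ISO-8859-1
        samples.put("Greek", "\u0395\u03BB\u03BB\u03B7\u03BD\u03B9\u03BA\u03AC"); // "Greek" in Greek

        for (Map.Entry<String, String> e : samples.entrySet()) {
            String text = e.getValue();
            System.out.printf("%-10s ISO-8859-1=%-5b ISO-8859-7=%-5b UTF-8=%b%n",
                    e.getKey(),
                    canEncode("ISO-8859-1", text),
                    canEncode("ISO-8859-7", text),
                    canEncode("UTF-8", text));
        }
    }
}
```

Run as is, only the UTF-8 column should be true for every row: the Greek sample needs ISO-8859-7, and the French sample fits neither single-byte set.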

So, my question is, should we change code sets based on user locale, or just go with UTF-8? We're running all Java, so all Strings are internally Unicode.
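A minimal sketch of why a single UTF-8 setting sidesteps per-locale switching, using only the JDK (class and method names are hypothetical): one String mixing Western European and Greek text, which no single ISO-8859-x set can hold, survives a UTF-8 round trip intact. The trade-off is size: ASCII stays at 1 byte per character, but accented Latin letters and Greek letters take 2 bytes each, versus 1 byte in ISO-8859-1 or ISO-8859-7. Literals use unicode escapes so the example compiles the same whatever source encoding javac assumes, which is yet another place a charset mismatch can bite.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {

    // Encodes text as UTF-8 bytes and decodes it back; true if nothing was lost.
    public static boolean roundTrips(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8).equals(text);
    }

    // Number of bytes text occupies when encoded as UTF-8.
    public static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // German, Portuguese and Greek on one "page": fine in UTF-8.
        String mixed = "Gr\u00F6\u00DFe, cora\u00E7\u00E3o, \u0395\u03BB\u03BB\u03B7\u03BD\u03B9\u03BA\u03AC";
        System.out.println("round trip ok: " + roundTrips(mixed));
        System.out.println("ASCII 'a'        -> " + utf8Length("a") + " byte(s)");
        System.out.println("U+00E9 e acute   -> " + utf8Length("\u00E9") + " byte(s)");
        System.out.println("U+03A9 Omega     -> " + utf8Length("\u03A9") + " byte(s)");
        System.out.println("U+2013 en dash   -> " + utf8Length("\u2013") + " byte(s)");
    }
}
```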

If I try to switch to UTF-8, lots of characters that used to render fine turn into a square or a question mark. Is this just because my machine can't display those characters? Do I have to convert all Strings into HTML entities? Or can I just output the characters normally, like –?
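A hedged guess at the two symptoms: question marks usually mean some step encoded the String into a charset with no mapping for a character (for example, writing Greek out as ISO-8859-1), because `String.getBytes` silently substitutes '?' instead of failing. Squares usually mean the character reached the browser intact but the font has no glyph for it. Entities should not be required if the page really is sent as UTF-8 and says so; with the Servlet API that means calling `response.setContentType("text/html; charset=UTF-8")` before `getWriter()`. Numeric character references are the fallback when you have to stay on a single-byte charset. The sketch shows both effects; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class EncodingPitfalls {

    // String.getBytes(Charset) replaces characters the charset cannot represent
    // with the charset's default replacement byte, which is '?' for ISO-8859-1.
    public static String throughLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Escapes every non-ASCII code point as a decimal numeric character reference (&#NNNN;)
    // and the HTML-special characters &, < and >, so the output is pure ASCII markup.
    public static String toNumericReferences(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp == '&') out.append("&amp;");
            else if (cp == '<') out.append("&lt;");
            else if (cp == '>') out.append("&gt;");
            else if (cp < 128) out.append((char) cp);
            else out.append("&#").append(cp).append(';');
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String athens = "\u0391\u03B8\u03AE\u03BD\u03B1";   // the Greek word for Athens, 5 letters
        System.out.println(throughLatin1(athens));          // prints ?????
        // "cafe" with e acute, an en dash and a capital Omega
        System.out.println(toNumericReferences("caf\u00E9 \u2013 \u03A9")); // prints caf&#233; &#8211; &#937;
    }
}
```

Because numeric character references always refer to Unicode code points, the escaped markup displays the same Greek text even on a page declared as ISO-8859-1, at the cost of larger and less readable HTML.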

So confusing...


10 Comments

yayOG 2004 · May 10, 2005

hey, i read this book, it had a section on unicode, not just the technical details but an overview too. it might help you answer some of these questions yourself:
http://www.joelonsoftware.com/articles/Unicode.html

so far i've never had to deal with it. the funny thing is, i actually still deal with IBM EBCDIC here, ASCII's less famous contemporary