UTF Encodings

Training

Speaker: Csaba Nemeti

14 July 2020

Character encoding is one of the most common problems in software development, regardless of the programming language, framework, or type of application. Even today you can open a website and find stray characters such as ?, #, ¤, ¶, or ¼ where you know letters should be. In a series of workshops with all development teams, covering every programming language used in the company, we analysed the problems that character encoding can cause and discussed the solutions applied in different situations.
The following items were discussed:
• What is a character and a character set?
• What is a code point?
• Character set examples (historical and modern)
• What does character encoding mean?
• General issues with character encoding (historical and modern)
• Modern character encodings: UTF-8, UTF-16, UTF-32
• Issues with character encoding in databases
• Issues with character encoding on the front end (HTML, JavaScript and JavaScript frameworks)
• Issues with character encoding in Java, .NET, PHP and Ruby
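
To make the first few topics concrete, here is a minimal Python sketch. It is not material from the workshops; the function names `describe` and `mojibake` and the sample string are assumptions chosen for illustration. It shows how one string maps to code points, how many bytes those code points take in UTF-8, UTF-16 and UTF-32, and how decoding UTF-8 bytes with the wrong character set produces the kind of stray characters described above.

```python
def describe(text: str) -> dict:
    """Return the code points of `text` and its size in each UTF encoding."""
    return {
        # A code point is the number Unicode assigns to a character.
        "code_points": [f"U+{ord(ch):04X}" for ch in text],
        "utf8_bytes": len(text.encode("utf-8")),
        # The "-le" variants fix the byte order, so no byte order mark is added.
        "utf16_bytes": len(text.encode("utf-16-le")),
        "utf32_bytes": len(text.encode("utf-32-le")),
    }


def mojibake(text: str) -> str:
    """Encode as UTF-8, then decode with the wrong charset (Latin-1)."""
    return text.encode("utf-8").decode("latin-1")


# Hypothetical sample: e-acute (U+00E9), euro sign (U+20AC), emoji (U+1F600).
sample = "\u00e9\u20ac\U0001F600"
info = describe(sample)
print(info["code_points"])   # ['U+00E9', 'U+20AC', 'U+1F600']
print(info["utf8_bytes"])    # 9  (2 + 3 + 4 bytes)
print(info["utf16_bytes"])   # 8  (2 + 2 + 4 bytes, the emoji needs a surrogate pair)
print(info["utf32_bytes"])   # 12 (always 4 bytes per code point)

# The two UTF-8 bytes of e-acute (C3 A9) read as Latin-1 give two characters.
garbled = mojibake("\u00e9")
print(len(garbled))          # 2
```

The same three characters take 9, 8 or 12 bytes depending on the encoding, which is why the writer and the reader of a piece of text have to agree on which encoding is in use.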

Thank you all for your involvement!