How UTF-8 Encoding Works
This post explains exactly how UTF-8 encodes Unicode code points into bytes. If you haven’t read the history of how we got here, see The History of Text Encoding.

The Design Goals

UTF-8 was designed by Ken Thompson and Rob Pike with specific goals in mind:

- ASCII compatibility: Bytes 0x00-0x7F mean exactly what they mean in ASCII
- Self-synchronization: You can identify character boundaries from any position
- No NUL bytes: Except for the actual NUL character (U+0000), no byte is ever 0x00
- Sortable: Byte-wise sorting of UTF-8 strings sorts by code point order

The Encoding Scheme

UTF-8 uses a variable number of bytes (1-4) depending on the code point: ...
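
To make the scheme concrete, here is a minimal Python sketch of a hand-written encoder. It assumes the standard UTF-8 bit layout as defined in RFC 3629; the helper names `utf8_encode` and `char_starts` are hypothetical and exist only for this illustration. In real code you would simply call `str.encode("utf-8")`, which the sketch uses as a cross-check.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative helper, not a library API)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")

    if code_point <= 0x7F:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([
            0xC0 | (code_point >> 6),
            0x80 | (code_point & 0x3F),
        ])
    if code_point <= 0xFFFF:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (code_point >> 12),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (code_point >> 18),
        0x80 | ((code_point >> 12) & 0x3F),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])


def char_starts(data: bytes) -> list[int]:
    """Return the offsets where a character begins.

    Continuation bytes always look like 10xxxxxx, so any byte that does not
    match that pattern starts a new character. This is the
    self-synchronization property.
    """
    return [i for i, byte in enumerate(data) if byte & 0xC0 != 0x80]


# Cross-check the sketch against Python's built-in encoder.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")

# Sortable: Python compares str values by code point, and sorting by the
# encoded bytes gives the same order.
words = ["z", "\u00e9", "\u20ac", "\U0001F600", "a"]
assert sorted(words) == sorted(words, key=lambda s: s.encode("utf-8"))

print(utf8_encode(0x20AC).hex(" "))             # e2 82 ac
print(char_starts("a\u20acb".encode("utf-8")))  # [0, 1, 4]
```

The range checks at the top of `utf8_encode` are a design choice of this sketch: rejecting surrogates and values above U+10FFFF keeps the output aligned with what a conforming encoder produces.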