How UTF-16 Encoding Works
This post explains exactly how UTF-16 encodes Unicode code points into bytes. If you haven’t read the history of how we got here, see The History of Text Encoding.

From UCS-2 to UTF-16

Originally, Unicode was designed to fit in 16 bits: the Basic Multilingual Plane (BMP), covering code points U+0000 to U+FFFF. The encoding UCS-2 simply stored each code point as a 16-bit integer. When Unicode expanded beyond 65,536 code points (adding emoji, historical scripts, rare CJK characters, etc.), UCS-2 couldn’t represent the new code points. UTF-16 was created as a backward-compatible extension using surrogate pairs.
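To make the surrogate-pair idea concrete before going further, here is a minimal sketch of how a single code point maps to UTF-16 code units. The function name and the example code point are illustrative, not taken from the post: BMP code points become one 16-bit unit (exactly as in UCS-2), and supplementary code points become a high/low surrogate pair.

```python
def encode_utf16_code_units(code_point: int) -> list[int]:
    """Sketch: encode one Unicode code point into UTF-16 code units."""
    if 0xD800 <= code_point <= 0xDFFF:
        # The surrogate range is reserved and has no standalone encoding
        raise ValueError("surrogate code points cannot be encoded on their own")
    if code_point <= 0xFFFF:
        # BMP: a single 16-bit code unit, just like UCS-2
        return [code_point]
    # Supplementary planes (U+10000..U+10FFFF): subtract 0x10000 and
    # split the remaining 20 bits into two 10-bit halves
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate: D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate: DC00..DFFF
    return [high, low]

# Example: U+1F600 (grinning face emoji) -> ['0xd83d', '0xde00']
print([hex(u) for u in encode_utf16_code_units(0x1F600)])
```

...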