Charset & Encoding
Charset
A charset (character set) is a set of characters. For example, the GB2312 charset contains the basic Simplified Chinese characters, and the Unicode charset (as of version 14.0) contains 144,697 characters covering 159 scripts.
Encoding
An encoding maps each character to a binary representation, which determines how the characters are stored in memory or on disk. When you transmit bytes, they must be encoded and decoded with the same encoding; otherwise you will get wrong results.
>>> text = u'您好'
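As a minimal sketch of why both sides must use the same encoding, the Python 3 snippet below encodes the same string with UTF-8 and with GBK, then decodes the UTF-8 bytes both correctly and incorrectly. The choice of GBK as the mismatched encoding and the variable names are assumptions made for illustration.

```python
text = "您好"

# Encode the same text with two different encodings.
utf8_bytes = text.encode("utf-8")  # 3 bytes per character here
gbk_bytes = text.encode("gbk")     # 2 bytes per character here

# Decoding with the encoding that produced the bytes restores the text.
round_trip = utf8_bytes.decode("utf-8")

# Decoding with a different encoding either fails or yields wrong characters.
try:
    mismatched = utf8_bytes.decode("gbk")
except UnicodeDecodeError:
    mismatched = None

print(len(utf8_bytes), len(gbk_bytes))
print(round_trip == text, mismatched == text)
```

The byte sequences differ in both content and length, so a receiver that guesses the wrong encoding cannot recover the original characters.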
Relationship
A charset may have one or more encodings. Unicode, for example, has the UTF-8, UTF-16, and UCS-2 encodings.
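The relationship can be sketched in Python 3: the code points below belong to the Unicode charset and stay the same, while each encoding turns them into different bytes. Python has no codec named UCS-2, so this sketch uses the `utf-16-le` and `utf-16-be` codecs instead, on the assumption that the text stays within the Basic Multilingual Plane, where UTF-16 and UCS-2 produce the same bytes.

```python
text = "您好"

# Code points are a property of the charset (Unicode), not of any encoding.
code_points = [hex(ord(ch)) for ch in text]

# The same code points serialized by several Unicode encodings.
encoded = {name: text.encode(name) for name in ("utf-8", "utf-16-le", "utf-16-be")}

# Each encoding decodes back to the same characters.
decoded = {name: data.decode(name) for name, data in encoded.items()}

print(code_points)
for name, data in encoded.items():
    print(name, data.hex(), len(data))
```

All three byte sequences represent the same two Unicode characters; only the serialization differs.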