Dive Into Python 3, Chapter 4

First, some background (kind of boring stuff) on Unicode. UTF-8 is a variable-length encoding for Unicode. That is, different characters take up a different number of bytes. For text that is mostly ASCII it is more compact than UTF-32 or UTF-16, and because it is defined as a stream of single bytes, it doesn't cause byte-ordering issues when the byte stream is transferred.

Note that in Python 3 all strings are sequences of Unicode characters, and UTF-8 is one way of encoding those characters as a sequence of bytes.
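To make the "variable-length" part concrete, here is a small sketch. The four sample characters are my own picks (not from the book), written as escapes so the snippet survives any editor encoding.

```python
# Sample characters chosen for illustration (an assumption, not from the book):
# 'A', e-acute, a CJK character, and an emoji.
samples = ['A', '\u00e9', '\u4e2d', '\U0001F600']

# Every item is a single character as far as str is concerned...
char_counts = [len(ch) for ch in samples]

# ...but the number of bytes it needs in UTF-8 varies from 1 to 4.
utf8_lengths = [len(ch.encode('utf-8')) for ch in samples]

# UTF-32 spends 4 bytes on every character ('utf-32-le' skips the BOM).
utf32_lengths = [len(ch.encode('utf-32-le')) for ch in samples]

for ch, n8, n32 in zip(samples, utf8_lengths, utf32_lengths):
    print(ascii(ch), 'UTF-8 bytes:', n8, 'UTF-32 bytes:', n32)
```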

Formatting is a powerful tool: you can pass in a list and access its items by index, pass in a dictionary and access its values by key, or even pass in a module and access its variables and functions.

>>> import humansize
>>> import sys
>>> '1MB = 1000{0.modules[humansize].SUFFIXES[1000][0]}'.format(sys)
'1MB = 1000KB'
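The list and dictionary cases work the same way as the module case. A minimal sketch, where `suffixes` and `user` are made-up sample data rather than anything from humansize:

```python
# Hypothetical sample data, only for illustration.
suffixes = ['KB', 'MB', 'GB']
user = {'name': 'mark', 'quota': 20}

# {0[1]} means: take positional argument 0, then index it with 1.
by_index = '1000KB = 1{0[1]}'.format(suffixes)

# Dictionary keys go inside the brackets WITHOUT quotes.
by_key = '{0[name]} may use {0[quota]} GB'.format(user)

print(by_index)  # 1000KB = 1MB
print(by_key)    # mark may use 20 GB
```

Note that the bracket syntax inside a replacement field only looks like Python; it is a restricted mini-language, so arbitrary expressions will not work there.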

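Also, just like printf in C, we can use {index:.3f} to specify decimal precision, and add a width to adjust space padding. More advanced usage, such as `'{:>30}'.format('right aligned')`, can be looked up in the format specification section of the official documentation.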
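A few format specifiers in action. This is just a sketch; the values and widths are arbitrary choices of mine.

```python
pi = 3.14159

# .3f keeps three digits after the decimal point (rounded).
three_places = '{0:.3f}'.format(pi)

# A width pads the result; a leading 0 pads with zeros instead of spaces.
zero_padded = '{0:08.3f}'.format(pi)

# >, < and ^ align the value to the right, left, or center of the width.
right = '{0:>10}'.format('right')
left = '{0:<10}'.format('left')
center = '{0:^9}'.format('mid')

for value in (three_places, zero_padded, right, left, center):
    print(repr(value))
```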


Speaking of useful string methods, there are count() and str.split(sep=None, maxsplit=-1) (which returns a list). Looking things up in the official docs is always a good idea.
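A quick sketch of both methods. The `text` and `query` strings are invented sample inputs.

```python
text = 'to be or not to be'

# count() returns the number of non-overlapping occurrences of a substring.
be_count = text.count('be')

# With no arguments, split() splits on any run of whitespace.
words = text.split()

# maxsplit limits the number of splits; the rest stays in the last item.
first_two = text.split(' ', 2)

# A common trick: split a query string into key/value pairs, then build a dict.
query = 'user=pilgrim&database=master'
params = dict(item.split('=', 1) for item in query.split('&'))

print(be_count, words, first_two, params)
```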

bytes objects are somewhat similar to strings. A bytes literal looks like b'', and its items are integers from 0 to 255. A bytes object is immutable (it cannot be changed); the workaround is to use a bytearray object. You can convert a string to bytes using encode(), and bytes back to a string using decode().
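Putting the bytes notes together in one small sketch (the sample values are my own):

```python
data = b'abc\x65'          # \x65 is the byte for 'e', so this equals b'abce'

# Indexing a bytes object gives an integer, not a one-character string.
first_item = data[0]

# bytes is immutable: item assignment raises TypeError.
try:
    data[0] = 102
    assignment_failed = False
except TypeError:
    assignment_failed = True

# bytearray is the mutable counterpart.
barr = bytearray(data)
barr[0] = 102              # 102 is the byte for 'f'

# str -> bytes with encode(), bytes -> str with decode().
encoded = 'caf\u00e9'.encode('utf-8')
decoded = encoded.decode('utf-8')

print(first_item, assignment_failed, barr, encoded, decoded == 'caf\u00e9')
```

Both encode() and decode() default to UTF-8 in Python 3, but naming the encoding explicitly makes the intent obvious.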