Base64 principles

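In computing, Base64 is a family of binary-to-text encoding schemes that convert binary data into a sequence of printable characters drawn from a set of 64 unique symbols. The source data is processed in groups of 6 bits, and each group is mapped to one of the 64 characters.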

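Like other binary-to-text encodings, Base64 makes it possible to carry binary data over channels that only reliably support text. It is especially common on the Web, where it is used to embed binary assets such as images inside text resources such as HTML and CSS files.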
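As a minimal sketch of the embedding use case, the Python snippet below builds a `data:` URI from raw bytes with the standard `base64` module. The helper name `to_data_uri`, the `image/png` MIME type and the placeholder bytes (only the 8-byte PNG signature, not a complete image) are assumptions for illustration; a real page would read the bytes from an actual image file.

```python
import base64

def to_data_uri(payload: bytes, mime_type: str) -> str:
    """Return a data: URI that embeds `payload` as Base64 text (hypothetical helper)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# Placeholder bytes: the PNG file signature only, not a complete image.
png_signature = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(png_signature, "image/png")

print(uri)                  # data:image/png;base64,iVBORw0KGgo=
print(f'<img src="{uri}">')  # the same URI used inline in HTML
```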

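Because early SMTP supported only 7-bit ASCII, Base64 is also widely used for e-mail attachments: the attachment is encoded as Base64 before sending and decoded on receipt, so that older servers do not corrupt it.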
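The sketch below shows this round trip with Python's standard `email` package: when a `bytes` attachment is added, the default content manager marks it with `Content-Transfer-Encoding: base64`, and `get_content()` decodes it again. The subject, file name and attachment bytes are made-up values for illustration.

```python
from email.message import EmailMessage

# Hypothetical attachment: arbitrary binary bytes, including a non-ASCII value.
attachment_bytes = bytes([0x00, 0x01, 0xFF])

msg = EmailMessage()
msg["Subject"] = "Binary attachment demo"
msg.set_content("See the attached file.")
# For bytes content the default content manager uses Base64 as the
# transfer encoding, so the message text stays 7-bit ASCII.
msg.add_attachment(attachment_bytes, maintype="application",
                   subtype="octet-stream", filename="data.bin")

part = next(msg.iter_attachments())
print(part["Content-Transfer-Encoding"])  # base64
print(part.get_payload().strip())         # the Base64 text carried in the message

# The receiving side decodes the Base64 payload back into the original bytes.
assert part.get_content() == attachment_bytes
```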

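Compared with the original binary data, Base64 encoding increases the size by roughly 33% to 37%: about 33% from the encoding itself (4 output characters for every 3 input bytes), plus up to about 4% more if optional line breaks are inserted.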
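The 4-for-3 ratio gives a simple length formula, 4 × ⌈n / 3⌉ characters for n input bytes. The sketch below checks that formula against Python's `base64.b64encode` and compares it with `base64.encodebytes`, which inserts a newline after every 76 output characters (MIME style). The exact line-break overhead depends on the line length and line terminator a given system uses, so the numbers here are one assumed configuration, not a general rule.

```python
import base64

def encoded_length(n: int) -> int:
    """Length of padded Base64 output for n input bytes, without line breaks."""
    return 4 * ((n + 2) // 3)  # integer form of 4 * ceil(n / 3)

for n in (1, 2, 3, 57, 300, 3000):
    plain = len(base64.b64encode(bytes(n)))
    # encodebytes adds b"\n" after every 76 output characters plus a final newline.
    wrapped = len(base64.encodebytes(bytes(n)))
    print(f"n={n:5}  formula={encoded_length(n):5}  plain={plain:5}  "
          f"wrapped={wrapped:5}  overhead={wrapped / n - 1:.1%}")
```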

Design

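The set of characters used to represent the 64 values varies between implementations. The usual choice is characters that are both printable and common to most character encodings, so that the data is unlikely to be altered when passing through systems that historically were not 8-bit clean, such as e-mail. For example, MIME's Base64 uses A–Z, a–z and 0–9 for the first 62 values; other variants share this property but may use different symbols for the last two values (UTF-7 is one example).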
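To see a variant that differs only in its last two symbols, the sketch below compares Python's standard alphabet with the URL- and filename-safe alphabet (also defined in RFC 4648), which replaces `+` and `/` with `-` and `_`. The URL-safe variant is used here only as a convenient stand-in; UTF-7's modified Base64 is not shown. The input bytes are chosen so that index values 62 and 63 both appear.

```python
import base64

data = bytes([0xFB, 0xFF])  # bits 111110 111111 1111(00) -> indices 62, 63, 60

standard = base64.b64encode(data)          # alphabet ends in '+' and '/'
url_safe = base64.urlsafe_b64encode(data)  # alphabet ends in '-' and '_'
print(standard, url_safe)                  # b'+/8=' b'-_8='

# Both spellings decode back to the same bytes.
assert base64.b64decode(standard) == base64.urlsafe_b64decode(url_safe) == data
```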

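The earliest encodings of this kind were created for dial-up communication between systems running the same operating system, for example uuencode on UNIX and BinHex on the TRS-80 (later ported to the Macintosh). They could therefore make more assumptions about which characters were "safe" to use; uuencode, for instance, uses uppercase letters, digits and many punctuation characters, but no lowercase letters.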

The Base64 alphabet from RFC 4648

Index	Binary	Char.	Index	Binary	Char.	Index	Binary	Char.	Index	Binary	Char.
0	000000	A	16	010000	Q	32	100000	g	48	110000	w
1	000001	B	17	010001	R	33	100001	h	49	110001	x
2	000010	C	18	010010	S	34	100010	i	50	110010	y
3	000011	D	19	010011	T	35	100011	j	51	110011	z
4	000100	E	20	010100	U	36	100100	k	52	110100	0
5	000101	F	21	010101	V	37	100101	l	53	110101	1
6	000110	G	22	010110	W	38	100110	m	54	110110	2
7	000111	H	23	010111	X	39	100111	n	55	110111	3
8	001000	I	24	011000	Y	40	101000	o	56	111000	4
9	001001	J	25	011001	Z	41	101001	p	57	111001	5
10	001010	K	26	011010	a	42	101010	q	58	111010	6
11	001011	L	27	011011	b	43	101011	r	59	111011	7
12	001100	M	28	011100	c	44	101100	s	60	111100	8
13	001101	N	29	011101	d	45	101101	t	61	111101	9
14	001110	O	30	011110	e	46	101110	u	62	111110	+
15	001111	P	31	011111	f	47	101111	v	63	111111	/
Padding											=
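Because the table follows a fixed order (A–Z, then a–z, then 0–9, then `+` and `/`), it can be generated rather than typed out. The following Python sketch builds the same 64-character alphabet and the reverse lookup a decoder would need; the names `ALPHABET` and `DECODE` are illustrative choices, not part of RFC 4648.

```python
import string

# Index order from the RFC 4648 table: A-Z, a-z, 0-9, then '+' and '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
PAD = "="
assert len(ALPHABET) == 64

# Print a few rows in the same Index / Binary / Char. layout as the table.
for index in (0, 19, 26, 52, 62, 63):
    print(index, format(index, "06b"), ALPHABET[index])

# Reverse map used when decoding: character -> 6-bit value.
DECODE = {ch: i for i, ch in enumerate(ALPHABET)}
```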

Examples

For example, the encoded value of "Man" is TWFu. Encoded in ASCII, the characters M, a and n are stored as the byte values 77, 97 and 110, which are the 8-bit binary values 01001101, 01100001 and 01101110. These three values are joined into a 24-bit string, 010011010110000101101110. This string is then split, from left to right, into four groups of 6 bits; since 6 bits can take 2⁶ = 64 different values, each group is a number from 0 to 63 (here 19, 22, 5 and 46) that is used as an index into the Base64 alphabet, giving T, W, F and u. As this example illustrates, Base64 encoding converts three octets into four encoded characters.
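The three-octets-to-four-characters step can be written directly with bit shifts. The Python sketch below encodes a single complete 3-byte group only (no padding handling); the function name `encode_group` is hypothetical, and the standard library's `base64.b64encode` would be used in practice.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_group(three: bytes) -> str:
    """Encode exactly three bytes as four Base64 characters (illustrative sketch)."""
    if len(three) != 3:
        raise ValueError("expected exactly three bytes")
    # Join the three octets into one 24-bit integer.
    value = (three[0] << 16) | (three[1] << 8) | three[2]
    # Read four 6-bit groups, starting from the most significant end.
    sextets = [(value >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[s] for s in sextets)

src = b"Man"
print(list(src))                               # [77, 97, 110]
print("".join(format(b, "08b") for b in src))  # 010011010110000101101110
print(encode_group(src))                       # TWFu
```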

Example: Base64 encoding of the source string "Man"

Source ASCII text	M	a	n
Octets	77 (0x4d)	97 (0x61)	110 (0x6e)
Bits	0 1 0 0 1 1 0 1	0 1 1 0 0 0 0 1	0 1 1 0 1 1 1 0
Base64 encoded sextets	19	22	5	46
Base64 encoded characters	T	W	F	u
Base64 encoded octets	84 (0x54)	87 (0x57)	70 (0x46)	117 (0x75)

Note: = padding characters may be added so that the final encoded block contains four Base64 characters.

Converting from hexadecimal to octal is a convenient way to move between binary and Base64 by hand, because each octal digit represents exactly 3 bits, so each pair of octal digits represents one 6-bit Base64 index. For example, the hexadecimal representation of the 24 bits above is 4D616E, which is 23260556 in octal. Splitting the octal digits into pairs (23 26 05 56) and converting each pair to decimal gives 19 22 05 46; using these four numbers as indices into the Base64 alphabet yields the ASCII characters TWFu.
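The same octal shortcut can be checked in a few lines of Python: format the 24-bit value as eight octal digits, read the digits in pairs as base-8 numbers, and look each one up in the alphabet. This is only a worked check of the arithmetic above, not a practical encoder.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

value = int.from_bytes(b"Man", "big")        # 0x4D616E
octal = format(value, "08o")                 # 8 octal digits = 24 bits
pairs = [octal[i:i + 2] for i in range(0, 8, 2)]
indices = [int(p, 8) for p in pairs]         # each octal pair is one 6-bit value
result = "".join(ALPHABET[i] for i in indices)

print(hex(value), octal, pairs, indices, result)
```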

If there are only two significant input octets (e.g., "Ma"), or the last input group contains only two octets, all 16 bits fit in the first three Base64 digits (18 bits). The encoder fills the third 6-bit group with two zero bits; these two least significant bits are discarded on decoding, along with the trailing = padding character.

Source ASCII text	M	a
Octets	77 (0x4d)	97 (0x61)
Bits	0 1 0 0 1 1 0 1	0 1 1 0 0 0 0 1	0 0 (added zero bits)
Base64 encoded sextets	19	22	4	Padding
Base64 encoded characters	T	W	E	=
Base64 encoded octets	84 (0x54)	87 (0x57)	69 (0x45)	61 (0x3d)
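A quick check with Python's standard `base64` module confirms the two-byte case: "Ma" becomes three data characters plus one `=`, and the third character's 6-bit value ends in the two zero bits added by the encoder.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

encoded = base64.b64encode(b"Ma").decode("ascii")
print(encoded)                            # TWE=

last_sextet = ALPHABET.index(encoded[2])  # third character, 'E'
print(format(last_sextet, "06b"))         # 000100: the low two bits are the added zeros
assert last_sextet & 0b11 == 0
```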

Output padding

Because Base64 is a six‑bit encoding, and because the decoded values are divided into 8‑bit octets, every four characters of Base64‑encoded text (4 sextets = 4 × 6 = 24 bits) represent three octets of unencoded text or data (3 octets = 3 × 8 = 24 bits). This means that when the length of the unencoded input is not a multiple of three, the encoded output must have padding added so that its length is a multiple of four. The padding character is =, which indicates that no further bits are needed to fully encode the input.

The example below illustrates how truncating the input "light work." changes the output padding:

Input text	Input length	Output text	Output length	Padding
light work.	11	`bGlnaHQgd29yay4=`	16	1
light work	10	`bGlnaHQgd29yaw==`	16	2
light wor	9	`bGlnaHQgd29y`	12	0
light wo	8	`bGlnaHQgd28=`	12	1
light w	7	`bGlnaHQgdw==`	12	2
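The padding column follows directly from the input length: (−n) mod 3 padding characters are needed for n input bytes. The Python sketch below regenerates the table with `base64.b64encode` and checks that count against the number of `=` characters in each output.

```python
import base64

rows = []
for text in ("light work.", "light work", "light wor", "light wo", "light w"):
    data = text.encode("ascii")
    encoded = base64.b64encode(data).decode("ascii")
    padding = (-len(data)) % 3   # 0, 1 or 2 '=' characters
    assert encoded.count("=") == padding
    rows.append((text, len(data), encoded, len(encoded), padding))

for text, n_in, encoded, n_out, padding in rows:
    print(f"{text!r:15} {n_in:2}  {encoded:18} {n_out:2}  {padding}")
```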

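The padding character is not strictly required for decoding, because the number of missing bytes can be inferred from the length of the encoded text. Some implementations require padding, while others do not use it. A common case in which padding is required is when several Base64-encoded files have been concatenated.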
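The concatenation case can be sketched as follows: if each piece was padded, the joined text can still be split into 4-character groups and decoded group by group, whereas unpadded pieces run together and decode to different bytes. The helper `decode_concatenated` is hypothetical and written only for this illustration; it assumes the standard alphabet and no line breaks.

```python
import base64

def decode_concatenated(text: str) -> bytes:
    """Decode padded Base64 pieces joined end to end (hypothetical helper)."""
    if len(text) % 4:
        raise ValueError("padded Base64 text must have a length that is a multiple of 4")
    # Each 4-character group is a complete quantum, so decode them one at a time.
    return b"".join(base64.b64decode(text[i:i + 4]) for i in range(0, len(text), 4))

padded = base64.b64encode(b"M") + base64.b64encode(b"a")   # b"TQ==YQ=="
print(decode_concatenated(padded.decode("ascii")))          # b'Ma'

# Without padding the two pieces merge into one group and decode to other bytes.
unpadded = "TQ" + "YQ"
print(base64.b64decode(unpadded))
```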

Decoding Base64 with padding

When decoding Base64 text, four characters are typically converted back to three bytes. The only exceptions are when padding characters exist. A single = indicates that the four characters will decode to only two bytes, while == indicates that the four characters will decode to only a single byte. For example:

Encoded	Padding	Bytes from last group	Decoded
`bGlnaHQgdw==`	==	1	light w
`bGlnaHQgd28=`	=	2	light wo
`bGlnaHQgd29y`	None	3	light wor

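Another way to interpret the padding character is as an instruction to discard 2 trailing bits from the bit string each time an = is encountered. For example, when decoding bGlnaHQgdw==, each character except the trailing = signs is converted to its 6-bit value, and then 2 trailing bits are discarded for each of the two = signs, 4 bits in total. In the final group dw==, the d and w supply 12 bits; after the 4 bits are removed, 8 bits remain, which is exactly 1 byte.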
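This bit-dropping view translates almost literally into code. The Python sketch below (the name `decode_dropping_bits` is hypothetical) turns every non-padding character into 6 bits, removes 2 bits per `=`, and packs the rest into bytes. It assumes the standard alphabet with padding only at the end, and does none of the validation a production decoder needs.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def decode_dropping_bits(text: str) -> bytes:
    """Decode padded Base64 by discarding 2 trailing bits per '=' (illustrative sketch)."""
    body = text.rstrip("=")
    pads = len(text) - len(body)
    # Every non-padding character contributes 6 bits.
    bits = "".join(format(ALPHABET.index(ch), "06b") for ch in body)
    if pads:
        bits = bits[:-2 * pads]  # each '=' removes two trailing bits
    if len(bits) % 8:
        raise ValueError("malformed input: bit count is not a whole number of bytes")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

print(decode_dropping_bits("bGlnaHQgdw=="))  # b'light w'
```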

Decoding Base64 without padding

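Without padding, after repeatedly decoding four characters into three bytes, fewer than four encoded characters may remain at the end. In that case only two or three characters can remain, never just one: a single Base64 character carries only 6 bits, and a byte needs 8, so at least two characters are required, the first contributing 6 bits and the second contributing its first 2 bits. For example: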

Characters in last group	Encoded	Bytes from last group	Decoded
2	`bGlnaHQgdw`	1	light w
3	`bGlnaHQgd28`	2	light wo
4	`bGlnaHQgd29y`	3	light wor
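One simple way to handle unpadded input, sketched below in Python, is to restore the missing `=` characters from the length and then use the ordinary padded decoder. The helper name `decode_unpadded` is hypothetical; it rejects a length that leaves a single final character, for the reason given above.

```python
import base64

def decode_unpadded(text: str) -> bytes:
    """Decode Base64 that may lack trailing '=' by restoring the padding first."""
    if len(text) % 4 == 1:
        # A lone final character carries only 6 bits, not enough for a byte.
        raise ValueError("invalid Base64 length")
    return base64.b64decode(text + "=" * (-len(text) % 4))

for s in ("bGlnaHQgdw", "bGlnaHQgd28", "bGlnaHQgd29y"):
    print(len(s), s, decode_unpadded(s))
```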

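Decoders differ in how they handle unpadded input. Moreover, accepting unpadded input means that several different strings can decode to the same byte sequence, which can be a security risk.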
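Where several spellings of the same bytes are a concern, one possible policy is to accept only the canonical form. The Python sketch below (the name `strict_decode` is hypothetical) requires padding via `validate=True` and then re-encodes the result, rejecting any input that does not round-trip exactly; this also rejects spellings whose discarded bits are non-zero, such as `dx==` instead of `dw==`. It is one assumed policy, not a rule from any standard.

```python
import base64
import binascii

def strict_decode(text: str) -> bytes:
    """Accept only canonical, padded standard Base64 (a sketch of one strict policy)."""
    try:
        data = base64.b64decode(text, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"not valid padded Base64: {text!r}") from exc
    # Re-encoding must reproduce the input exactly; this rejects unpadded
    # input and any other non-canonical spelling of the same bytes.
    if base64.b64encode(data).decode("ascii") != text:
        raise ValueError(f"non-canonical Base64: {text!r}")
    return data

print(strict_decode("dw=="))  # b'w'
```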