In computer programming, Base64 is a group of binary‑to‑text encoding schemes that transform binary data into a sequence of printable characters drawn from a set of 64 unique symbols. The source binary data is processed 6 bits at a time; each 6‑bit group is mapped to one of the 64 characters.
Like other binary‑to‑text encodings, Base64 is designed to carry data stored in binary formats across channels that only reliably support text content. It is particularly prevalent on the web, where it enables embedding of image files or other binary assets inside textual assets such as HTML and CSS.
Base64 is also widely used for sending e‑mail attachments, because the original SMTP protocol was designed to transport 7‑bit ASCII only. Encoding an attachment as Base64 before sending—and decoding when received—ensures older SMTP servers will not interfere with the attachment.
Base64 encoding introduces a size overhead of roughly 33%–37% relative to the original binary data: about 33% from the encoding itself, and up to about 4% more from optional line breaks.
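The overhead figures can be checked with a little arithmetic. The sketch below is illustrative only: `encoded_length` is a hypothetical helper, and the 76-character line length with a 2-byte CRLF terminator is an assumption taken from MIME's convention rather than from the text above.

```python
def encoded_length(n_bytes: int, line_length: int = 0, newline_len: int = 2) -> int:
    """Length of padded Base64 output for n_bytes of input.

    line_length=0 means no line breaks are inserted.
    Assumes every output line, including the last, ends with a newline.
    """
    chars = (n_bytes + 2) // 3 * 4          # 4 characters per (possibly partial) 3-byte group
    if line_length <= 0 or chars == 0:
        return chars
    lines = (chars + line_length - 1) // line_length
    return chars + lines * newline_len


n = 3_000_000
plain = encoded_length(n)                   # no line breaks
mime = encoded_length(n, line_length=76)    # assumed MIME-style wrapping

print(f"without line breaks: {plain / n - 1:.1%}")  # 33.3%
print(f"with line breaks:    {mime / n - 1:.1%}")   # 36.8%
```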
The particular set of 64 characters chosen to represent the 64 digit values of the base varies between implementations. The general strategy is to choose characters that are common to most encodings and are printable, making the data unlikely to be modified in transit through systems (such as email) that were traditionally not 8‑bit clean. For example, MIME’s Base64 uses A–Z, a–z and 0–9 for the first 62 values; other variants, such as the one used by UTF‑7, share this property but differ in the symbols chosen for the last two values.
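As an illustration of variants that differ only in the last two symbols, the sketch below uses Python's standard `base64` module. The URL-safe alphabet and the custom `._` pair are examples chosen here, not variants named in the text above.

```python
import base64

# Three octets chosen so that the sextets are 62, 63, 63, 62,
# i.e. exactly the positions where the variants disagree.
data = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(data)                 # uses '+' and '/'
url_safe = base64.urlsafe_b64encode(data)         # uses '-' and '_'
custom = base64.b64encode(data, altchars=b"._")   # arbitrary pair, for illustration

print(standard, url_safe, custom)  # b'+//+' b'-__-' b'.__.'
```

The first 62 symbols are identical in all three outputs; only values 62 and 63 are rendered differently.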
The earliest instances of this encoding family were created for dial‑up communication between systems running the same OS — for example, uuencode for UNIX and BinHex for the TRS‑80 (later adapted for the Macintosh). These made assumptions about what characters were safe to use; for instance, uuencode uses uppercase letters, digits and many punctuation characters, but no lowercase.
Index | Binary | Char. | Index | Binary | Char. | Index | Binary | Char. | Index | Binary | Char. |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 000000 | A | 16 | 010000 | Q | 32 | 100000 | g | 48 | 110000 | w |
1 | 000001 | B | 17 | 010001 | R | 33 | 100001 | h | 49 | 110001 | x |
2 | 000010 | C | 18 | 010010 | S | 34 | 100010 | i | 50 | 110010 | y |
3 | 000011 | D | 19 | 010011 | T | 35 | 100011 | j | 51 | 110011 | z |
4 | 000100 | E | 20 | 010100 | U | 36 | 100100 | k | 52 | 110100 | 0 |
5 | 000101 | F | 21 | 010101 | V | 37 | 100101 | l | 53 | 110101 | 1 |
6 | 000110 | G | 22 | 010110 | W | 38 | 100110 | m | 54 | 110110 | 2 |
7 | 000111 | H | 23 | 010111 | X | 39 | 100111 | n | 55 | 110111 | 3 |
8 | 001000 | I | 24 | 011000 | Y | 40 | 101000 | o | 56 | 111000 | 4 |
9 | 001001 | J | 25 | 011001 | Z | 41 | 101001 | p | 57 | 111001 | 5 |
10 | 001010 | K | 26 | 011010 | a | 42 | 101010 | q | 58 | 111010 | 6 |
11 | 001011 | L | 27 | 011011 | b | 43 | 101011 | r | 59 | 111011 | 7 |
12 | 001100 | M | 28 | 011100 | c | 44 | 101100 | s | 60 | 111100 | 8 |
13 | 001101 | N | 29 | 011101 | d | 45 | 101101 | t | 61 | 111101 | 9 |
14 | 001110 | O | 30 | 011110 | e | 46 | 101110 | u | 62 | 111110 | + |
15 | 001111 | P | 31 | 011111 | f | 47 | 101111 | v | 63 | 111111 | / |
Padding | = |
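The table above can be generated rather than memorised. A minimal sketch, assuming the MIME-style alphabet shown in the table; the name `ALPHABET` is introduced here for illustration:

```python
import string

# Index 0-25: A-Z, 26-51: a-z, 52-61: 0-9, 62: '+', 63: '/'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

for index in (0, 25, 26, 51, 52, 61, 62, 63):
    # Print the index, its 6-bit binary form, and the mapped character.
    print(index, format(index, "06b"), ALPHABET[index])
```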
In the above quote, the encoded value of Man is TWFu. Encoded in ASCII, the characters M, a, and n are stored as the byte values 77, 97, and 110, which are the 8-bit binary values 01001101, 01100001, and 01101110. These three values are joined together into a 24-bit string, producing 010011010110000101101110. Groups of 6 bits (6 bits have a maximum of 2^6 = 64 different binary values) are converted into individual numbers from start to end (in this case, there are four numbers in a 24-bit string), which are then converted into their corresponding Base64 character values. As this example illustrates, Base64 encoding converts three octets into four encoded characters.
Source ASCII text | M | a | n |
---|---|---|---|
Character | M | a | n |
Octets | 77 (0x4d) | 97 (0x61) | 110 (0x6e) |
Bits | 01001101 | 01100001 | 01101110 |

Base64 encoded | 1st | 2nd | 3rd | 4th |
---|---|---|---|---|
Sextets | 19 | 22 | 5 | 46 |
Character | T | W | F | u |
Octets | 84 (0x54) | 87 (0x57) | 70 (0x46) | 117 (0x75) |
= padding characters might be added to make the last encoded block contain four Base64 characters.
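The three-octets-to-four-characters step can be sketched directly with integer shifts. This is a minimal illustration, not a complete encoder; `encode_group` and `ALPHABET` are names introduced here.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_group(three_bytes: bytes) -> str:
    """Encode exactly three octets into four Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly three octets")
    bits = int.from_bytes(three_bytes, "big")  # the 24-bit string as an integer
    # Take the four 6-bit groups from most to least significant.
    sextets = [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[s] for s in sextets)


print(encode_group(b"Man"))  # TWFu
```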
Hexadecimal to octal transformation is useful to convert between binary and Base64. For example, the hexadecimal representation of the 24 bits above is 4D616E, whose octal is 23260556. Split into pairs (23 26 05 56) and map each to decimal (19 22 05 46); using those four decimal numbers as indices for the Base64 alphabet yields the ASCII characters TWFu.
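The hexadecimal-to-octal shortcut works because each octal digit carries 3 bits, so a pair of octal digits is exactly one sextet. A small sketch of the steps described above, with variable names chosen for illustration:

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

value = 0x4D616E                      # the 24 bits of "Man"
octal = format(value, "08o")          # 8 octal digits = 24 bits, zero-padded on the left
pairs = [octal[i:i + 2] for i in range(0, 8, 2)]
indices = [int(pair, 8) for pair in pairs]
encoded = "".join(ALPHABET[i] for i in indices)

print(octal)    # 23260556
print(pairs)    # ['23', '26', '05', '56']
print(indices)  # [19, 22, 5, 46]
print(encoded)  # TWFu
```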
If there are only two significant input octets (e.g., "Ma"), or when the last input group contains only two octets, all 16 bits are captured in the first three Base64 digits (18 bits); the two least significant bits of the last content‑bearing 6‑bit block are filled with zeros and are discarded on decoding (along with the succeeding = padding character).
Source ASCII text | M | a | |
---|---|---|---|
Character | M | a | |
Octets | 77 (0x4d) | 97 (0x61) | |
Bits | 01001101 | 01100001 | 00 (zero fill) |

Base64 encoded | 1st | 2nd | 3rd | 4th |
---|---|---|---|---|
Sextets | 19 | 22 | 4 | Padding |
Character | T | W | E | = |
Octets | 84 (0x54) | 87 (0x57) | 69 (0x45) | 61 (0x3d) |
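The handling of a short final group can be sketched as follows. This is an illustration under the MIME-style alphabet; `encode_tail` is a hypothetical helper that covers only the one- and two-octet cases.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_tail(tail: bytes) -> str:
    """Encode a final group of 1 or 2 octets, zero-filling and padding with '='."""
    if len(tail) not in (1, 2):
        raise ValueError("expected one or two octets")
    n_bits = 8 * len(tail)
    n_sextets = (n_bits + 5) // 6                 # 2 for one octet, 3 for two
    fill = 6 * n_sextets - n_bits                 # 4 or 2 zero bits appended on the right
    bits = int.from_bytes(tail, "big") << fill
    chars = [
        ALPHABET[(bits >> (6 * (n_sextets - 1 - i))) & 0b111111]
        for i in range(n_sextets)
    ]
    return "".join(chars) + "=" * (4 - n_sextets)


print(encode_tail(b"Ma"))  # TWE=
print(encode_tail(b"M"))   # TQ==
```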
Because Base64 is a six‑bit encoding, and because the decoded values are divided into 8‑bit octets, every four characters of Base64‑encoded text (4 sextets = 4 × 6 = 24 bits) represent three octets of unencoded text or data (3 octets = 3 × 8 = 24 bits). This means that when the length of the unencoded input is not a multiple of three, the encoded output must have padding added so that its length is a multiple of four. The padding character is =, which indicates that no further bits are needed to fully encode the input.
The example below illustrates how truncating the input of the quote changes the output padding:
Input text | Input length | Output text | Output length | Padding |
---|---|---|---|---|
light work. | 11 | bGlnaHQgd29yay4= | 16 | 1 |
light work | 10 | bGlnaHQgd29yaw== | 16 | 2 |
light wor | 9 | bGlnaHQgd29y | 12 | 0 |
light wo | 8 | bGlnaHQgd28= | 12 | 1 |
light w | 7 | bGlnaHQgdw== | 12 | 2 |
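The rows of this table can be reproduced with Python's standard `base64` module. A short sketch; the variable names are chosen for illustration:

```python
import base64

text = "light work."
rows = []
for end in range(len(text), len(text) - 5, -1):   # input lengths 11 down to 7
    chunk = text[:end]
    encoded = base64.b64encode(chunk.encode("ascii")).decode("ascii")
    padding = len(encoded) - len(encoded.rstrip("="))
    rows.append((chunk, len(chunk), encoded, len(encoded), padding))

for row in rows:
    print(row)
```

The padding count follows the input length modulo 3: a remainder of 2 gives one =, a remainder of 1 gives two, and a remainder of 0 gives none.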
The padding character is not essential for decoding, since the missing bytes can be inferred from the length of the encoded text. Some implementations require padding; others do not. A common exception where padding is required is when multiple Base64‑encoded files have been concatenated.
When decoding Base64 text, four characters are typically converted back to three bytes. The only exceptions are when padding characters exist. A single = indicates that the four characters will decode to only two bytes, while == indicates that the four characters will decode to only a single byte. For example:
Encoded | Padding | Bytes in last group | Decoded |
---|---|---|---|
bGlnaHQgdw== | == | 1 | light w |
bGlnaHQgd28= | = | 2 | light wo |
bGlnaHQgd29y | None | 3 | light wor |
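The relationship between the padding and the size of the final decoded group can be checked with the standard library. A brief sketch; `results` is a name introduced here:

```python
import base64

results = []
for encoded in ("bGlnaHQgdw==", "bGlnaHQgd28=", "bGlnaHQgd29y"):
    n_pad = encoded.count("=")
    decoded = base64.b64decode(encoded)
    # Each '=' means the last four-character group yields one byte fewer.
    results.append((encoded, n_pad, 3 - n_pad, decoded))

for row in results:
    print(row)
```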
Another way to interpret the padding character is to consider it as an instruction to discard trailing bits from the bit string each time a = is encountered. For example, when bGlnaHQgdw== is decoded, we convert each character (except the trailing = signs) into its 6‑bit representation, then discard 2 trailing bits for the first = and another 2 trailing bits for the other =. In this instance, we would get 6 bits from the d, and another 6 bits from the w, for a bit string of length 12; removing 2 bits for each = (total 4 bits) leaves 8 bits (1 byte) when decoded.
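The "discard 2 bits per =" reading translates almost directly into code. The following is a minimal sketch for padded input only, with `decode_group` and `decode` as illustrative names; it performs no validation beyond group length and padding count.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def decode_group(group: str) -> bytes:
    """Decode one 4-character group, dropping 2 trailing bits per '='."""
    if len(group) != 4:
        raise ValueError("expected a four-character group")
    data_chars = group.rstrip("=")
    n_pad = 4 - len(data_chars)
    if n_pad > 2:
        raise ValueError("at most two padding characters are possible")
    bits = 0
    for ch in data_chars:
        bits = (bits << 6) | ALPHABET.index(ch)   # append 6 bits per character
    bits >>= 2 * n_pad                            # discard 2 trailing bits per '='
    n_bits = 6 * len(data_chars) - 2 * n_pad      # always a multiple of 8
    return bits.to_bytes(n_bits // 8, "big")


def decode(text: str) -> bytes:
    """Decode padded Base64 text group by group."""
    return b"".join(decode_group(text[i:i + 4]) for i in range(0, len(text), 4))


print(decode_group("dw=="))    # b'w'
print(decode("bGlnaHQgdw=="))  # b'light w'
```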
Without padding, after normal decoding of four characters to three bytes repeatedly, fewer than four encoded characters may remain. In this situation, only two or three characters can remain. A single remaining encoded character is not possible, because a single Base64 character only contains 6 bits, and 8 bits are required to create a byte, so at least two Base64 characters are required: the first contributes 6 bits, and the second contributes its first 2 bits. For example:
Characters in last group | Encoded | Bytes in last group | Decoded |
---|---|---|---|
2 | bGlnaHQgdw | 1 | light w |
3 | bGlnaHQgd28 | 2 | light wo |
4 | bGlnaHQgd29y | 3 | light wor |
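One common way to decode unpadded text is to restore the padding from the length before handing the string to a padding-aware decoder. A sketch of that approach, assuming Python's standard `base64` module; `decode_unpadded` is a hypothetical helper:

```python
import base64


def decode_unpadded(encoded: str) -> bytes:
    """Decode Base64 text whose trailing '=' characters were omitted."""
    if len(encoded) % 4 == 1:
        # One leftover character carries only 6 bits, not enough for a byte.
        raise ValueError("a single leftover character cannot encode a whole byte")
    padding = "=" * (-len(encoded) % 4)   # 0, 1 or 2 characters
    return base64.b64decode(encoded + padding)


print(decode_unpadded("bGlnaHQgdw"))    # b'light w'
print(decode_unpadded("bGlnaHQgd28"))   # b'light wo'
print(decode_unpadded("bGlnaHQgd29y"))  # b'light wor'
```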
Decoding without padding is not performed consistently among decoders. In addition, allowing padless decoding by definition allows multiple strings to decode into the same set of bytes, which can be a security risk.
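Where several spellings of the same bytes would be a problem (for example, when encoded strings are compared or used as keys), one possible mitigation is to accept only the canonical padded form. The sketch below checks this by decoding and re-encoding; `is_canonical` is a hypothetical helper, and this is only one of several ways such a check might be written.

```python
import base64
import binascii


def is_canonical(encoded: str) -> bool:
    """Return True only if `encoded` is exactly what the encoder would produce."""
    try:
        decoded = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError):
        return False   # wrong padding or characters outside the alphabet
    # Round-trip: any alternative spelling of the same bytes fails this comparison.
    return base64.b64encode(decoded).decode("ascii") == encoded


print(is_canonical("bGlnaHQgdw=="))  # True
print(is_canonical("bGlnaHQgdw"))    # False (padding omitted)
```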