Base64

In computer programming, Base64 is a group of binary‑to‑text encoding schemes that transform binary data into a sequence of printable characters drawn from a set of 64 unique symbols. The source binary data is processed 6 bits at a time; each 6‑bit group is mapped to one of the 64 characters.

Like other binary‑to‑text encodings, Base64 is designed to carry data stored in binary formats across channels that only reliably support text content. It is particularly prevalent on the web, where it enables embedding of image files or other binary assets inside textual assets such as HTML and CSS.
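
As a minimal sketch of the embedding use case, the Python snippet below builds a data: URI from raw bytes with the standard library's base64 module. The helper name to_data_uri and the sample payload (only the 8-byte PNG file signature, not a complete image) are illustrative assumptions.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Return a data: URI that embeds `data` as Base64 text (hypothetical helper)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Illustrative payload: the 8-byte PNG signature only, not a full image.
png_signature = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(png_signature, "image/png")
print(uri)
```

The resulting string can be placed wherever a URL is accepted, for example in the src attribute of an HTML img element.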

Base64 is also widely used for sending e‑mail attachments, because the original SMTP protocol was designed to transport 7‑bit ASCII only. Encoding an attachment as Base64 before sending—and decoding when received—ensures older SMTP servers will not interfere with the attachment.

Base64 encoding introduces a size overhead of roughly 33%–37% relative to the original binary data: about 33% from the encoding itself, plus up to about 4% more from optionally inserted line breaks.
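
The overhead figures can be checked with a short calculation. The sketch below assumes padded output and, for the wrapped case, lines of 76 characters each followed by a two-byte CRLF (the MIME convention); the function name encoded_size and its parameters are hypothetical.

```python
import base64


def encoded_size(n_bytes: int, line_length: int = 0, newline_len: int = 2) -> int:
    """Size in bytes of padded Base64 output for n_bytes of input.

    line_length=0 means no line breaks. Otherwise a newline of
    newline_len bytes is assumed after every line, including the last.
    """
    chars = 4 * ((n_bytes + 2) // 3)  # four characters per group of three octets
    if line_length <= 0:
        return chars
    lines = (chars + line_length - 1) // line_length
    return chars + lines * newline_len


n = 3000
plain = encoded_size(n)
wrapped = encoded_size(n, line_length=76)  # assumed: 76-character lines + CRLF
print(f"unwrapped: {plain / n:.3f}x, wrapped: {wrapped / n:.3f}x")
```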

Design

The particular set of 64 characters chosen to represent the 64 digit values of the base varies between implementations. The general strategy is to choose characters that are common to most encodings and are printable, making the data unlikely to be modified in transit through systems (such as email) that were traditionally not 8‑bit clean. For example, MIME’s Base64 uses A–Z, a–z and 0–9 for the first 62 values; other variants share this property but differ in the symbols chosen for the last two values (UTF‑7 is one example).
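
To illustrate how variants differ only in the last two symbols, the sketch below encodes the same two bytes with Python's standard alphabet and with its URL-safe alphabet, which substitutes - and _ for + and /. The input bytes are chosen here purely so that indices 62 and 63 both occur.

```python
import base64

data = bytes([0xFB, 0xFF])  # sextets 62, 63, 60 after zero fill

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")
print(standard, url_safe)
```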

The earliest instances of this encoding family were created for dial‑up communication between systems running the same OS — for example, uuencode for UNIX and BinHex for the TRS‑80 (later adapted for the Macintosh). These made assumptions about what characters were safe to use; for instance, uuencode uses uppercase letters, digits and many punctuation characters, but no lowercase.

Base64 table from RFC 4648

Index  Binary  Char.   Index  Binary  Char.   Index  Binary  Char.   Index  Binary  Char.
0      000000  A       16     010000  Q       32     100000  g       48     110000  w
1      000001  B       17     010001  R       33     100001  h       49     110001  x
2      000010  C       18     010010  S       34     100010  i       50     110010  y
3      000011  D       19     010011  T       35     100011  j       51     110011  z
4      000100  E       20     010100  U       36     100100  k       52     110100  0
5      000101  F       21     010101  V       37     100101  l       53     110101  1
6      000110  G       22     010110  W       38     100110  m       54     110110  2
7      000111  H       23     010111  X       39     100111  n       55     110111  3
8      001000  I       24     011000  Y       40     101000  o       56     111000  4
9      001001  J       25     011001  Z       41     101001  p       57     111001  5
10     001010  K       26     011010  a       42     101010  q       58     111010  6
11     001011  L       27     011011  b       43     101011  r       59     111011  7
12     001100  M       28     011100  c       44     101100  s       60     111100  8
13     001101  N       29     011101  d       45     101101  t       61     111101  9
14     001110  O       30     011110  e       46     101110  u       62     111110  +
15     001111  P       31     011111  f       47     101111  v       63     111111  /
Padding  =
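
The table can be reproduced programmatically. The following sketch builds the alphabet as a 64-character string, so that a sextet value is simply an index into it; the constant name ALPHABET is an illustrative choice.

```python
import string

# A-Z, a-z, 0-9, then + and / for indices 62 and 63.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Print a few rows in the same "index, binary, character" layout as the table.
for index in (0, 19, 26, 52, 62, 63):
    print(index, format(index, "06b"), ALPHABET[index])
```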

Examples

The encoded value of Man is TWFu. Encoded in ASCII, the characters M, a, and n are stored as the byte values 77, 97, and 110, which are the 8-bit binary values 01001101, 01100001, and 01101110. These three values are joined together into a 24-bit string, producing 010011010110000101101110. Groups of 6 bits (6 bits can represent 2^6 = 64 different values) are converted into individual numbers from start to end (in this case, four numbers from the 24-bit string), which are then mapped to their corresponding Base64 characters. As this example illustrates, Base64 encoding converts three octets into four encoded characters.

Encoding of the source string (Man) in Base64

Source ASCII text     M  a  n
Source octets         77 (0x4d)  97 (0x61)  110 (0x6e)
Bits by octet         01001101  01100001  01101110
Bits by sextet        010011  010110  000101  101110
Base64 sextets        19  22  5  46
Base64 characters     T  W  F  u
Base64 octets         84 (0x54)  87 (0x57)  70 (0x46)  117 (0x75)
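
The steps in this example can be sketched in Python as follows. The function name encode_triplet is hypothetical, and the sketch handles only a full group of three octets; the standard library's base64.b64encode is used solely to cross-check the result.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_triplet(chunk: bytes) -> str:
    """Encode exactly three octets as four Base64 characters."""
    if len(chunk) != 3:
        raise ValueError("expected exactly three octets")
    bits = int.from_bytes(chunk, "big")  # join the octets into one 24-bit value
    # Take the four 6-bit groups from most significant to least significant.
    sextets = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[s] for s in sextets)


print(encode_triplet(b"Man"))
print(base64.b64encode(b"Man").decode("ascii"))
```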

Padding characters (=) might be added so that the last encoded block contains four Base64 characters.

Hexadecimal to octal transformation is useful to convert between binary and Base64. For example, the hexadecimal representation of the 24 bits above is 4D616E, whose octal is 23260556. Split into pairs (23 26 05 56) and map each to decimal (19 22 05 46); using those four decimal numbers as indices for the Base64 alphabet yields the ASCII characters TWFu.
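
The hexadecimal-to-octal shortcut can be written out as a small sketch. It assumes the input covers a whole number of three-octet groups (a multiple of six hexadecimal digits), so no padding is involved; the function name is hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def hex_to_base64_via_octal(hex_text: str) -> str:
    """Convert hex digits to Base64 by reading the value as pairs of octal digits."""
    if not hex_text or len(hex_text) % 6 != 0:
        raise ValueError("expected a non-empty multiple of six hex digits")
    n_bits = 4 * len(hex_text)
    # One octal digit is 3 bits, so a pair of octal digits is one sextet.
    octal = format(int(hex_text, 16), "o").zfill(n_bits // 3)
    pairs = [octal[i:i + 2] for i in range(0, len(octal), 2)]
    return "".join(ALPHABET[int(pair, 8)] for pair in pairs)


print(hex_to_base64_via_octal("4D616E"))
```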

If there are only two significant input octets (e.g., "Ma"), or when the last input group contains only two octets, all 16 bits are captured in the first three Base64 digits (18 bits); the two least significant bits of the last content‑bearing 6‑bit block are set to zero and are discarded on decoding (along with the succeeding = padding character).

Source ASCII text     M  a
Source octets         77 (0x4d)  97 (0x61)
Bits by octet         01001101  01100001
Bits by sextet        010011  010110  000100 (the last two bits are zero fill)
Base64 sextets        19  22  4  (padding)
Base64 characters     T  W  E  =
Base64 octets         84 (0x54)  87 (0x57)  69 (0x45)  61 (0x3d)
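
A short sketch of how a final group of one or two octets might be encoded: the bits are zero-filled up to a whole number of sextets, and = characters complete the four-character block. The function name encode_tail is an assumption made for illustration.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_tail(chunk: bytes) -> str:
    """Encode a final group of one or two octets, including = padding."""
    if len(chunk) not in (1, 2):
        raise ValueError("expected one or two octets")
    n_bits = 8 * len(chunk)
    n_sextets = -(-n_bits // 6)  # ceiling division: 2 sextets for 1 octet, 3 for 2
    # Shift left so the low-order bits of the last sextet are zero fill.
    bits = int.from_bytes(chunk, "big") << (6 * n_sextets - n_bits)
    chars = [ALPHABET[(bits >> (6 * i)) & 0x3F] for i in reversed(range(n_sextets))]
    return "".join(chars) + "=" * (4 - n_sextets)


print(encode_tail(b"Ma"))
print(encode_tail(b"M"))
```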

Output padding

Because Base64 is a six‑bit encoding, and because the decoded values are divided into 8‑bit octets, every four characters of Base64‑encoded text (4 sextets = 4 × 6 = 24 bits) represent three octets of unencoded text or data (3 octets = 3 × 8 = 24 bits). This means that when the length of the unencoded input is not a multiple of three, the encoded output must have padding added so that its length is a multiple of four. The padding character is =, which indicates that no further bits are needed to fully encode the input.

The example below illustrates how truncating the input changes the output padding:

Input text     Input length   Output text        Output length   Padding characters
light work.    11             bGlnaHQgd29yay4=   16              1
light work     10             bGlnaHQgd29yaw==   16              2
light wor      9              bGlnaHQgd29y       12              0
light wo       8              bGlnaHQgd28=       12              1
light w        7              bGlnaHQgdw==       12              2
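
The number of padding characters follows directly from the input length modulo three. The sketch below regenerates the rows of the table with Python's standard base64 module; padding_count is a hypothetical helper name.

```python
import base64


def padding_count(n_bytes: int) -> int:
    """Number of = characters in padded Base64 output for n_bytes of input."""
    return (3 - n_bytes % 3) % 3


text = "light work."
rows = []
for end in range(len(text), 6, -1):  # input lengths 11 down to 7
    chunk = text[:end].encode("ascii")
    encoded = base64.b64encode(chunk).decode("ascii")
    rows.append((text[:end], len(chunk), encoded, len(encoded), padding_count(len(chunk))))

for row in rows:
    print(*row, sep=" | ")
```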

The padding character is not essential for decoding, since the missing bytes can be inferred from the length of the encoded text. Some implementations require padding; others do not. A common exception where padding is required is when multiple Base64‑encoded files have been concatenated.

Decoding Base64 with padding

When decoding Base64 text, four characters are typically converted back to three bytes. The only exceptions are when padding characters exist. A single = indicates that the four characters will decode to only two bytes, while == indicates that the four characters will decode to only a single byte. For example:

Encoded        Padding   Bytes in last group   Decoded
bGlnaHQgdw==   ==        1                     light w
bGlnaHQgd28=   =         2                     light wo
bGlnaHQgd29y   None      3                     light wor

Another way to interpret the padding character is to consider it as an instruction to discard trailing bits from the bit string each time a = is encountered. For example, when bGlnaHQgdw== is decoded, we convert each character (except the trailing = signs) into its 6‑bit representation, then discard 2 trailing bits for the first = and another 2 trailing bits for the other =. In this instance, we would get 6 bits from the d, and another 6 bits from the w for a bit string of length 12; removing 2 bits for each = (total 4 bits) leaves 8 bits (1 byte) when decoded.
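
The bit-discarding interpretation can be sketched as follows for a single four-character group. This is an illustration under the standard alphabet, not a complete decoder; decode_group is a hypothetical name.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def decode_group(group: str) -> bytes:
    """Decode one four-character group, which may end in = or ==."""
    if len(group) != 4:
        raise ValueError("expected exactly four characters")
    symbols = group.rstrip("=")
    pad = 4 - len(symbols)
    if pad > 2 or "=" in symbols:
        raise ValueError("malformed padding")
    # 6 bits per symbol; str.index raises ValueError for characters outside the alphabet.
    bits = "".join(format(ALPHABET.index(c), "06b") for c in symbols)
    bits = bits[: len(bits) - 2 * pad]  # discard 2 trailing bits per = character
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


print(decode_group("dw=="), decode_group("d28="), decode_group("d29y"))
```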

Decoding Base64 without padding

Without padding, after normal decoding of four characters to three bytes repeatedly, fewer than four encoded characters may remain. In this situation, only two or three characters can remain. A single remaining encoded character is not possible, because a single Base64 character only contains 6 bits, and 8 bits are required to create a byte, so at least two Base64 characters are required: the first contributes 6 bits, and the second contributes its first 2 bits. For example:

Characters in last group   Encoded        Bytes in last group   Decoded
2                          bGlnaHQgdw     1                     light w
3                          bGlnaHQgd28    2                     light wo
4                          bGlnaHQgd29y   3                     light wor
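
One possible way to decode unpadded text, sketched here, is to restore the implied padding from the length and then hand the result to a padded decoder. The function name decode_unpadded is hypothetical; it rejects the impossible case of a single leftover character.

```python
import base64


def decode_unpadded(text: str) -> bytes:
    """Decode Base64 text whose trailing = padding may have been omitted."""
    if len(text) % 4 == 1:
        # One leftover character carries only 6 bits, not enough for a byte.
        raise ValueError("a single leftover character cannot encode a byte")
    return base64.b64decode(text + "=" * (-len(text) % 4))


for sample in ("bGlnaHQgdw", "bGlnaHQgd28", "bGlnaHQgd29y"):
    print(sample, decode_unpadded(sample))
```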

Decoding without padding is not performed consistently among decoders. In addition, allowing padless decoding by definition allows multiple strings to decode into the same set of bytes, which can be a security risk.
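
The ambiguity can be demonstrated with a deliberately lenient decoder. In the sketch below, decode_lenient and is_canonical are hypothetical helpers: the first accepts any amount of trailing padding, and the second re-encodes the result to check whether the input was the one canonical padded form. Systems that compare or index encoded strings may want such a check.

```python
import base64


def decode_lenient(text: str) -> bytes:
    """Hypothetical lenient decoder: ignores missing or partial = padding."""
    stripped = text.rstrip("=")
    return base64.b64decode(stripped + "=" * (-len(stripped) % 4))


def is_canonical(text: str) -> bool:
    """True only if text is exactly what a padded encoder would emit."""
    return base64.b64encode(decode_lenient(text)).decode("ascii") == text


variants = ["bGlnaHQgdw==", "bGlnaHQgdw=", "bGlnaHQgdw"]
for v in variants:
    print(v, decode_lenient(v), is_canonical(v))
```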