base32cx is a base-32 encoding with letter-case checksums inspired by Ethereum’s EIP55.
It is designed for encoding short byte string identifiers as human-skimmable strings, with use cases like file hashes or cryptocurrency addresses in mind.
The alphabet maximizes the number of alpha characters to increase the average number of checksum bits per string.
base32cx has a variant, base32ux, which uses the same alphabet without a checksum.
The unchecked variant base32ux has the property that a lexical sort of encoded data is a bitwise sort of decoded data, like base32hex.
This page (permalink) is the home of the specification.
An initial implementation is here.
(This post is undergoing a series of revisions as the spec is finalized. The latest revision was 2019-Nov-21.)
base32cx alphabet

value:    0,1,2,3,4,5,[6..31]
encoding: 4,5,6,7,8,9,[A..Z]
lowered:  4,5,6,7,8,9,[a..z]

value  encoding    value  encoding    value  encoding    value  encoding
-----  --------    -----  --------    -----  --------    -----  --------
    0  4               8  C (c)          16  K (k)          24  S (s)
    1  5               9  D (d)          17  L (l)          25  T (t)
    2  6              10  E (e)          18  M (m)          26  U (u)
    3  7              11  F (f)          19  N (n)          27  V (v)
    4  8              12  G (g)          20  O (o)          28  W (w)
    5  9              13  H (h)          21  P (p)          29  X (x)
    6  A (a)          14  I (i)          22  Q (q)          30  Y (y)
    7  B (b)          15  J (j)          23  R (r)          31  Z (z)
The alphabet is selected so that:
- all alpha characters are present to maximize the probability of getting a check bit ‘hit’
- like base32hex, an ASCII sort is a bitwise sort (true for base32ux)
Uppercase letters are chosen for the unchecked variant because the numeric characters are “tall”. This makes unchecked data appear uniform while checked data appears mixed-height – see example section.
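As a rough illustration of the alphabet and the sort property, here is a minimal Python sketch of the unchecked encoding. The names `ALPHABET` and `base32ux_encode` are made up for this example and are not taken from the implementation. It assumes bits are consumed most-significant first and that the final group is padded with zero bits, which is how the examples on this page work out.

```python
# Value 0..31 maps to the character at that index: 4-9, then A-Z.
ALPHABET = "456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base32ux_encode(data: bytes) -> str:
    """Encode bytes 5 bits at a time, with no padding characters.

    Assumption: bits are read most-significant first and the last
    group is filled with zero bits on the right.
    """
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 5)  # zero-fill the final 5-bit group
    return "".join(ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))


print(base32ux_encode(b"Hello"))  # D5MQSV7J

# Sorting the encoded strings gives the same order as sorting the raw bytes.
samples = [b"Hello", b"H", b"\x00\xff", b"Hell", b"\xff", b"He"]
by_text = sorted(samples, key=base32ux_encode)
print(by_text == sorted(samples))  # True
```

The sort property relies on every digit in the alphabet having a lower ASCII code than every uppercase letter, so it only holds while the string stays all-uppercase.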
To checksum, take the sha256 of the bytes to be encoded. Call this hash CHECK.
Encode the bytes using the alphabet above, like any other base32 alphabet without padding.
Lowercase the i'th character of the encoded string if the (i % 256)'th bit of CHECK is a 0.
Keep it uppercased if it is a 1.
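The checksum steps can be sketched in Python roughly as follows. This is not the implementation linked above; the function names are invented for the example. Two details are assumptions inferred from the "Hello" examples on this page rather than stated rules: bits of the hash are counted from the most significant bit of the first byte, and a character is lowercased when its bit is 0 and left uppercase when its bit is 1.

```python
import hashlib

ALPHABET = "456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # value 0..31 -> character


def base32ux_encode(data: bytes) -> str:
    """Unchecked encoding: 5 bits per character, MSB first, zero-filled tail."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 5)
    return "".join(ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))


def base32cx_encode(data: bytes) -> str:
    """Apply letter-case check bits taken from sha256(data)."""
    check = hashlib.sha256(data).digest()
    out = []
    for i, ch in enumerate(base32ux_encode(data)):
        bit_index = i % 256  # the hash has 256 bits, so wrap around
        # Assumption: bit 0 is the most significant bit of check[0].
        bit = (check[bit_index // 8] >> (7 - bit_index % 8)) & 1
        # Assumption: 1 keeps uppercase, 0 lowercases. Digits are unaffected.
        out.append(ch if bit else ch.lower())
    return "".join(out)


print(base32cx_encode(b"Hello"))   # d5mQSv7j
print(base32cx_encode(b"Hello!"))  # d5MQsv7J88
```

Digits consume a bit position but cannot carry it, which is why only the alphabetic characters contribute to the checksum.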
On average, this encoding gives (26/32 alphas per alphabet) * (8/5 chars per byte) = 1.3 bits of checksum per byte.
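To make the arithmetic concrete, here is a small Python sketch that applies the same average to a few input sizes. The helper name `expected_check_bits` and the sizes chosen (20 and 32 bytes) are illustrative assumptions, and the figure is only an estimate for uniformly random data: it ignores the rounding up of the last character.

```python
from fractions import Fraction

ALPHA_FRACTION = Fraction(26, 32)  # share of alphabet characters that are letters
CHARS_PER_BYTE = Fraction(8, 5)    # each character carries 5 of a byte's 8 bits


def expected_check_bits(num_bytes: int) -> float:
    """Average number of case-carrying characters for num_bytes of random data."""
    return float(ALPHA_FRACTION * CHARS_PER_BYTE * num_bytes)


print(expected_check_bits(1))   # 1.3
print(expected_check_bits(20))  # 26.0
print(expected_check_bits(32))  # 41.6
```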
base32cx is only defined for byte sequences up to length 2^20 - 1, that is, one byte less than 1 MiB.
It is most likely not an appropriate choice of encoding for larger data.
For completeness, a standard method for hashing large data and applying the checksum in chunks will be specified in the future.
Until then, base32cx is simply not defined if the data to be encoded is longer than 2^20 - 1 bytes.
encode("Hello")

encoding   result     note
--------   --------   ----
base32cx   d5mQSv7j   appears mixed / passes checksum
base32ux   D5MQSV7J   appears uniform / fails checksum
none       d5mqsv7j   appears mixed / fails checksum, uppercased might be base32ux
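A checker for strings like these could be sketched as below: decode ignoring case, re-encode with the checksum, and compare. All names here are hypothetical, and the bit order and the 0-means-lowercase rule are assumptions inferred from the "Hello" result rather than taken from the implementation.

```python
import hashlib

ALPHABET = "456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # value 0..31 -> character


def decode_unchecked(text: str) -> bytes:
    """Decode ignoring case; raises ValueError on characters outside the alphabet."""
    bits = "".join(f"{ALPHABET.index(ch):05b}" for ch in text.upper())
    usable = len(bits) - len(bits) % 8  # drop the zero-fill bits at the end
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


def encode_checked(data: bytes) -> str:
    """Encode and apply letter case from sha256(data), MSB-first (assumed)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 5)
    text = "".join(ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))
    check = hashlib.sha256(data).digest()
    out = []
    for i, ch in enumerate(text):
        j = i % 256
        bit = (check[j // 8] >> (7 - j % 8)) & 1
        out.append(ch if bit else ch.lower())  # assumed: 0 -> lowercase
    return "".join(out)


def passes_checksum(text: str) -> bool:
    """True only if text is exactly what encode_checked produces for its bytes."""
    try:
        data = decode_unchecked(text)
    except ValueError:
        return False
    return encode_checked(data) == text


for candidate in ("d5mQSv7j", "D5MQSV7J", "d5mqsv7j"):
    print(candidate, passes_checksum(candidate))
# d5mQSv7j True
# D5MQSV7J False
# d5mqsv7j False
```

Comparing against a full re-encoding also rejects strings with an impossible length or stray non-zero bits in the final character, without needing separate checks for those cases.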
These are generated from the first implementation. Please check them against your own.
ascii    : H
base32cx : d4

ascii    : He
base32cx : d5MK

ascii    : Hel
base32cx : D5MQs

ascii    : Hell
base32cx : D5mqSV4

ascii    : Hello
base32cx : d5mQSv7j

ascii    : Hello!
base32cx : d5MQsv7J88

ascii    : 000011112222333344445555666677778888
base32cx : A4s74g5La8sn6glMact7AGtNagu7cH5oAoUneHdqasv7GhTrAwvnki5Sb4
Using base32cx as a multibase encoding requires selecting a prefix that is not reserved in the multibase table.
A candidate might be X/x for ‘chECKSum’.
This section will be revised when a prefix is set in stone.