Posted on 23 Apr 2014

Encoding/Decoding Base64

Intro

The other day I made a post on bitwise operators, and since then I have written an implementation of encoding a hex string to base64 and decoding it back. I used a good amount of stuff from the bitwise post, so if you aren't up to speed on that, I suggest you read over it first.

What is Base64?

Base64 gives us a way to represent binary data as textual data. This binary data can be anything: an image, a paragraph of text, a song, etc. There are a number of reasons you may want to encode something with Base64, such as transmitting binary data over a channel that only handles text, simple obfuscation, carrying cryptographic material such as keys or ciphertext as text, and more. Keep in mind that Base64 is an encoding, not encryption. For a full explanation of Base64, check out its Wikipedia page.

Encoding Hex to Base64

To get started, we are going to go over encoding a hex string to Base64. To do this you have to keep a couple of things in mind.

  • Each character in a hex string has 4 significant bits.
  • Each character in a base64 string has 6 significant bits.
  • In this post we are doing MIME base64 encoding, thus using '+' and '/'.

Doing a little math, we know that the least common multiple of 4 and 6 is 12. Three hex characters carry 3 x 4 = 12 bits, and two base64 characters carry 2 x 6 = 12 bits, so for every 3 hex characters we are going to get 2 base64 characters. In order to do this we are going to utilize some bit shifting as well as bitwise AND and OR. With all of that in mind, let's take a look at the code!
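As a warm-up before the full function, here is a minimal sketch of the 12-bit idea applied to a single group. The names hex_value and encode_group are made up for this illustration and are not part of the implementation in this post; the sketch also assumes the three characters are valid hex digits.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (illustration only): value of one hex digit, or -1.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: pack 3 hex digits into 12 bits, then split the
// 12 bits into two 6-bit indices into the base64 alphabet.
static std::string encode_group(char a, char b, char c) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const int va = hex_value(a);
    const int vb = hex_value(b);
    const int vc = hex_value(c);
    assert(va >= 0 && vb >= 0 && vc >= 0);  // sketch assumes valid hex input

    // aaaa bbbb cccc -> one 12-bit value.
    const unsigned h = (static_cast<unsigned>(va) << 8) |
                       (static_cast<unsigned>(vb) << 4) |
                       static_cast<unsigned>(vc);

    std::string out;
    out.push_back(alphabet[(h >> 6) & 0x3f]);  // top 6 bits
    out.push_back(alphabet[h & 0x3f]);         // bottom 6 bits
    return out;
}
```

For example, "492" is 0100 1001 0010 in binary. Regrouped into sixes that is 010010 010010, or 18 and 18, and index 18 in the alphabet is 'S', so encode_group('4', '9', '2') gives "SS".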

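// Headers used by the snippets in this post.
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>

using namespace std;

static const string b64EncodeLookup{"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"};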

string b64_encode_hex(const unsigned char *data, size_t size) {
    string encodedStr{};

    // Determine # of b64 sets expected - 12 bits per.
    size_t sets = size / 3;
    // Determine padding required.
    size_t pad = size % 3;
    // pad must be 0 or 1; a remainder of 2 is not handled by this function.
    if(pad == 2) {
        return encodedStr;
    }
    // Determine how many characters should be encoded.
    size_t count = sets * 3;

    encodedStr.reserve(sets*2+2);

    // We may need these if there is padding.
    unsigned long long h = 0;
    size_t i = 0;

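    // Encode data 3 hex characters (12 bits) at a time.
    for(i = 0; i < count; i += 3) {
        // Copy the next 3 characters into NUL-terminated one-character strings.
        // strtoull expects a C string, so passing the address of a lone char is undefined behavior.
        const char a[2] = {static_cast<char>(data[i]), '\0'};
        const char b[2] = {static_cast<char>(data[i+1]), '\0'};
        const char c[2] = {static_cast<char>(data[i+2]), '\0'};
        // Convert the hex characters to their values, and merge them all into a single variable.
        h = strtoull(a, nullptr, 16) << 8;
        h |= strtoull(b, nullptr, 16) << 4;
        h |= strtoull(c, nullptr, 16);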
        // Lookup corresponding base64 character and add to encoded string.
        encodedStr.push_back(b64EncodeLookup[0x3f & (h >> 6)]);
        encodedStr.push_back(b64EncodeLookup[0x3f & h]);
    }

    // Check if padding is required.
    if(pad == 1) {
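        // Repeat the above process, except for a single character.
        // The character is NUL-terminated because strtoull expects a C string.
        const char a[2] = {static_cast<char>(data[i]), '\0'};
        h = strtoull(a, nullptr, 16) << 8;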
        encodedStr.push_back(b64EncodeLookup[0x3f & (h >> 6)]);
        // Pad with a single '=' character.
        encodedStr.push_back('=');
    }

    return encodedStr;
}

If you understand bit manipulations and how hex numbers work then this shouldn't blow your mind. If you don't, then check out my post on it and then come back! It will probably make more sense. Something to note is that you can encode ASCII strings as well. The way I would do it is to just convert your ASCII string to a hex string, and use this function.
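The ASCII-to-hex step could look something like the sketch below. The function name ascii_to_hex is an assumption made for this example; it simply writes each byte of the input as two lowercase hex digits.

```cpp
#include <string>

// Hypothetical helper (illustration only): turn each byte of `text`
// into two lowercase hex digits, high nibble first.
std::string ascii_to_hex(const std::string &text) {
    static const char digits[] = "0123456789abcdef";
    std::string hex;
    hex.reserve(text.size() * 2);
    for (unsigned char byte : text) {
        hex.push_back(digits[byte >> 4]);    // high 4 bits
        hex.push_back(digits[byte & 0x0f]);  // low 4 bits
    }
    return hex;
}
```

For example, ascii_to_hex("I'm") gives "49276d", which can then be handed to b64_encode_hex. Note that b64_encode_hex returns an empty string when the hex length leaves a remainder of 2 after dividing by 3, so not every ASCII string will go through.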

Decoding Base64 to Hex

So we encoded the hex string to base64, but what if we want to get back our original hex string? Well, that is what we are about to see! I took a different approach to the decoding by not using a lookup table for the base64 characters. I did this for 2 reasons: comparing against character ranges should be cheaper than searching a table/container for every character, and I just wasn't a fan of the table/container. The comments should explain any semi-confusing areas pretty well.

static const string hex_digits{"0123456789abcdef"};

string b64_decode_hex(const unsigned char *str, size_t size) {
    string decoded{};

    // Check if valid base64 string.
    if(size % 2 != 0) {
        return decoded;
    }
    // Determine # of hex sets expected.
    size_t sets = size / 2;
    size_t end = sets * 2;

    // Each set of 2 base64 characters decodes to 3 hex characters.
    decoded.reserve(sets * 3);

    unsigned long long h = 0;

    // Decode 2 base64 characters (12 bits) at a time.
    for(size_t i = 0; i < end; i += 2) {
        for(size_t s = 0; s < 2; ++s) {
            h <<= 6;
            // Determine if we are working with A-Z, a-z, 0-9, +, /, or =.
            if(str[i+s] >= 0x41 && str[i+s] <= 0x5a) {
                // Determine correct base64 value for A-Z characters and add to progress.
                h |= str[i+s] - 0x41;
            } else if(str[i+s] >= 0x61 && str[i+s] <= 0x7a) {
                // Determine correct base64 value for a-z characters and add to progress.
                h |= str[i+s] - 0x47;
            } else if(str[i+s] >= 0x30 && str[i+s] <= 0x39) {
                // Determine correct base64 value for 0-9 characters and add to progress.
                h |= str[i+s] + 0x04;
            } else if(str[i+s] == 0x2b) {
                // Add + to progress.
                h |= 0x3e;
            } else if(str[i+s] == 0x2f) {
                // Add / to progress.
                h |= 0x3f;
            } else if(str[i+s] == 0x3d) {
                // Handle pad character '=' only when it is the final character.
                if(end - (i+s) == 1) {
                    decoded.push_back(hex_digits[0xf & (h >> 8)]);
                    return decoded;
                } else {
                    // Invalid b64 string
                    return {};
                }
            } else {
                // Any other character is not part of the base64 alphabet.
                return {};
            }
        }
        // Lookup hex characters to use and add to decoded string.
        decoded.push_back(hex_digits[0xf & (h >> 8)]);
        decoded.push_back(hex_digits[0xf & (h >> 4)]);
        decoded.push_back(hex_digits[0xf & h]);
    }

    return decoded;
}
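The range checks used in the decoder can also be pulled out into a small helper that reports characters outside the alphabet explicitly. This is only a sketch: b64_value and looks_like_b64 are names invented for this example, and the "'=' only as the last character" rule mirrors the single-pad scheme used in this post rather than full MIME padding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (illustration only): map one base64 character to
// its 6-bit value, or return -1 for anything outside the alphabet
// ('=' included).
int b64_value(unsigned char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';       // 0..25
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;  // 26..51
    if (c >= '0' && c <= '9') return c - '0' + 52;  // 52..61
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Hypothetical check: every character must be in the alphabet, except
// that a single '=' is allowed as the very last character.
bool looks_like_b64(const std::string &s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (b64_value(c) >= 0) continue;
        if (c == '=' && i + 1 == s.size()) continue;
        return false;
    }
    return true;
}
```

The arithmetic is the same as the hex constants in the decoder: 'a' - 0x47 is 'a' - 'a' + 26, and '0' + 0x04 is '0' - '0' + 52. Writing it with character literals just makes the offsets easier to check by eye.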

Let's test it out!

The following is some test code I used to ensure things are working as intended. Feel free to open up your browser and search for an online encoder/decoder and compare the results. Remember this is Hex to Base64 so make sure that is the type of encoder/decoder you use.

int main() {
    const char *test1 = "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d";
    string expectedResult {"SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t"};

    string res = b64_encode_hex(reinterpret_cast<const unsigned char*>(test1), strlen(test1));
    assert(res == expectedResult);
    res = b64_decode_hex(reinterpret_cast<const unsigned char *>(res.c_str()), res.length());
    assert(res == string{test1});

    const char *test2 = "4927";
    expectedResult = "SSc=";

    res = b64_encode_hex(reinterpret_cast<const unsigned char*>(test2), strlen(test2));
    assert(res == expectedResult);
    res = b64_decode_hex(reinterpret_cast<const unsigned char *>(res.c_str()), res.length());
    assert(res == string{test2});

    const char *test3 = "1c0111001f010100061a024b53535009181c";
    expectedResult = "HAERAB8BAQAGGgJLU1NQCRgc";

    res = b64_encode_hex(reinterpret_cast<const unsigned char*>(test3), strlen(test3));
    assert(res == expectedResult);
    res = b64_decode_hex(reinterpret_cast<const unsigned char *>(res.c_str()), res.length());
    assert(res == string{test3});

    const char *test4 = "616e79206361726e616c20706c6561737572652e";
    expectedResult = "YW55IGNhcm5hbCBwbGVhc3VyZS4=";

    res = b64_encode_hex(reinterpret_cast<const unsigned char*>(test4), strlen(test4));
    assert(res == expectedResult);
    res = b64_decode_hex(reinterpret_cast<const unsigned char *>(res.c_str()), res.length());
    assert(res == string{test4});

    cout << "Everything encoded/decoded successfully!\n";

    std::cin.get();
    return 0;
}
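All four test strings have lengths that leave a remainder of 0 or 1 when divided by 3. A hex string such as "49" (a single byte) leaves a remainder of 2, which b64_encode_hex rejects by returning an empty string. Below is a rough sketch of how that tail could be encoded the way standard Base64 does it, with two '=' characters. The names hex_digit_value and b64_encode_hex_tail2 are hypothetical, the input is assumed to be valid hex, and the decoder in this post would need a matching branch before it could read this output back.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (illustration only): value of one hex digit, or -1.
static int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: encode a trailing pair of hex digits (one byte).
// The 8 bits are shifted up so they fill the top of a 12-bit group,
// which gives two base64 characters followed by "==".
std::string b64_encode_hex_tail2(char a, char b) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const int hi = hex_digit_value(a);
    const int lo = hex_digit_value(b);
    assert(hi >= 0 && lo >= 0);  // sketch assumes valid hex input

    // aaaa bbbb 0000 -> 12 bits with the low 4 bits zeroed.
    const unsigned h = (static_cast<unsigned>(hi) << 8) |
                       (static_cast<unsigned>(lo) << 4);

    std::string out;
    out.push_back(alphabet[(h >> 6) & 0x3f]);
    out.push_back(alphabet[h & 0x3f]);
    out += "==";
    return out;
}
```

For "49" this yields "SQ==", which matches what an online encoder gives for the single character "I".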

It's a wrap!

That wasn't a long post, but I think it has some good information in it for those interested in this sort of thing. I don't have a timeframe, but at some point I plan to go over some basic cryptography stuff utilizing both this post and the bitwise op post, so if that piques your interest check back for that! If you have any questions or feedback please feel free to leave a comment or shoot me a message via one of the social links on my site.