File Encryption/Decryption Tutorial
25 Feb 2015| mamunbssit

The goal of this article is to demonstrate how to encrypt and decrypt data, and files in particular, in C++. Along the way we touch on encryption algorithm selection, reading files, choosing a padding scheme, and modes of operation.
Background

Encryption is the process of encoding data in such a way that only intended recipients can read the original data (also called the plaintext). The encoded data is called ciphertext. The process of retrieving plaintext from ciphertext is called decryption.

Encryption and decryption usually rely on a cryptographic key. Encryption/decryption schemes are generally divided into two classes: symmetric and asymmetric algorithms. In symmetric algorithms, a single key is used for both encryption and decryption, whereas in asymmetric algorithms a public key is usually used for encryption and a private key for decryption.

In this article, we will encrypt the files using a symmetric block cipher called Blowfish. In particular, we will borrow George Anescu's implementation of the Blowfish algorithm.

Being a block cipher means the following: the message we want to encrypt is broken into fixed-size blocks, and each block is encrypted. In the case of Blowfish, the block length is 8 bytes, so the message must be broken down into blocks of 8 bytes each.
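As a small illustration of what "broken into blocks" means, the sketch below splits a buffer into 8-byte blocks. The names `kBlockSize`, `Block` and `splitIntoBlocks` are hypothetical and are not taken from the Blowfish implementation used in this article; the sketch also assumes the message length is already a multiple of 8.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kBlockSize = 8;  // Blowfish block size in bytes
using Block = std::array<std::uint8_t, kBlockSize>;

// Split a message into consecutive 8-byte blocks.
// Assumption: message.size() is already a multiple of 8.
std::vector<Block> splitIntoBlocks(const std::vector<std::uint8_t>& message) {
    assert(message.size() % kBlockSize == 0 && "length must be a multiple of 8");
    std::vector<Block> blocks(message.size() / kBlockSize);
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        for (std::size_t j = 0; j < kBlockSize; ++j) {
            blocks[i][j] = message[i * kBlockSize + j];
        }
    }
    return blocks;
}
```

A 16-byte message yields two blocks; a 12-byte message would first have to be extended to 16 bytes.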
Reading Files

As noted, this tutorial is about encrypting and decrypting files, so we must read the file contents before processing them. Two options come to mind for reading files:

    1. Read the full file contents into memory and encrypt/decrypt that chunk of data, or
    2. read the file in smaller chunks (say 512 bytes) and process each chunk right away.

The issue with the first approach is that a very large file may not fit comfortably in memory, while the second approach does not suffer from this problem. Nevertheless, in this article we implement the first method; it should not be difficult for the reader to change the project so that approach number 2 is used.
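The two approaches can be sketched with standard C++ streams as follows. This is a minimal sketch, not the code of the accompanying project: the function names `readWholeFile` and `readInChunks` and the callback signature are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <functional>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Approach 1: read the entire file into memory.
std::vector<std::uint8_t> readWholeFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(in),
                                     std::istreambuf_iterator<char>());
}

// Approach 2: read the file chunk by chunk and hand each chunk to a callback.
// The last chunk may be shorter than chunkSize.
void readInChunks(const std::string& path, std::size_t chunkSize,
                  const std::function<void(const std::uint8_t*, std::size_t)>& onChunk) {
    assert(chunkSize > 0);
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::vector<char> buffer(chunkSize);
    // read() fails on a short final read, but gcount() still reports the bytes obtained.
    while (in.read(buffer.data(), static_cast<std::streamsize>(buffer.size())) ||
           in.gcount() > 0) {
        onChunk(reinterpret_cast<const std::uint8_t*>(buffer.data()),
                static_cast<std::size_t>(in.gcount()));
    }
}
```

If the second approach is used for encryption, the chunk size should be a multiple of 8 (512 is), and padding should be applied only to the final chunk.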
Padding Method

One of the important aspects of encrypting/decrypting data is the padding scheme. As noted earlier, since Blowfish has a block size of 8 bytes, the length of the data to be encrypted/decrypted must be a multiple of 8 bytes.

So if a file we want to encrypt does not have a size that is a multiple of 8 bytes, we must pad it with additional bytes until its length becomes a multiple of 8. After decryption, we must remove those additional bytes.

For example, if the file we want to encrypt has a length of 12 bytes, we must pad it so that its length becomes 16 bytes. We are then left with a plaintext that is 16 bytes long, and when we encrypt it, the ciphertext will also be 16 bytes long. When the user decrypts the ciphertext, she must be able to tell how many bytes are redundant. In other words, when reconstructing the plaintext, she must be able to tell that only the first 12 bytes in our example represent the real data. The padding scheme called PKCS#5 (described here), which we will use, solves this issue.

The PKCS#5 padding scheme works as follows. The message M is concatenated with a padding string PS. PS consists of 8-(N mod 8) bytes, each with value 8-(N mod 8), where N is the length of the original message M. For example, if the original message is 14 bytes long, then since the block size of Blowfish is 8 bytes, the padding string is PS = 02 02. If the original message is 12 bytes long, the padding string is PS = 04 04 04 04. Finally, if the length of the original message is already a multiple of 8 bytes, a full block of padding is added: PS = 08 08 08 08 08 08 08 08.
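The rule above translates almost directly into code. The following is a minimal sketch for an 8-byte block size; the name `pkcs5Pad` is hypothetical and the function is not taken from the accompanying project.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kBlockSize = 8;  // Blowfish block size in bytes

// Append PKCS#5 padding: 8-(N mod 8) bytes, each holding the value 8-(N mod 8).
// A message whose length is already a multiple of 8 gets a full extra block.
std::vector<std::uint8_t> pkcs5Pad(std::vector<std::uint8_t> message) {
    const std::size_t padLen = kBlockSize - (message.size() % kBlockSize);  // 1..8
    message.insert(message.end(), padLen, static_cast<std::uint8_t>(padLen));
    assert(message.size() % kBlockSize == 0);  // padded length is block-aligned
    return message;
}
```

For a 14-byte message this appends 02 02, for a 12-byte message 04 04 04 04, and for a 16-byte message eight bytes of 08, matching the examples above.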

Actually, I think it is not very hard to come up with a padding scheme yourself (although it is probably better to use an established scheme). For example, one padding scheme I came up with when thinking about this issue is the following. First fill the message with zero bytes up to a multiple of 8 if needed. Then, regardless of whether the original plaintext length was a multiple of 8, always append a dummy block whose last byte says how many bytes were needed for padding. For example, if the original message is 12 bytes long, the padded string may look like:

Final Message = OriginalMessage || 00 00 00 00 || 00 00 00 00 00 00 00 04.

Here 4 bytes with value 00 were appended to the message because its length was 12 (16-12=4), followed by the dummy block, whose last byte contains the number (4) of bytes that were added. After decryption, the user reads the last byte, discards the last block, and then discards as many bytes from the end of the remaining message as that byte specifies.
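For comparison, this home-made scheme could be sketched as shown below. The names `dummyBlockPad` and `dummyBlockUnpad` are hypothetical, and the sketch only illustrates the idea described above; it is not used anywhere else in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr std::size_t kBlockSize = 8;  // Blowfish block size in bytes

// Zero-fill to a block boundary, then append a dummy block whose last byte
// records how many fill bytes were added (0..7).
std::vector<std::uint8_t> dummyBlockPad(std::vector<std::uint8_t> message) {
    const std::size_t fill = (kBlockSize - message.size() % kBlockSize) % kBlockSize;
    message.insert(message.end(), fill, std::uint8_t{0});            // zero fill
    message.insert(message.end(), kBlockSize - 1, std::uint8_t{0});  // dummy block: 7 zeros...
    message.push_back(static_cast<std::uint8_t>(fill));              // ...plus the fill count
    assert(message.size() % kBlockSize == 0);
    return message;
}

// Reverse the scheme: read the count, drop the dummy block, drop the fill bytes.
std::vector<std::uint8_t> dummyBlockUnpad(std::vector<std::uint8_t> padded) {
    if (padded.size() < kBlockSize || padded.size() % kBlockSize != 0)
        throw std::invalid_argument("padded length must be a positive multiple of 8");
    const std::size_t fill = padded.back();
    if (fill >= kBlockSize || fill + kBlockSize > padded.size())
        throw std::invalid_argument("invalid fill count");
    padded.resize(padded.size() - kBlockSize - fill);
    return padded;
}
```

Note that this scheme always costs between 8 and 15 extra bytes, whereas PKCS#5 costs between 1 and 8.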

Nevertheless, in this article we use the PKCS#5 padding scheme. To reconstruct the plaintext under this scheme, the user reads the last byte and discards as many bytes as that byte specifies. Some additional checks can also be performed, as can be seen in the code.
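Removing the padding, including the kind of additional checks mentioned above, might look like the following sketch. The function name `pkcs5Unpad` and the choice to report errors with `std::invalid_argument` are assumptions for illustration; the project code may perform its checks differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr std::size_t kBlockSize = 8;  // Blowfish block size in bytes

// Strip PKCS#5 padding from decrypted data, validating it along the way.
std::vector<std::uint8_t> pkcs5Unpad(std::vector<std::uint8_t> padded) {
    if (padded.empty() || padded.size() % kBlockSize != 0)
        throw std::invalid_argument("padded length must be a positive multiple of 8");
    const std::size_t padLen = padded.back();  // last byte = number of padding bytes
    if (padLen == 0 || padLen > kBlockSize)
        throw std::invalid_argument("padding length out of range");
    assert(padLen <= padded.size());  // follows from the two checks above
    // Additional check: every padding byte must carry the same value.
    for (std::size_t i = padded.size() - padLen; i < padded.size(); ++i) {
        if (padded[i] != padLen)
            throw std::invalid_argument("inconsistent padding bytes");
    }
    padded.resize(padded.size() - padLen);
    return padded;
}
```

A failed check usually means the data was corrupted or decrypted with the wrong key.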
Mode of Operation

Finally, a few words about the mode of operation. As noted, the plaintext/ciphertext passed to the encryption/decryption method must have a length that is a multiple of 8 bytes, because the block length of the Blowfish cipher is 8 bytes. A mode of operation describes how to apply the cipher to data that consists of multiple 8-byte blocks. For example, ECB (Electronic Codebook) mode encrypts each block independently of the others, so the blocks can be processed in parallel. The image below shows what ECB mode looks like. Each "plaintext" block in the image refers to a single 8-byte block of the plaintext, and you can see how the blocks are processed in parallel. The same principle applies during decryption.
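In code, ECB amounts to a simple loop over the blocks. The sketch below deliberately does not use the Blowfish class from this article: `BlockFn` is a hypothetical callback type standing in for "encrypt (or decrypt) one 8-byte block", and `toyXorBlock` is a toy stand-in that is NOT Blowfish and NOT secure. It only makes the example runnable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

constexpr std::size_t kBlockSize = 8;  // Blowfish block size in bytes

// Hypothetical interface: transform one 8-byte block from `in` into `out`.
using BlockFn = std::function<void(const std::uint8_t* in, std::uint8_t* out)>;

// ECB: apply the block function to every block independently.
// Works for encryption and decryption alike, depending on the function passed.
std::vector<std::uint8_t> ecbApply(const std::vector<std::uint8_t>& input,
                                   const BlockFn& blockFn) {
    assert(input.size() % kBlockSize == 0 && "pad the data first");
    std::vector<std::uint8_t> output(input.size());
    for (std::size_t off = 0; off < input.size(); off += kBlockSize) {
        blockFn(input.data() + off, output.data() + off);
    }
    return output;
}

// Toy stand-in for a real block cipher (XOR with a fixed pattern).
// It is its own inverse, so it serves as both "encrypt" and "decrypt" here.
void toyXorBlock(const std::uint8_t* in, std::uint8_t* out) {
    static const std::uint8_t key[kBlockSize] = {0x13, 0x57, 0x9B, 0xDF,
                                                 0x24, 0x68, 0xAC, 0xE0};
    for (std::size_t i = 0; i < kBlockSize; ++i) {
        out[i] = static_cast<std::uint8_t>(in[i] ^ key[i]);
    }
}
```

Because no block depends on any other, two identical plaintext blocks always produce two identical ciphertext blocks in this mode.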

There are more secure modes than ECB, for example Cipher Block Chaining (CBC). In ECB, identical plaintext blocks always encrypt to identical ciphertext blocks, so patterns in the data remain visible in the ciphertext; CBC avoids this by combining each plaintext block with the previous ciphertext block before encrypting it. The interested reader is referred here for more details about modes of operation. Note that in this article we use ECB mode for encryption and decryption. If you wish to use a more secure mode of operation such as CBC, see George Anescu's implementation of the Blowfish cipher (the one we used) for how that can be realized: link.
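To give a rough idea of how CBC differs from ECB, here is a generic sketch of the chaining step. It is not George Anescu's code and makes no claim about how his class implements CBC: `BlockFn`, `cbcEncrypt`, `cbcDecrypt` and the initialization vector parameter `iv` are hypothetical names, and `toyXorBlock` is again an insecure stand-in for a real block cipher.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

constexpr std::size_t kBlockSize = 8;  // Blowfish block size in bytes
using Block = std::array<std::uint8_t, kBlockSize>;
using BlockFn = std::function<void(const std::uint8_t* in, std::uint8_t* out)>;

// CBC encryption: XOR each plaintext block with the previous ciphertext block
// (the IV for the first block), then encrypt the result.
std::vector<std::uint8_t> cbcEncrypt(const std::vector<std::uint8_t>& plain,
                                     const Block& iv, const BlockFn& encryptBlock) {
    assert(plain.size() % kBlockSize == 0 && "pad the plaintext first");
    std::vector<std::uint8_t> cipher(plain.size());
    Block prev = iv;
    for (std::size_t off = 0; off < plain.size(); off += kBlockSize) {
        Block mixed;
        for (std::size_t i = 0; i < kBlockSize; ++i)
            mixed[i] = static_cast<std::uint8_t>(plain[off + i] ^ prev[i]);
        encryptBlock(mixed.data(), cipher.data() + off);
        for (std::size_t i = 0; i < kBlockSize; ++i) prev[i] = cipher[off + i];
    }
    return cipher;
}

// CBC decryption: decrypt each block, then XOR with the previous ciphertext block.
std::vector<std::uint8_t> cbcDecrypt(const std::vector<std::uint8_t>& cipher,
                                     const Block& iv, const BlockFn& decryptBlock) {
    assert(cipher.size() % kBlockSize == 0);
    std::vector<std::uint8_t> plain(cipher.size());
    Block prev = iv;
    for (std::size_t off = 0; off < cipher.size(); off += kBlockSize) {
        Block decrypted;
        decryptBlock(cipher.data() + off, decrypted.data());
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            plain[off + i] = static_cast<std::uint8_t>(decrypted[i] ^ prev[i]);
            prev[i] = cipher[off + i];
        }
    }
    return plain;
}

// Toy stand-in for a real block cipher (NOT secure); it is its own inverse.
void toyXorBlock(const std::uint8_t* in, std::uint8_t* out) {
    static const std::uint8_t key[kBlockSize] = {0x13, 0x57, 0x9B, 0xDF,
                                                 0x24, 0x68, 0xAC, 0xE0};
    for (std::size_t i = 0; i < kBlockSize; ++i)
        out[i] = static_cast<std::uint8_t>(in[i] ^ key[i]);
}
```

The same IV must be available at decryption time, and unlike ECB, encryption can no longer be parallelized because each block depends on the previous ciphertext block.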

First Posted :2/25/2015