 Generalized Tiny Encryption Algorithm
Based on the work of David J. Wheeler and Roger M. Needham

Tom St Denis
April 20th 1999

Abstract. This paper proposes extensions and variations to the proposed TEA algorithm, while maintaining the same objectives as the original TEA algorithm.

Introduction

In the original paper that describes TEA , the authors left out one crucial aspect of the algorithm, which lead to related key attacks . Although the algorithm was still effective the authors proposed X-TEA . They believed that this new algorithm was more resistant to the related key attack.

The goal of Generalized Tiny Encryption Algorithm (GTEA) is to define and explore a family of algorithms based on the basic TEA algorithm. In this paper, variations will be explored and documented with reference to cryptanalysis and levels of diffusion/confusion.

To avoid confusion of the terminology, all numbers are in decimal, except hexadecimal which are denoted by a 0x. Function names are all in bold, and variables are not in bold.

Algorithm

The basic form for GTEA is as follows

sum = 0

for r = 1 to rounds
P1 = P1 + P(P2, sum) + S(K1K2, P2)
P2 = P2 + P(P1, sum) + S(K3K4, P1)

sum = sum + 0x9E3779B9

K = RLn(K)
next r

Note: The addition to the right of the P function can be replaced with XOR as needed. The constant register can be updated between rounds or after a pair of rounds.

Where the following functions and variables are used

P Plaintext, divided into two halves, P1 and P2.
K Key, divided into four parts, K1 ... K4
S Substitution, comprises of the non-linear step, which mixes in the key material.
P Permutation, used for diffusion of bit errors.
RL Optional key rotation (cyclic bit rotation)
n is the rotation amount.

sum A simple linear sequence used to avoid degenerative cycles. The value 0x9E3779B9 was taken from golden number ( )

The decryption is the same, except you use subtraction as the first operator, instead of addition. The idea is that the mixed addition and xor operations are non-linear. (One proof is that addition is in the GF(2w) and xor is in the GF(2), which do not commute).

Despite it's simplicity, the algorithm in theory can be made quite resistant to known attacks. This requires the use of properly created functions which are resistant to differential and linear attacks. We can describe TEA , using this model describing the functions P and S (RL is omitted)

TEA

For P1
P
(x, sum) = x + sum
S(K1K2, x) = (SL(x, 4) + K1) xor (SR(x, 5) + K2)

For P2
P(x, sum) = x + sum
S(K3K4, x) = (SL(x, 4) + K3) xor (SR(x, 5) + K4)

X-TEA

For P1
P
(x,sum) = (SL(x, 4) xor SR(x, 5)) + (x xor sum)
S(K,sum) = Ksum mod 4

For P2
P(x,sum) = (SL(x, 4) xor SR(x, 5)) + (x xor sum)
S(K,sum) = K(sum / 2048) mod 4

Note: In X-TEA the sum register is modified between the odd and even rounds.
Where SR denotes logical bit shift right, and SL denotes logical bit shift left.

Both algorithm feature similar diffusion techniques. They use a feedback function similar to

F(x) = SL(x, 4) xor SR(x, 5)

Which is used to diffuse bit errors in the register. This called a feedback because when the wrong values enter at the first round of the iteration, they will cause errors in the destination register, which in turn is the input for the next round. This turns out to be quite effective in causing diffusion of bit errors. It requires at least five rounds (not iterations) to cause at least 32 bit errors. Although this level of diffusion (50% of the block) is not always achieved so quickly,and that is why the authors of TEA have suggested a bare minimum of 8 rounds (4 iterations) for at least that much diffusion.

The confusion sequence is different in both algorithms. In the original TEA, the odd rounds used K1 and K2, where even rounds used K3 and K4. This produces bad related-key properties that  have exploited. The best attack  have reported is 223 chosen plaintext, one related-key query and roughly 232 offline computations. This may seem rather impractical but is still feasible, given sufficient time. Also the effective key length of any TEA key is 126 bits since there is a nice property relating bit 30 of K3 and K4.

In X-TEA the authors have corrected the shorten key problem, but their key schedule is not much better. They have used the sum register has an index for the key entries. While it is improvement of TEA, there are still known practical problems. The first is that the key pattern has a short cycle (4 rounds for the odd rounds, and about 24 for the even rounds). This limits the total number of rounds to 96, which will not effect most implementations. The key material is introduced slower which helps prevent the related bit attacks possible on TEA since both related key registers are not used in the same round. Overall X-TEA is an improvement over the original and still keeps the same design properties.

The choice of the sum delta value and it's bits in the index were not arbitrary. The delta is used to avoid degenerative rounds where patterns can appear, which is due to weak keys. The key of all zero for example is considered weak, if there is no sum register. The sum register basically maintains that all the bits of the result change with a probability of 1/2 in every round. This provides for nice statistical results.

Improvements

Although X-TEA is generally a safe algorithm, with good practical design, it still has similar differential properties as TEA. This is because although the order of the key registers changes in each round, the actual values do not. Which means that the same bits in the subkeys will affect the same bits in the output with a probability of about 1/3. Given enough chosen plaintexts (how much?) this can exploited, which requires constructing enough pairs (of ciphertext/plaintext) to reliably approximate what the bit of the key would be.

A simple correction for this could be

P1 S(K, sum, r) = RL(Ksum mod 4, (r mod 32))
P2 S(K, sum, r) = RL(K(sum / 2048) mod 4, (r + 1 mod 32))

Where RL denotes cyclic bit rotation left.

Where r is the current iteration number in the round function. The even round uses 'r + 1' to help deter similar keys from being used in any round by a factor of 248 (1/16 * 1/232, given as a 1/16 chance of picking the pair and a 1/232 chance that this pairing is similar). Given at least 64 iterations every bit of the subkeys will effect every bit of the output. However this level is probably not required, as a natural extension the subkeys could be saved after encrypting. Assuming that each key is used with an exact probability of 1/4 then after 32 rounds (16 iterations) each key is rotated exactly 8 bits. After 4 blocks, the cycle begins again, and the same keys will enter the encryption/decryption function.

This model will have a maximum (after 128 rounds) of 128 scheduled keys. They will be used in a fixed order too.

The above can be extended similar to  using data dependant rotations on the subkeys.

TEA-N1

P1 S(K, sum, P2) = RL(Ksum mod 4, (P2 mod 32))
P2 S(K, sum, P1) = RL(K(sum / 2048) mod 4, (P1 mod 32))

This will allow 'random' rotations of the subkeys, controlled by input plaintext register. The idea is that each different plaintext block will produce different sets of scheduled keys during encryption. There are essentially 32*4 or 128 scheduled keys, which have no preset order of use. Assuming that the probability of each bit in the plaintext registers is 1/2, then each rotation amount will have a probability of 1/32 (or 1/2 * 1/2 * 1/2 * 1/2 * 1/2 for each of the five bits). Which extends to the actual keys (given each key has a probability of 1/4 of being used) then each 'scheduled key' has a probability of 1/4 * 1/32 or 1/128. This extension is actually quite efficient on most computers (such as x86 systems), which support rotations directly.

The security of this design requires hiding the rotation amounts used. If the rotation amounts can be obtained then there are essentially four scheduled keys. To properly mask the rotations amounts pre-whitening and post-whitening are required, which is pretty easy to implement. Using K1/K3 for the post-white (addition would suffice) and K2/K4 for the pre-white (xor would suffice) would be effective enough. Using this technique an attacker would have a 1/64 chance of getting both rotations right for the first two rounds. A 1/4096 chance for four rounds, and so on (basically ).

A nice benefit of this is that although there are 128 scheduled keys, none of which have to be stored in memory (except for the four original sub keys). This indicates a good use of memory while extending security.

This idea can be extended to provide more scheduled keys. You could create extra subkeys by deriving them from the original key. It must be noted this does not complicate a brute force search.

A = K1 xor K2
B = K3 xor K4
C = K1 xor K4
D = K2 xor K3
Alternating addition and binary exclusive or can be used to avoid collisions. With a combined probability of collision of 1/2
96 (or 1/232 per pair).

Each extra subkey are independent of each other. This is hardly one way, but provides four extra sub keys. You could then extend TEA-N1 to use these new keys assuming they have been put in array called NK.

TEA-N2

P1 S(NK, K, sum, P2)= RL(NK(sum / 2048) mod 4, SR(P2, 5) mod 32) xor
RL(Ksum mod 4, (P2 mod 32))

P2 S(NK, K, sum, P1)= RL(NKsum mod 4, SR(P1, 5) mod 32) xor
RL(K(sum / 2048) mod 4, (P1 mod 32))

This extension is quite a bit slower then the previous. Assuming that each value in NK has a probability of 1/232, then all four NK sub keys have a 1/296 chance of being equal (or satisfying any statistical problem) which is extremely low. Assuming that the entries are unique, there are 8*32 or 256 scheduled keys. Using the same assumptions and proofs as in TEA-N1 each scheduled key would have a probability of 1/256.

The major draw back is that there are two rotations per round, double that of TEA-N1. Which means that it will be slower, and take up more code space. On desktop computers this is hardly a concern, similarly  has two rotations per round and it is considered quite practical.

The benefit of TEA-N2 over TEA-N1 is that there are more scheduled keys. Although four of them are derived from the other four, they are considered unique. There is a chance that all, or a subset of the scheduled keys are equal. The chances of one pair of scheduled keys being equal is 1/232, which is quite low.

A natural extension would be to create the NK set from actual key entries, which would make collisions nonexistent, and extend the key from 128 bits to 256 bits. This is also quite practical given the design.

Another design criteria for TEA-N2 is that the key material must be mixed separately. Which requires mixing the S and P functions. This is required because the keys used in the S function would have complements reducing the complexity of the scheduled key pair. Whether or not this type of attack is feasible has not been determined, it is suggested to avoid this style of algorithm.

Another possible extension to TEA-N2 would be to use the top and bottom five bits for the key schedule. This would statistically mean that errors in the key schedule would occur more evenly, and more often. This extension is probably a good idea.

The pseudo code would resemble TEA, where two scheduled keys are used per round. The scheduled keys are derived as described in TEA-N2, basically split the equation where the xor is, and use the left and right halves as two scheduled keys. The pseudo code resembles

Pseudo Code for TEA-N2

P1 = P1 + K1
P2 = P2 + K3

sum = 0x9E3779B9
for r = 1 to rounds
P1 = P1 + (SL(P2, 4) + SK1) xor (P2 + sum) xor (SR(P2, 5) + SK2)
sum = sum + 0x9E3779B9
P2 = P2 + (SL(P1, 4) + SK3) xor (P1 + sum) xor (SR(P1, 5) + SK4)
next r

P1 = P1 xor K2
P2 = P2 xor K4

Where SK is the derived scheduled key for the round. There are two scheduled keys per round.

In this implementation there is no simple complement problem as in TEA-N2. Each scheduled key is mixed with the permutation function to provide a higher order of diffusion and isolation.

Key Rotation

Yet another extension to this would be to rotate the entire key after every two rounds. This idea could be aplied to any of the GTEA algorithms.

This idea would also slow down the process, but provide for up to 128*128 (in the case of TEA-N1) or 128*256 (TEA-N2) scheduled keys. This is because when you rotate the entire key (not just the individual words), there are 128 states (given the key is 128 bits) if the entire key is rotated by a odd amount (1, 5, 11 or 25 for example). There are two alternatives, the first is you can implement this in assembler and get direct access to the rotate operator. The second more practical (but unlike TEA) would be to precalculate the rotate key (in all 128 states).

In subsequent blocks you would use different states of the key. For example if you have 32 iterations (64 rounds) after four encrypted blocks you would start to use the same key state. A better idea would to use a number of iterations that does not evenly divide 128. In this manner the cycle will not always begin at the start or end of a block (for example 26 rounds would ensure that the key state is different in each subsequent block). Key rotation can also be easily to the original TEA and X-TEA algorithms.

This extension would provide more scheduled keys, and would be more resistant to differential analysis. The individual scheduled keys would also be more complicated and harder to analyze. If security is of a high concern, studying TEA-N1 or TEA-N2 with the key rotation is suggested.

Still other areas of improvement include using a more irregular order for the subkeys. Currently the (sum mod 4) will always be in order of '1, 2, 3, 4, ...'. While it is not required (at all, and is really not needed if you rotate the entire key), it may be an area of interest for future development.

Benefits

The benefits of these modifications apply largely to differential analysis. The first prerequisite is that each bit of the key effects at least one bit of the plaintext with a probability of 1/2. The key is twice as large as plaintext which means that two bits of the key effect one bit of the plaintext with a probability of 1/4 each (1/2 total). Also the order of the key bits are different (and even pseudo-random) in each round, making the search for correlations (between bits in a ciphertext/plaintext pair with a key) harder to find.

Having the key words (and the key as a whole) being rotated is a quite natural method to have such a nice influence which is what this paper has explored.

TEA-N1 was designed to compliment X-TEA by using the data dependant rotations. This variation is quite natural when compared to the original algorithm . It features a variable key schedule which is also pseudo-random (with respect to the plaintext). The key material is also introduced slowly with the same design criteria as X-TEA.

TEA-N2 was designed to compliment TEA by using extended keys (can be a 256 bit key or a 128 bit key with expansions). When used with a 128 bit key it features better results with respect to differential analysis, as various bits of the key (as a whole) have been changed to form another subset key. This algorithm is slower then TEA-N2, but most likely more secure.

Both algorithms require at least five rounds (2.5 iterations) to complete diffusion of a single bit error (in the very first round). It is assumed that at least eight to sixteen rounds are required to strengthen this effect, this is because single bit errors can either propagate zero, one or two error bits. The most common result would be that a single bit error would propagate two error bits for the next round. Each of these errors would be about 9 bits away, which means that a single bit error requires at most seven rounds to effect the key schedule of TEA-N1 (or six rounds for TEA-N2).

Overall both proposals are for further study only. While they are based on good theory, they require further study, and research to back up the claims made. A promising area of research would be an actual key rotation. As described key rotation could be used for file encryption or a communication link, and would make electronic code book mode a valuable tool.

Conclusion

By using a stronger, and more efficient key schedule, the attacks proposed against TEA and X-TEA can eliminated. At the same time an extended 256 bit version (TEA-N2) has been designed with the same criteria of the original TEA cipher. This paper was intended as a insight to the GTEA family, and will possibly lead to new designs and ciphers.

References

 TEA, a Tiny Encryption Algorithm, David J. Wheeler & Roger M. Needham.

 X-TEA, Tea Extensions, David J. Wheeler & Roger M. Needham.

 Related-Key Cryptanalysis of 3-WAY, Biham-DES, CAST, DES-X, NewDes, RC2 and TEA, John Kelsey, Bruce Schneier & David Wagner.

 The RC5 Encryption Algorithm, March 20th 1997, Ronald L. Rivest.

 The RC6 Block Cipher, A submission to AES, Ronald L. Rivest, M.J.B Robshaw, R. Sidneyt & Y.L. Yin.

Appendix A

Provided for education use only, is the C source code for all proposed algorithms in this paper. This includes TEA, X-TEA, TEA-N1. TEA-N2 pseudo-code was presented, and therefore there is no source. Only the encryption functions are provided, as the decryption functions can be derived from these (they are just the inverse).

/* Tea related source code */

#define DELTA 0x9E3779B9UL
#define ROUNDS 64

/* macro to perform cyclic bit rotations left */
#define rol(x, y) ( ((x) << ((y) & 31)) | ((x) >> (32 - ((y) & 31))) )

/* Original TEA */
void tea(unsigned long key, unsigned long plain)
{
unsigned long y, z, sum, r;

sum = 0;
y = plain; z = plain;

for (r = 0; r < (ROUNDS / 2); r++) {
sum += DELTA;
y += ((z << 4) + key) ^ (z + sum) ^ ((z >> 5) + key);
z += ((y << 4) + key) ^ (y + sum) ^ ((y >> 5) + key);
}

plain = y; plain = z;
}

/* X-TEA */
void xtea(unsigned long key, unsigned long plain)
{
unsigned long y, z, sum, r;

sum = 0;
y = plain; z = plain;

for (r = 0; r < (ROUNDS / 2); r++) {
y += ((z << 4) ^ (z >> 5)) + (z ^ sum) + key[sum & 3];
sum += DELTA;
z += ((y << 4) ^ (y >> 5)) + (y ^ sum) + key[(sum >> 11) & 3];
}

plain = y; plain = z;
}

/* TEA-N1 */
void tean1(unsigned long key, unsigned long plain)
{
unsigned x, y, sum, r;

sum = 0;
u = plain + key; z = plain + key;

for (r = 0; r < (ROUNDS / 2); r++) {
y += ((z << 4) ^ (z >> 5)) + (z ^ sum) + rol(key[sum & 3], z);
sum += DELTA;
z += ((y << 4) ^ (y >> 5)) + (y ^ sum) + rol(key[(sum >> 11) & 3], y);
}

plain = y ^ key; plain = z ^ key;
}