We have new signature requirements from Central Licensing.
We're happy to comply, but,...it's a bit slow. Can you figure it out?
Service: 10.0.66.121:8008 Reference Client: 172.16.18.20/sign_faster_client-d57fe60ab04625921121656d3695beca
Signfaster is a challenge from OpenCTF 2015 (Defcon 23). It's an RSA challenge that forces you to prove your understanding of RSA's stranger features. If you'd like to test your mettle, don't read any of the three solutions, the first of which is below. Download the challenge tarball and learn about a very difficult and interesting flaw in RSA.
The client is a very simple CTF TCP client that solves a proof of work (using scrypt), then transmits an RSA public key, and finally solves a challenge involving 1337 RSA signatures. The design of the challenge is very good because it tells you if you get it right without requiring you to get the speed correct. That means if you have a solution that is 2 seconds too slow but you don't know if it works, you can test it and make sure that you didn't goof. Two bugs in the challenge were: the public key size validator required public keys that were 4096 bits or more which means a key of 4094 bits won't work when it's a perfectly valid 4096-bit key (RSA is more liberal about key sizes so long as you're in the general area), and the use of twisted python library made the challenge less reliable than it should have been. Both of these were easily overcome.
The client they gave us takes about 40 seconds to compute the signature which is very slow. They want you to submit a solution in less than a second. Even with a fast implementation (which pycrypto is not), it's not easy to make one thousand, three hundred thirty-seven 4096-bit signatures fast.
The central part of our solution is in the library four1a.py.
It provides fuckedRSA, a class that is able to compute Chinese Remainder Theorem quickly with an arbitrary number of prime factors. The reason for foul naming of the class is because using this in any system is a horrible weakness and should never be used for any reason, especially not performance.
The fuckedRSA implements RSA with more than two primes. While this fundamentally breaks the security of the RSA implementation, signing operations can be optimized to be very fast. Working with more than two primes looks similar to traditional RSA with a few subtle differences:
Say you have the primes [3, 5, 7, 11, 13, 15] as your prime factors. N is 3*5*7*11*13*15 = 225225.
The decryption exponent d is computed from the encryption exponent e and phi which is a combination of the prime factors.
phi = (p-1)*(q-1)*...
In the example phi = 2*4*6*10*12*14 = 80640
Let's say that e = 31
d = inverse(e, phi)
In this case, d would be 62431, a large number compared to N.
Encryption works like this:
ciphertext = pow(plaintext, e, N)
Let's say our ciphertext is 110138.
Then the decryption occurs like:
plaintext = pow(ciphertext, d, N)
In this case plaintext = 1337.
In math notation: plaintext = (ciphertext ^ d) mod N
This is an expensive method of decrypting (or signing which is the same operation in RSA) a piece of data.
The Chinese Remainder Theorem states that you can compute the decryption with much lower exponents (and thus faster) by splitting the decryption into multiple parts and combining. Compute:
q_inv = Crypto.Util.number.inverse(q, p) m_1 = pow(ct, d_P, p) m_2 = pow(ct, d_Q, q) h = (q_inv*(m_1-m_2)) % p m = m_2 + h * q
- The d_P for each factor can be computed as d mod (p-1) where p is a factor.
- The set of d_P for our example would be [1, 3, 1, 1, 7, 5].
- q_inv is the inverse of q (the second prime) mod p.
- The first q_inv is inverse(3, 5) = 2.
In order to save time, we choose a small d and compute a valid e from it. Since e and d are inverse, you can compute e from d:
e = Crypto.Util.number.inverse(d, phi)
The problem is that you can't just pick an arbitrary d. Many inverses of d are 1, so you filter those out and get valid (d, e) pair.
The rest is just doing that repeatedly for any number of prime factors, which is not necessary but useful for the solution to the related challenge called "signfasterer".
Our solution was a modification to the provided client code. The full source file is available at faster_two.py.
In the function do_pubkey:
num_cop = 20 bits = (4096 // num_cop) + 2 ps = [Crypto.Util.number.getPrime(bits) for i in range(num_cop)] N = 1 for x in ps: N *= x #next x #print('e', e, len(hex(e)[2:]) * 4) self.v = four1a.fuckedRSA(ps, 123123123) print('e=', self.v.e, len(hex(self.v.e)[2:])*4) self.rsa = RSA.construct((long(N), long(self.v.e), long(self.v.d)))
And in the function do_challenge:
for _ in xrange(1337): #s = self.rsa.sign(s - 1337, None) s = self.v.decrypt(s - 1337)
The main trouble I ran into with this is that it took minutes for the server to respond. The authors told me that this was a problem with the design and that I should wait for it.
The winning entry took about 0.7 seconds to compute.
This solution required a lot of optimization and required hours of time despite having solved an RSA-CRT problem before. For an RSA-CRT problem that doesn't have a solution posted, check out Myster Twister C3's Smartcard RSA challenge. It's deviously difficult and doesn't allow you to solve it until you understand the concepts thoroughly.
Running the solution looks kinda like this (the original was lost in the rush to solve the fasterer challenge):
This solution earned Neg9 200 points out of our total 2710 (7.4%).