In this chapter, we will work through a SHA-256 example, using "projectfpga.com" as the message to be hashed.
Step 1. Preprocessing:
1. Let’s convert “projectfpga.com” to binary:
01110000 01110010 01101111 01101010 01100101 01100011 01110100 01100110 01110000 01100111 01100001 00101110 01100011 01101111 01101101
2. Append a single 1 bit to the end of the data:
01110000 01110010 01101111 01101010 01100101 01100011 01110100 01100110 01110000 01100111 01100001 00101110 01100011 01101111 01101101 1
3. Pad with zeros until the length is 64 bits short of a multiple of 512 (in our case, 448 bits):
01110000 01110010 01101111 01101010 01100101 01100011 01110100 01100110
01110000 01100111 01100001 00101110 01100011 01101111 01101101 10000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
4. Append 64 bits holding the length of the original input in bits, stored as a big-endian integer. In our case the length is 120, which is 1111000 in binary:
01110000 01110010 01101111 01101010 01100101 01100011 01110100 01100110
01110000 01100111 01100001 00101110 01100011 01101111 01101101 10000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 01111000
The padded input now has a length that is an exact multiple of 512 bits. This holds for a message of any length.
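The padding steps above can be cross-checked with a short Python sketch. The function name `sha256_pad` is an assumption made for illustration and is not part of the Verilog design in this chapter.

```python
def sha256_pad(message):
    """Return the message followed by SHA-256 padding (a multiple of 64 bytes)."""
    bit_length = len(message) * 8
    # 0x80 is the single 1 bit followed by seven 0 bits.
    padded = message + b"\x80"
    # Zero-fill until the length is 56 bytes (448 bits) modulo 64 bytes (512 bits).
    padded += bytes((56 - len(padded)) % 64)
    # Append the original length in bits as a 64-bit big-endian integer.
    padded += bit_length.to_bytes(8, "big")
    return padded

block = sha256_pad(b"projectfpga.com")
print(len(block) * 8)      # total length in bits
print(block[-8:].hex())    # the 64-bit length field
```

For the 15-byte test phrase this yields a single 512-bit block whose last byte is 0x78 (120).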
Step 2. Initializing hash values (h)
Let’s create 8 hash values. These are constants representing the first 32 bits of the fractional parts of the square roots of the first 8 primes: 2, 3, 5, 7, 11, 13, 17, 19.
h0 := 0x6a09e667 h1 := 0xbb67ae85 h2 := 0x3c6ef372 h3 := 0xa54ff53a h4 := 0x510e527f h5 := 0x9b05688c h6 := 0x1f83d9ab h7 := 0x5be0cd19
// initial hash values
module sha256_H_0 (
    output [255:0] H_0
);
    assign H_0 = {
        32'h6A09E667, 32'hBB67AE85, 32'h3C6EF372, 32'hA54FF53A,
        32'h510E527F, 32'h9B05688C, 32'h1F83D9AB, 32'h5BE0CD19
    };
endmodule
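As a sanity check on where these constants come from, here is a minimal Python sketch that derives them with integer arithmetic. The helper name `frac_sqrt_32` is an assumption for illustration. It relies on `math.isqrt`, which is available from Python 3.8.

```python
from math import isqrt

def frac_sqrt_32(p):
    """First 32 bits of the fractional part of sqrt(p)."""
    # isqrt(p << 64) equals floor(sqrt(p) * 2**32); the low 32 bits
    # of that integer are the first 32 fractional bits of sqrt(p).
    return isqrt(p << 64) & 0xFFFFFFFF

H0 = [frac_sqrt_32(p) for p in (2, 3, 5, 7, 11, 13, 17, 19)]
print(" ".join(f"{h:08x}" for h in H0))
```

Using integer square roots avoids the rounding error that floating-point `sqrt` could introduce in the last bits.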
Step 3. Initialization of round constants (k)
Let’s create some more constants; this time there are 64 of them. Each value is the first 32 bits of the fractional part of the cube root of one of the first 64 primes (2–311).
0x428a2f98 0x71374491 0xb5c0fbcf 0xe9b5dba5 0x3956c25b 0x59f111f1 0x923f82a4 0xab1c5ed5 0xd807aa98 0x12835b01 0x243185be 0x550c7dc3 0x72be5d74 0x80deb1fe 0x9bdc06a7 0xc19bf174 0xe49b69c1 0xefbe4786 0x0fc19dc6 0x240ca1cc 0x2de92c6f 0x4a7484aa 0x5cb0a9dc 0x76f988da 0x983e5152 0xa831c66d 0xb00327c8 0xbf597fc7 0xc6e00bf3 0xd5a79147 0x06ca6351 0x14292967 0x27b70a85 0x2e1b2138 0x4d2c6dfc 0x53380d13 0x650a7354 0x766a0abb 0x81c2c92e 0x92722c85 0xa2bfe8a1 0xa81a664b 0xc24b8b70 0xc76c51a3 0xd192e819 0xd6990624 0xf40e3585 0x106aa070 0x19a4c116 0x1e376c08 0x2748774c 0x34b0bcb5 0x391c0cb3 0x4ed8aa4a 0x5b9cca4f 0x682e6ff3 0x748f82ee 0x78a5636f 0x84c87814 0x8cc70208 0x90befffa 0xa4506ceb 0xbef9a3f7 0xc67178f2
// a machine that delivers round constants
module sha256_K_machine (
    input clk,
    input rst,
    output [31:0] K
);
    // the 64 constants are held in one 2048-bit register
    reg [2047:0] rom_q;
    // rotate left by one 32-bit word so the next constant moves to the top
    wire [2047:0] rom_d = { rom_q[2015:0], rom_q[2047:2016] };
    assign K = rom_q[2047:2016];

    always @(posedge clk) begin
        if (rst) begin
            rom_q <= {
                32'h428a2f98, 32'h71374491, 32'hb5c0fbcf, 32'he9b5dba5, 32'h3956c25b, 32'h59f111f1, 32'h923f82a4, 32'hab1c5ed5,
                32'hd807aa98, 32'h12835b01, 32'h243185be, 32'h550c7dc3, 32'h72be5d74, 32'h80deb1fe, 32'h9bdc06a7, 32'hc19bf174,
                32'he49b69c1, 32'hefbe4786, 32'h0fc19dc6, 32'h240ca1cc, 32'h2de92c6f, 32'h4a7484aa, 32'h5cb0a9dc, 32'h76f988da,
                32'h983e5152, 32'ha831c66d, 32'hb00327c8, 32'hbf597fc7, 32'hc6e00bf3, 32'hd5a79147, 32'h06ca6351, 32'h14292967,
                32'h27b70a85, 32'h2e1b2138, 32'h4d2c6dfc, 32'h53380d13, 32'h650a7354, 32'h766a0abb, 32'h81c2c92e, 32'h92722c85,
                32'ha2bfe8a1, 32'ha81a664b, 32'hc24b8b70, 32'hc76c51a3, 32'hd192e819, 32'hd6990624, 32'hf40e3585, 32'h106aa070,
                32'h19a4c116, 32'h1e376c08, 32'h2748774c, 32'h34b0bcb5, 32'h391c0cb3, 32'h4ed8aa4a, 32'h5b9cca4f, 32'h682e6ff3,
                32'h748f82ee, 32'h78a5636f, 32'h84c87814, 32'h8cc70208, 32'h90befffa, 32'ha4506ceb, 32'hbef9a3f7, 32'hc67178f2
            };
        end else begin
            rom_q <= rom_d;
        end
    end
endmodule
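The round constants can be derived in the same spirit as the initial hash values. The sketch below is a Python illustration, not part of the Verilog design; `icbrt` and `first_primes` are hypothetical helper names introduced here.

```python
def icbrt(n):
    """Integer cube root: the largest r with r**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

# icbrt(p << 96) equals floor(cbrt(p) * 2**32); the low 32 bits are
# the first 32 fractional bits of the cube root.
K = [icbrt(p << 96) & 0xFFFFFFFF for p in first_primes(64)]
print(f"{K[0]:08x} ... {K[63]:08x}")
```

The first and last entries should match the first and last constants loaded into `rom_q` on reset.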
Step 4. Main loop
The following steps are performed for each 512-bit “chunk” of input data. Our test phrase “projectfpga.com” is short, so there is only one chunk. At each iteration of the loop, we update the hash values h0…h7; after the last chunk they form the final result.
Step 5. Create the message schedule (w)
1. Copy the input from step 1 into a new array, where each entry is a 32-bit word:
01110000011100100110111101101010 01100101011000110111010001100110 01110000011001110110000100101110 01100011011011110110110110000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000001111000
2. Add 48 more words, initialized to zero, to get an array w[0…63]:
01110000011100100110111101101010 01100101011000110111010001100110 01110000011001110110000100101110 01100011011011110110110110000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000001111000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 ... ... 00000000000000000000000000000000 00000000000000000000000000000000
3. Replace the zeroed words w[16…63] at the end of the array using the following algorithm:
• for i from 16 to 63:
• s0 = (w[i-15] rightrotate 7) xor (w[i-15] rightrotate 18) xor (w[i-15] rightshift 3)
module sha256_s0 (
    input wire [31:0] x,
    output wire [31:0] s0
);
    // rightrotate 7 ^ rightrotate 18 ^ rightshift 3
    assign s0 = ({x[6:0], x[31:7]} ^ {x[17:0], x[31:18]} ^ (x >> 3));
endmodule
• s1 = (w[i-2] rightrotate 17) xor (w[i-2] rightrotate 19) xor (w[i-2] rightshift 10)
module sha256_s1 (
    input wire [31:0] x,
    output wire [31:0] s1
);
    // rightrotate 17 ^ rightrotate 19 ^ rightshift 10
    assign s1 = ({x[16:0], x[31:17]} ^ {x[18:0], x[31:19]} ^ (x >> 10));
endmodule
• w[i] = w[i-16] + s0 + w[i-7] + s1 (addition modulo 2^32)
// the message schedule: a machine that generates Wt values
module W_machine #(parameter WORDSIZE=1) (
    input clk,
    input [WORDSIZE*16-1:0] M,
    input M_valid,
    output [WORDSIZE-1:0] W_tm2, W_tm15,
    input [WORDSIZE-1:0] s1_Wtm2, s0_Wtm15,
    output [WORDSIZE-1:0] W
);
    // declared before first use so that no implicit 1-bit net is inferred
    reg [WORDSIZE*16-1:0] W_stack_q;

    // W(t-n) values, from the perspective of Wt_next
    assign W_tm2 = W_stack_q[WORDSIZE*2-1:WORDSIZE*1];
    assign W_tm15 = W_stack_q[WORDSIZE*15-1:WORDSIZE*14];
    wire [WORDSIZE-1:0] W_tm7 = W_stack_q[WORDSIZE*7-1:WORDSIZE*6];
    wire [WORDSIZE-1:0] W_tm16 = W_stack_q[WORDSIZE*16-1:WORDSIZE*15];

    // Wt_next is the next Wt to be pushed to the queue, will be consumed in 16 rounds
    wire [WORDSIZE-1:0] Wt_next = s1_Wtm2 + W_tm7 + s0_Wtm15 + W_tm16;

    wire [WORDSIZE*16-1:0] W_stack_d = {W_stack_q[WORDSIZE*15-1:0], Wt_next};
    assign W = W_stack_q[WORDSIZE*16-1:WORDSIZE*15];

    always @(posedge clk) begin
        if (M_valid) begin
            W_stack_q <= M;
        end else begin
            W_stack_q <= W_stack_d;
        end
    end
endmodule
Let’s see how this works for w[16], using the words of “projectfpga.com” from item 1:
w[1] rightrotate 7: 01100101011000110111010001100110 -> 11001100110010101100011011101000
w[1] rightrotate 18: 01100101011000110111010001100110 -> 11011101000110011001100101011000
w[1] rightshift 3: 01100101011000110111010001100110 -> 00001100101011000110111010001100
s0 = 11001100110010101100011011101000 XOR 11011101000110011001100101011000 XOR 00001100101011000110111010001100
s0 = 00011101011111110011000100111100
w[14] rightrotate 17: 00000000000000000000000000000000 -> 00000000000000000000000000000000
w[14] rightrotate 19: 00000000000000000000000000000000 -> 00000000000000000000000000000000
w[14] rightshift 10: 00000000000000000000000000000000 -> 00000000000000000000000000000000
s1 = 00000000000000000000000000000000 XOR 00000000000000000000000000000000 XOR 00000000000000000000000000000000
s1 = 00000000000000000000000000000000
w[16] = w[0] + s0 + w[9] + s1
w[16] = 01110000011100100110111101101010 + 00011101011111110011000100111100 + 00000000000000000000000000000000 + 00000000000000000000000000000000
// addition is calculated modulo 2^32
w[16] = 10001101111100011010000010100110
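The expansion rule can be cross-checked with a short Python sketch. It assumes the single padded block from step 1; the function names (`rotr`, `small_sigma0`, `small_sigma1`, `message_schedule`) are illustrative and do not correspond to modules in the Verilog design.

```python
MASK = 0xFFFFFFFF

def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def small_sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def small_sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def message_schedule(block):
    """Expand one 64-byte block into the 64-word schedule w[0..63]."""
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for i in range(16, 64):
        w.append((w[i - 16] + small_sigma0(w[i - 15])
                  + w[i - 7] + small_sigma1(w[i - 2])) & MASK)
    return w

message = b"projectfpga.com"
# 15 message bytes + 0x80 + 40 zero bytes + 8-byte length = 64 bytes
block = message + b"\x80" + bytes(40) + (len(message) * 8).to_bytes(8, "big")
w = message_schedule(block)
print(f"w[16] = {w[16]:032b}")
```

Printing the remaining entries of `w` in the same way gives the full schedule for this message.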
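This leaves us with 64 words in the message schedule (w). Note that the listing below was produced for the 11-byte message “hello world” (its first word, 01101000011001010110110001101100, is the ASCII text “hell”, and w[15] holds the length 88). For “projectfpga.com” the first 16 words are those shown in item 1, and the remaining words differ accordingly: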
01101000011001010110110001101100 01101111001000000111011101101111 01110010011011000110010010000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000000000000 00000000000000000000000001011000 00110111010001110000001000110111 10000110110100001100000000110001 11010011101111010001000100001011 01111000001111110100011110000010 00101010100100000111110011101101 01001011001011110111110011001001 00110001111000011001010001011101 10001001001101100100100101100100 01111111011110100000011011011010 11000001011110011010100100111010 10111011111010001111011001010101 00001100000110101110001111100110 10110000111111100000110101111101 01011111011011100101010110010011 00000000100010011001101101010010 00000111111100011100101010010100 00111011010111111110010111010110 01101000011001010110001011100110 11001000010011100000101010011110 00000110101011111001101100100101 10010010111011110110010011010111 01100011111110010101111001011010 11100011000101100110011111010111 10000100001110111101111000010110 11101110111011001010100001011011 10100000010011111111001000100001 11111001000110001010110110111000 00010100101010001001001000011001 00010000100001000101001100011101 01100000100100111110000011001101 10000011000000110101111111101001 11010101101011100111100100111000 00111001001111110000010110101101 11111011010010110001101111101111 11101011011101011111111100101001 01101010001101101001010100110100 00100010111111001001110011011000 10101001011101000000110100101011 01100000110011110011100010000101 11000100101011001001100000111010 00010001010000101111110110101101 10110000101100000001110111011001 10011000111100001100001101101111 01110010000101111011100000011110 
10100010110101000110011110011010 00000001000011111001100101111011 11111100000101110100111100001010 11000010110000101110101100010110
Step 6. Compression cycle
1. We initialize the variables a, b, c, d, e, f, g, h and set them equal to the current hash values h0, h1, h2, h3, h4, h5, h6, h7, respectively.
2. Let’s start a compression cycle that will change the values of a…h. The loop looks like this:
• for i from 0 to 63
• S1 = (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
module sha256_S1 (
    input wire [31:0] x,
    output wire [31:0] S1
);
    // rightrotate 6 ^ rightrotate 11 ^ rightrotate 25
    assign S1 = ({x[5:0], x[31:6]} ^ {x[10:0], x[31:11]} ^ {x[24:0], x[31:25]});
endmodule
• ch = (e and f) xor ((not e) and g)
// Ch(x,y,z)
module Ch #(parameter WORDSIZE=0) (
    input wire [WORDSIZE-1:0] x, y, z,
    output wire [WORDSIZE-1:0] Ch
);
    assign Ch = ((x & y) ^ (~x & z));
endmodule
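• S0 = (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
module sha256_S0 (
    input wire [31:0] x,
    output wire [31:0] S0
);
    // rightrotate 2 ^ rightrotate 13 ^ rightrotate 22
    assign S0 = ({x[1:0], x[31:2]} ^ {x[12:0], x[31:13]} ^ {x[21:0], x[31:22]});
endmodule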
• maj = (a and b) xor (a and c) xor (b and c)
// Maj(x,y,z)
module Maj #(parameter WORDSIZE=0) (
    input wire [WORDSIZE-1:0] x, y, z,
    output wire [WORDSIZE-1:0] Maj
);
    assign Maj = (x & y) ^ (x & z) ^ (y & z);
endmodule
• temp1 = h + S1 + ch + k[i] + w[i]
• temp2 = S0 + maj
• h = g, g = f, f = e, e = d + temp1, d = c, c = b, b = a, a = temp1 + temp2
// generalised round compression function
module sha2_round #(
    parameter WORDSIZE=0
) (
    input [WORDSIZE-1:0] Kj, Wj,
    input [WORDSIZE-1:0] a_in, b_in, c_in, d_in, e_in, f_in, g_in, h_in,
    input [WORDSIZE-1:0] Ch_e_f_g, Maj_a_b_c, S0_a, S1_e,
    output [WORDSIZE-1:0] a_out, b_out, c_out, d_out, e_out, f_out, g_out, h_out
);
    // T1 and T2 correspond to temp1 and temp2 in the pseudocode
    wire [WORDSIZE-1:0] T1 = h_in + S1_e + Ch_e_f_g + Kj + Wj;
    wire [WORDSIZE-1:0] T2 = S0_a + Maj_a_b_c;

    assign a_out = T1 + T2;
    assign b_out = a_in;
    assign c_out = b_in;
    assign d_out = c_in;
    assign e_out = d_in + T1;
    assign f_out = e_in;
    assign g_out = f_in;
    assign h_out = g_in;
endmodule
Let’s go through the first iteration for “projectfpga.com”. The addition is calculated modulo 2^32:
a = 0x6a09e667 = 01101010000010011110011001100111
b = 0xbb67ae85 = 10111011011001111010111010000101
c = 0x3c6ef372 = 00111100011011101111001101110010
d = 0xa54ff53a = 10100101010011111111010100111010
e = 0x510e527f = 01010001000011100101001001111111
f = 0x9b05688c = 10011011000001010110100010001100
g = 0x1f83d9ab = 00011111100000111101100110101011
h = 0x5be0cd19 = 01011011111000001100110100011001

e rightrotate 6: 01010001000011100101001001111111 -> 11111101010001000011100101001001
e rightrotate 11: 01010001000011100101001001111111 -> 01001111111010100010000111001010
e rightrotate 25: 01010001000011100101001001111111 -> 10000111001010010011111110101000
S1 = 11111101010001000011100101001001 XOR 01001111111010100010000111001010 XOR 10000111001010010011111110101000
S1 = 00110101100001110010011100101011

e and f: 01010001000011100101001001111111 & 10011011000001010110100010001100 = 00010001000001000100000000001100
not e: 01010001000011100101001001111111 -> 10101110111100011010110110000000
(not e) and g: 10101110111100011010110110000000 & 00011111100000111101100110101011 = 00001110100000011000100110000000
ch = (e and f) xor ((not e) and g) = 00010001000001000100000000001100 xor 00001110100000011000100110000000 = 00011111100001011100100110001100

// k[i] is the round constant
// w[i] is the message schedule word
temp1 = h + S1 + ch + k[i] + w[i]
temp1 = 01011011111000001100110100011001 + 00110101100001110010011100101011 + 00011111100001011100100110001100 + 01000010100010100010111110011000 + 01110000011100100110111101101010
temp1 = 01100011111010100101110011010010

a rightrotate 2: 01101010000010011110011001100111 -> 11011010100000100111100110011001
a rightrotate 13: 01101010000010011110011001100111 -> 00110011001110110101000001001111
a rightrotate 22: 01101010000010011110011001100111 -> 00100111100110011001110110101000
S0 = 11011010100000100111100110011001 XOR 00110011001110110101000001001111 XOR 00100111100110011001110110101000
S0 = 11001110001000001011010001111110

a and b: 01101010000010011110011001100111 & 10111011011001111010111010000101 = 00101010000000011010011000000101
a and c: 01101010000010011110011001100111 & 00111100011011101111001101110010 = 00101000000010001110001001100010
b and c: 10111011011001111010111010000101 & 00111100011011101111001101110010 = 00111000011001101010001000000000
maj = (a and b) xor (a and c) xor (b and c) = 00101010000000011010011000000101 xor 00101000000010001110001001100010 xor 00111000011001101010001000000000 = 00111010011011111110011001100111

temp2 = S0 + maj = 11001110001000001011010001111110 + 00111010011011111110011001100111 = 00001000100100001001101011100101

h = 00011111100000111101100110101011
g = 10011011000001010110100010001100
f = 01010001000011100101001001111111
e = 10100101010011111111010100111010 + 01100011111010100101110011010010 = 00001001001110100101001000001100
d = 00111100011011101111001101110010
c = 10111011011001111010111010000101
b = 01101010000010011110011001100111
a = 01100011111010100101110011010010 + 00001000100100001001101011100101 = 01101100011110101111011110110111
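A single round of the compression loop can be sketched in Python to cross-check the arithmetic. This assumes the first round constant k[0] = 0x428a2f98 and the first schedule word of “projectfpga.com”, w[0] = 0x70726f6a; the function name `compress_round` is hypothetical.

```python
MASK = 0xFFFFFFFF

def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def compress_round(state, k, w):
    """Apply one SHA-256 round to the tuple (a, b, c, d, e, f, g, h)."""
    a, b, c, d, e, f, g, h = state
    S1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
    ch = (e & f) ^ (~e & MASK & g)
    temp1 = (h + S1 + ch + k + w) & MASK
    S0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
    maj = (a & b) ^ (a & c) ^ (b & c)
    temp2 = (S0 + maj) & MASK
    # Shift the registers down by one; only a and e receive new values.
    return ((temp1 + temp2) & MASK, a, b, c, (d + temp1) & MASK, e, f, g)

H0 = (0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19)
state = compress_round(H0, 0x428a2f98, 0x70726f6a)
print(" ".join(f"{v:08x}" for v in state))
```

Calling `compress_round` 64 times, with k[i] and w[i] for each i, gives the full compression cycle for one chunk.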
All calculations are performed 63 more times, changing the variables a…h. The listing below shows the initial hash values h0…h7, followed by the final a…h. Note that these final values are the ones obtained for the 11-byte sample message “hello world”; the values for “projectfpga.com” differ:
h0 = 6A09E667 = 01101010000010011110011001100111
h1 = BB67AE85 = 10111011011001111010111010000101
h2 = 3C6EF372 = 00111100011011101111001101110010
h3 = A54FF53A = 10100101010011111111010100111010
h4 = 510E527F = 01010001000011100101001001111111
h5 = 9B05688C = 10011011000001010110100010001100
h6 = 1F83D9AB = 00011111100000111101100110101011
h7 = 5BE0CD19 = 01011011111000001100110100011001
a = 4F434152 = 01001111010000110100000101010010
b = D7E58F83 = 11010111111001011000111110000011
c = 68BF5F65 = 01101000101111110101111101100101
d = 352DB6C0 = 00110101001011011011011011000000
e = 73769D64 = 01110011011101101001110101100100
f = DF4E1862 = 11011111010011100001100001100010
g = 71051E01 = 01110001000001010001111000000001
h = 870F00D0 = 10000111000011110000000011010000
// round compression function
module sha256_round (
    input [31:0] Kj, Wj,
    input [31:0] a_in, b_in, c_in, d_in, e_in, f_in, g_in, h_in,
    output [31:0] a_out, b_out, c_out, d_out, e_out, f_out, g_out, h_out
);
    wire [31:0] Ch_e_f_g, Maj_a_b_c, S0_a, S1_e;

    Ch #(.WORDSIZE(32)) Ch (
        .x(e_in), .y(f_in), .z(g_in), .Ch(Ch_e_f_g)
    );
    Maj #(.WORDSIZE(32)) Maj (
        .x(a_in), .y(b_in), .z(c_in), .Maj(Maj_a_b_c)
    );
    sha256_S0 S0 (
        .x(a_in), .S0(S0_a)
    );
    sha256_S1 S1 (
        .x(e_in), .S1(S1_e)
    );

    sha2_round #(.WORDSIZE(32)) sha256_round_inner (
        .Kj(Kj), .Wj(Wj),
        .a_in(a_in), .b_in(b_in), .c_in(c_in), .d_in(d_in),
        .e_in(e_in), .f_in(f_in), .g_in(g_in), .h_in(h_in),
        .Ch_e_f_g(Ch_e_f_g), .Maj_a_b_c(Maj_a_b_c), .S0_a(S0_a), .S1_e(S1_e),
        .a_out(a_out), .b_out(b_out), .c_out(c_out), .d_out(d_out),
        .e_out(e_out), .f_out(f_out), .g_out(g_out), .h_out(h_out)
    );
endmodule
Step 7. Change the final values
After the compression cycle, but still inside the main loop, we update the hash values by adding the corresponding variables a…h to them. As usual, all addition is done modulo 2^32.
h0 = h0 + a = 10111001010011010010011110111001
h1 = h1 + b = 10010011010011010011111000001000
h2 = h2 + c = 10100101001011100101001011010111
h3 = h3 + d = 11011010011111011010101111111010
h4 = h4 + e = 11000100100001001110111111100011
h5 = h5 + f = 01111010010100111000000011101110
h6 = h6 + g = 10010000100010001111011110101100
h7 = h7 + h = 11100010111011111100110111101001
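The update step is a word-wise addition modulo 2^32. The sketch below applies it to the a…h values listed after the compression cycle; joining the resulting words reproduces the digest that Python's `hashlib` reports for the message “hello world”, which indicates which input those listed values belong to.

```python
import hashlib

MASK = 0xFFFFFFFF

# Hash values before the chunk (the initial constants from step 2).
initial = [0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
           0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19]
# Working variables a..h after 64 rounds, copied from the listing in the text.
working = [0x4F434152, 0xD7E58F83, 0x68BF5F65, 0x352DB6C0,
           0x73769D64, 0xDF4E1862, 0x71051E01, 0x870F00D0]

# Add each working variable to its hash value, discarding any carry out of bit 31.
updated = [(h + v) & MASK for h, v in zip(initial, working)]
digest = "".join(f"{word:08x}" for word in updated)
print(digest)
print(digest == hashlib.sha256(b"hello world").hexdigest())
```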
Step 8. Get the final hash
And the last important step is putting everything together.
digest = h0 append h1 append h2 append h3 append h4 append h5 append h6 append h7 = e254720208ff333431f723cbe00b9c1d45fc65b7ac1650151a3d8eb0cbd885a3
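To tie the steps together, here is a compact Python sketch of the whole procedure (padding, message schedule, compression, final addition), compared against the standard library's `hashlib`. It is a reference model for checking intermediate values, not a description of the Verilog design; the function name `sha256` is chosen for illustration.

```python
import hashlib

MASK = 0xFFFFFFFF
K = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
]
H0 = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]

def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def sha256(message):
    """Return the SHA-256 digest of `message` (bytes) as a hex string."""
    # Step 1: padding
    data = message + b"\x80"
    data += bytes((56 - len(data)) % 64) + (len(message) * 8).to_bytes(8, "big")
    h = list(H0)
    # Step 4: main loop over 512-bit chunks
    for start in range(0, len(data), 64):
        # Step 5: message schedule
        w = [int.from_bytes(data[start + i:start + i + 4], "big") for i in range(0, 64, 4)]
        for i in range(16, 64):
            s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
            s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
            w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
        # Step 6: compression cycle
        a, b, c, d, e, f, g, hh = h
        for i in range(64):
            S1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
            ch = (e & f) ^ (~e & MASK & g)
            temp1 = (hh + S1 + ch + K[i] + w[i]) & MASK
            S0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
            maj = (a & b) ^ (a & c) ^ (b & c)
            temp2 = (S0 + maj) & MASK
            a, b, c, d, e, f, g, hh = (temp1 + temp2) & MASK, a, b, c, (d + temp1) & MASK, e, f, g
        # Step 7: add the working variables to the hash values
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]
    # Step 8: concatenate h0..h7
    return "".join(f"{x:08x}" for x in h)

digest = sha256(b"projectfpga.com")
print(digest)
print(digest == hashlib.sha256(b"projectfpga.com").hexdigest())
```

Running it prints the digest for the test phrase and whether it agrees with `hashlib`, which gives an independent check of the value quoted above.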
// block processor
// NB: master *must* continue to assert H_in until we have signaled output_valid
module sha256_block (
    input clk, rst,
    input [255:0] H_in,
    input [511:0] M_in,
    input input_valid,
    output [255:0] H_out,
    output output_valid
);
    reg [6:0] round;
    wire [31:0] a_in = H_in[255:224], b_in = H_in[223:192], c_in = H_in[191:160], d_in = H_in[159:128];
    wire [31:0] e_in = H_in[127:96], f_in = H_in[95:64], g_in = H_in[63:32], h_in = H_in[31:0];
    reg [31:0] a_q, b_q, c_q, d_q, e_q, f_q, g_q, h_q;
    wire [31:0] a_d, b_d, c_d, d_d, e_d, f_d, g_d, h_d;
    wire [31:0] W_tm2, W_tm15, s1_Wtm2, s0_Wtm15, Wj, Kj;

    // step 7: add the working variables to the incoming hash values
    assign H_out = {
        a_in + a_q, b_in + b_q, c_in + c_q, d_in + d_q,
        e_in + e_q, f_in + f_q, g_in + g_q, h_in + h_q
    };
    assign output_valid = round == 64;

    always @(posedge clk) begin
        if (input_valid) begin
            a_q <= a_in; b_q <= b_in; c_q <= c_in; d_q <= d_in;
            e_q <= e_in; f_q <= f_in; g_q <= g_in; h_q <= h_in;
            round <= 0;
        end else begin
            a_q <= a_d; b_q <= b_d; c_q <= c_d; d_q <= d_d;
            e_q <= e_d; f_q <= f_d; g_q <= g_d; h_q <= h_d;
            round <= round + 1;
        end
    end

    sha256_round sha256_round (
        .Kj(Kj), .Wj(Wj),
        .a_in(a_q), .b_in(b_q), .c_in(c_q), .d_in(d_q),
        .e_in(e_q), .f_in(f_q), .g_in(g_q), .h_in(h_q),
        .a_out(a_d), .b_out(b_d), .c_out(c_d), .d_out(d_d),
        .e_out(e_d), .f_out(f_d), .g_out(g_d), .h_out(h_d)
    );

    sha256_s0 sha256_s0 (.x(W_tm15), .s0(s0_Wtm15));
    sha256_s1 sha256_s1 (.x(W_tm2), .s1(s1_Wtm2));

    W_machine #(.WORDSIZE(32)) W_machine (
        .clk(clk),
        .M(M_in), .M_valid(input_valid),
        .W_tm2(W_tm2), .W_tm15(W_tm15),
        .s1_Wtm2(s1_Wtm2), .s0_Wtm15(s0_Wtm15),
        .W(Wj)
    );

    sha256_K_machine sha256_K_machine (
        .clk(clk), .rst(input_valid), .K(Kj)
    );
endmodule
Done! We have walked through every step of SHA-256 (a member of the SHA-2 family), writing out only the first iteration of each loop in full.