Keep Calm and Carry On

pwnable-kr-memcpy

0x01 Analysis of the problem

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
// compiled with : gcc -o memcpy memcpy.c -m32 -lm
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/mman.h>
#include <math.h>

unsigned long long rdtsc(){
asm("rdtsc");
}

char* slow_memcpy(char* dest, const char* src, size_t len){
int i;
for (i=0; i<len; i++) {
dest[i] = src[i];
}
return dest;
}

char* fast_memcpy(char* dest, const char* src, size_t len){
size_t i;
// 64-byte block fast copy
if(len >= 64){
i = len / 64;
len &= (64-1);
while(i-- > 0){
__asm__ __volatile__ (
"movdqa (%0), %%xmm0\n"
"movdqa 16(%0), %%xmm1\n"
"movdqa 32(%0), %%xmm2\n"
"movdqa 48(%0), %%xmm3\n"
"movntps %%xmm0, (%1)\n"
"movntps %%xmm1, 16(%1)\n"
"movntps %%xmm2, 32(%1)\n"
"movntps %%xmm3, 48(%1)\n"
::"r"(src),"r"(dest):"memory");
dest += 64;
src += 64;
}
}

// byte-to-byte slow copy
if(len) slow_memcpy(dest, src, len);
return dest;
}

int main(void){

setvbuf(stdout, 0, _IONBF, 0);
setvbuf(stdin, 0, _IOLBF, 0);

printf("Hey, I have a boring assignment for CS class.. :(\n");
printf("The assignment is simple.\n");

printf("-----------------------------------------------------\n");
printf("- What is the best implementation of memcpy? -\n");
printf("- 1. implement your own slow/fast version of memcpy -\n");
printf("- 2. compare them with various size of data -\n");
printf("- 3. conclude your experiment and submit report -\n");
printf("-----------------------------------------------------\n");

printf("This time, just help me out with my experiment and get flag\n");
printf("No fancy hacking, I promise :D\n");

unsigned long long t1, t2;
int e;
char* src;
char* dest;
unsigned int low, high;
unsigned int size;
// allocate memory
char* cache1 = mmap(0, 0x4000, 7, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
char* cache2 = mmap(0, 0x4000, 7, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
src = mmap(0, 0x2000, 7, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

size_t sizes[10];
int i=0;

// setup experiment parameters
for(e=4; e<14; e++){ // 2^13 = 8K
low = pow(2,e-1);
high = pow(2,e);
printf("specify the memcpy amount between %d ~ %d : ", low, high);
scanf("%d", &size);
if( size < low || size > high ){
printf("don't mess with the experiment.\n");
exit(0);
}
sizes[i++] = size;
}

sleep(1);
printf("ok, lets run the experiment with your configuration\n");
sleep(1);

// run experiment
for(i=0; i<10; i++){
size = sizes[i];
printf("experiment %d : memcpy with buffer size %d\n", i+1, size);
dest = malloc( size );

memcpy(cache1, cache2, 0x4000); // to eliminate cache effect
t1 = rdtsc();
slow_memcpy(dest, src, size); // byte-to-byte memcpy
t2 = rdtsc();
printf("ellapsed CPU cycles for slow_memcpy : %llu\n", t2-t1);

memcpy(cache1, cache2, 0x4000); // to eliminate cache effect
t1 = rdtsc();
fast_memcpy(dest, src, size); // block-to-block memcpy
t2 = rdtsc();
printf("ellapsed CPU cycles for fast_memcpy : %llu\n", t2-t1);
printf("\n");
}

printf("thanks for helping my experiment!\n");
printf("flag : ----- erased in this source code -----\n");
return 0;
}

The binary implements two memcpy functions, one is slow_memcpy ,and the other is fast_memcpy.There are 10 rounds experiments in function main, and only if you can run over the 10 rounds of experiments, you can get the flag by run printf("flag : ----- erased in this source code -----\n");.Now lets take a try!

  1. Enter the size we want to malloc randomly:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
specify the memcpy amount between 8 ~ 16 : 9
specify the memcpy amount between 16 ~ 32 : 17
specify the memcpy amount between 32 ~ 64 : 33
specify the memcpy amount between 64 ~ 128 : 65
specify the memcpy amount between 128 ~ 256 : 129
specify the memcpy amount between 256 ~ 512 : 257
specify the memcpy amount between 512 ~ 1024 : 513
specify the memcpy amount between 1024 ~ 2048 : 1025
specify the memcpy amount between 2048 ~ 4096 : 2049
specify the memcpy amount between 4096 ~ 8192 : 4097
ok, lets run the experiment with your configuration
...
experiment 5 : memcpy with buffer size 129
ellapsed CPU cycles for slow_memcpy : 900
Segmentation fault (core dumped)

As we can see, a segmentation fault occurs in experiment 5.Now we have to find out why there is a segmentation fault when we malloc 129 bytes using fast_memcpy.So, I use gdb to find out what instruction causes this fault.I enter the size the same as above to malloc and when the segmentation fault occurs the instructions are as below:

1
2
3
4
5
6
7
8
   0x80487bd <fast_memcpy+37>:	movdqa xmm1,XMMWORD PTR [eax+0x10]
0x80487c2 <fast_memcpy+42>: movdqa xmm2,XMMWORD PTR [eax+0x20]
0x80487c7 <fast_memcpy+47>: movdqa xmm3,XMMWORD PTR [eax+0x30]
=> 0x80487cc <fast_memcpy+52>: movntps XMMWORD PTR [edx],xmm0
0x80487cf <fast_memcpy+55>: movntps XMMWORD PTR [edx+0x10],xmm1
0x80487d3 <fast_memcpy+59>: movntps XMMWORD PTR [edx+0x20],xmm2
0x80487d7 <fast_memcpy+63>: movntps XMMWORD PTR [edx+0x30],xmm3
0x80487db <fast_memcpy+67>: add DWORD PTR [ebp+0x8],0x40

movntps XMMWORD PTR [edx],xmm0, this instruction always using like mov, but the memory operand need to be aligned on a 16-byte boundary.But in experiment 5, 129 is not aligned on a 16-byte boundary.

0x02 malloc function

When the client wants to malloc a size of x bytes, the ptmalloc return x + 4 and set it to align on a 8-byte boundary.Because the pre-size field of the next chunk can be used for this chunk when this chunk is not free(why +4), the total size return to the client is 8 * (int((x + 4) / 8) + 1).

0x03 solution

  1. Test whether a the size malloced is aligned on 16-byte boundary before enter into the process memcpy :
1
2
3
4
5
6
while True:
a = int(raw_input())
if (a + 4) % 16 > 9 or (a + 4) % 16 == 0:
print 'yes'
else:
print 'no'
  1. Enter the number gets output yes(run nc 0 9022 on pwnable.kr):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
This time, just help me out with my experiment and get flag
No fancy hacking, I promise :D
specify the memcpy amount between 8 ~ 16 : 9
specify the memcpy amount between 16 ~ 32 : 22
specify the memcpy amount between 32 ~ 64 : 38
specify the memcpy amount between 64 ~ 128 : 70
specify the memcpy amount between 128 ~ 256 : 182
specify the memcpy amount between 256 ~ 512 : 300
specify the memcpy amount between 512 ~ 1024 : 600
specify the memcpy amount between 1024 ~ 2048 : 1100
specify the memcpy amount between 2048 ~ 4096 : 2200
specify the memcpy amount between 4096 ~ 8192 : 4100
ok, lets run the experiment with your configuration
experiment 1 : memcpy with buffer size 9
ellapsed CPU cycles for slow_memcpy : 2074
ellapsed CPU cycles for fast_memcpy : 238

experiment 2 : memcpy with buffer size 22
ellapsed CPU cycles for slow_memcpy : 266
ellapsed CPU cycles for fast_memcpy : 348

experiment 3 : memcpy with buffer size 38
ellapsed CPU cycles for slow_memcpy : 374
ellapsed CPU cycles for fast_memcpy : 390

experiment 4 : memcpy with buffer size 70
ellapsed CPU cycles for slow_memcpy : 586
ellapsed CPU cycles for fast_memcpy : 212

experiment 5 : memcpy with buffer size 182
ellapsed CPU cycles for slow_memcpy : 1400
ellapsed CPU cycles for fast_memcpy : 578

experiment 6 : memcpy with buffer size 300
ellapsed CPU cycles for slow_memcpy : 2174
ellapsed CPU cycles for fast_memcpy : 552

experiment 7 : memcpy with buffer size 600
ellapsed CPU cycles for slow_memcpy : 4214
ellapsed CPU cycles for fast_memcpy : 788

experiment 8 : memcpy with buffer size 1100
ellapsed CPU cycles for slow_memcpy : 7640
ellapsed CPU cycles for fast_memcpy : 598

experiment 9 : memcpy with buffer size 2200
ellapsed CPU cycles for slow_memcpy : 15046
ellapsed CPU cycles for fast_memcpy : 1386

experiment 10 : memcpy with buffer size 4100
ellapsed CPU cycles for slow_memcpy : 30102
ellapsed CPU cycles for fast_memcpy : 1604

thanks for helping my experiment!
flag : 1_w4nn4_br34K_th3_m3m0ry_4lignm3nt