Week 0x04: The Buffer Overflow

To celebrate my first actual demonstration of a computer security exploit, I’m going to try something a bit different. I’ve created a YouTube video explaining the exploit so that others who want to learn about it can follow along while watching. That way, I can eliminate some confusion. Here is the video if you’d like to follow along. If not, that’s fine; I’m happy to explain what is going on more in-depth throughout this post.

There are a few things to note before attempting this exploit. Yes, this is more of an old-school attack, but even though it has been known for a long time, it is still relevant today. Modern systems, however, ship with multiple features that attempt to stop buffer overflow exploits. In this tutorial, I won’t get into defeating those protections until I learn more about them, but I will give you the basics of what protection is on the system.

ASLR (Address Space Layout Randomization) is a technique used in most modern-day operating systems that effectively shuffles memory around when the program is loaded, making it difficult to predict the memory locations of certain values. This means that you can’t simply overwrite the return address on the stack with a fixed value; there is more involved (more on the stack later). You could potentially bypass this if the exploitable buffer is big enough and you’ve created a large NOP sled (more on this later as well). Then, you could brute-force the return addresses and hope for the best!

The next type of protection is called a stack canary. Canaries are checked before a function returns to see if the stack has been overwritten; if it has, the program is aborted.

The last main protection against these types of overflows is DEP (Data Execution Prevention), enforced by the NX bit. Remember from my previous post that certain regions of memory are marked as executable, readable, writable, or some combination of those? Well, with this enabled, exploiting the program becomes extremely difficult because your code will jump to a non-executable portion of memory, thus raising an exception.

So, for demonstration purposes, we will disable these protections.

Throughout this tutorial, I will be using a C file called exploit.c. You can look at the code below.

It’s a pretty simple file. The first program argument is passed to a function called checkFunc, which in turn allocates a 32-byte buffer in memory known as buff, which the argument is copied into. This is error-prone, because we never check the length of the program argument. This means that an attacker can simply use a string exceeding 32 bytes and overwrite memory not allocated to the buffer. In this case, it happens to run into the stack.

So, how can we prove that it’s exploitable? Well, through our trusty debugger, gdb, of course!

First of all, let’s disable ASLR and stack canary protection. To disable ASLR on a Linux system, use the following command.
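On most Linux systems, this is done through procfs (run as root); writing 2 back later restores full randomization:

```shell
# 0 disables ASLR system-wide; 2 re-enables full randomization
echo 0 > /proc/sys/kernel/randomize_va_space
```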

To disable stack canary protection, and an executable stack, compile the exploitable binary using the following arguments in GCC.
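The invocation was likely something along these lines: -fno-stack-protector removes the canary, and -z execstack marks the stack as executable (the exact flag set is my assumption; add -m32 on a 64-bit machine to match the 32-bit registers used below):

```shell
gcc -g -fno-stack-protector -z execstack exploit.c -o exploit
```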

Without further ado, let’s load up our binary with GDB and run some tests on it! The first command loads our executable. The second command sets the program’s argument to the letter ‘A’ printed 50 times. We are using perl to write our exploits because it makes it rather easy to automate many tasks.
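The session probably looked something like this (the binary name is my assumption):

```
$ gdb ./exploit
(gdb) run `perl -e 'print "A" x 50'`
```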

Once the program hits a segmentation fault, we can examine the stack using the bt command in GDB. This will show us the backtrace — the return addresses that were on the stack when it crashed.

What does this mean? Well, 0x41 is the byte value of the letter ‘A’. You can confirm this by looking at an ASCII table. If you look at the address in the segmentation fault, you can see that the code was trying to jump to that address. What if we supplied our own address? Could we jump to any address that this program owns?

Without any stack protection enabled, yes. We can jump to the 32-byte buffer with our own code loaded in there. Then, the CPU will just keep chugging along, executing our arbitrary instructions.

If you don’t recall assembly language from my earlier tutorials, I would take a quick look at it. Let’s write some code that we can assemble that will simply print out ‘haxx’ to the console and do an infinite loop thereafter.
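Based on the explanation that follows, the assembly probably looked something like this sketch (NASM syntax; the exact instructions in the original may have differed):

```nasm
xor eax, eax        ; zero out the registers so no
xor ebx, ebx        ; stale values remain
xor ecx, ecx
xor edx, edx
mov dl, 0x4         ; length of "haxx" (dl avoids a 0x00 byte)
push edx            ; bytes sitting just above the string
push 0x78786168     ; "haxx" in little-endian order ("xxah")
mov ecx, esp        ; ecx points at the string on the stack
mov bl, 0x1         ; file descriptor 1 = stdout
mov al, 0x4         ; syscall number 4 = sys_write
int 0x80            ; invoke the kernel
jmp $               ; infinite loop
```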

The first four xor operations basically set each of those registers equal to 0. The next set of commands up until the int 0x80 command create the print function. Lastly, jmp $ creates an infinite loop. I will now explain how the CPU interrupt works to print out ‘haxx’.

First, we move 0x4 (the string length of ‘haxx’) into the edx register. We use dl instead of edx because our code cannot contain the 0x00 byte, the null terminator, which a full 32-bit move of a small constant would produce.

Then, we push edx and 0x78786168 onto the stack. We push edx first so that the bytes sitting just above the string can act as a terminator. I am not sure this is really needed, because we already specify the string length. If you convert 0x78786168 from hex to ASCII, you get ‘xxah’, which is ‘haxx’ written backwards — x86 is little-endian, so the bytes land in the right order in memory.

Next, we put the value of esp (the stack address) into ecx. This will ultimately be the register responsible for the message.

Then, we move 1 into ebx, 4 into eax, and finally trigger the system interrupt. Moving 1 into ebx selects file descriptor 1, standard output. Moving 4 into eax selects system call number 4, sys_write.

To assemble this code, you can simply go to the website https://defuse.ca/online-x86-assembler.htm and type your assembly code in. Then, you may copy the string literal for whatever purpose you desire.

In our case, upon assembling, we’re left with a rather short string of hexadecimal characters which will be our exploit code.

We will use this code to start building our exploit next.

Let’s load up GDB again. The first thing we want to figure out is the return address we want to jump to. In our case, we can simply figure out the value of &buff.
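Inside GDB, that might look like the following session (the break location and the printed address are illustrative — yours will differ):

```
$ gdb ./exploit
(gdb) break checkFunc
(gdb) run test
(gdb) print &buff
$1 = (char (*)[32]) 0xbffff670
```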

With our ‘print &buff’ statement, we are now empowered with the return address. With the return address ready to go, we can start writing our crafty code. I will explain each section of the code.

Remember when we used perl earlier? Well, here it is again as we write our actual exploit. The first part is called a NOP sled. We create it to adjust the offset of our payload. Why do we need an offset? Well, the return address, when overwritten on the stack, may not be perfectly aligned. This means that instead of the computer jumping to 0xbffff670, it would jump to something like 0xf670bfff. We can fix this by adding or removing NOPs to align it.

The second part is simply our exploit code. There’s really no need for explanation there. We just simply copy and paste the bytes over from the string literal.

The last part is our return address. Notice that we write it backwards, byte by byte — 0x70 first, then 0xf6, and so on — because x86 is little-endian. We repeat the return address 30 times to ensure that the saved return address on the stack gets overwritten.
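Put together, the invocation may have looked roughly like this (the sled length is an assumption, and the shellcode placeholder stands in for the bytes copied from the assembler):

```
./exploit $(perl -e 'print "\x90" x 8; print "<shellcode bytes here>"; print "\x70\xf6\xff\xbf" x 30')
```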

Now, chances are that the code as it is won’t work. Most likely, we need to change the NOP sled so that we can align the return addresses perfectly with the stack. Also, the return address itself may change. But let’s see what happens when we run the program in GDB.

After running the program, you can see that there is a segmentation fault. Also, the return address changed to 0xbffff5e0 (you can see this by printing out &buff or by looking at the printf statement above). Let’s adjust our code by subtracting 2 from our NOP sled and updating the return address to match.

Now, let’s cross our fingers and run the program!

As you can see, our program prints out ‘haxx’, and we’re left with an infinite loop! Success!


Let us step through the program using GDB to see what happens as it executes.

When we step through the function, we notice that the instruction pointer (eip) is now set to the location in memory of the buffer. It also has assembly instructions! We have now altered the control flow of the program.

I guess you could say that I’ve officially pwned it.

Week 0x03: Segments in Compiled Code

One of the quirks of compiled C is the various segments of data that your code can have. These segments are useful with regards to initialized data, uninitialized data, and data that can change during runtime.

There are five main segments that I’d like to cover in order to explain how each segment plays along with the other. The first segment in memory is the text section, and the last section is the stack. Everything between the heap and the stack is free memory that can be allocated.

  1. text
    • The text segment is the area that contains your compiled code. This area is read-only, and thus, prevents elementary-level exploitation.
  2. data
    • The data segment contains initialized data that is both readable and writable by the CPU. This section size is fixed at compile time.
  3. bss
    • The bss segment contains uninitialized data that will later be initialized. This helps save space for later. This section size is also known at compile time.
  4. heap
    • The heap grows towards the stack, and contains all of the dynamically allocated memory that the programmer is responsible for. As we will see later, the malloc function will allocate memory on the heap. Since the programmer is ultimately responsible for what happens on the heap, they are also responsible for freeing that memory later so it can be reused.
  5. stack
    • The stack grows towards the heap, and is responsible for temporary variables (popped off as soon as they’re no longer in scope). It is managed by the CPU, which makes it efficient and easy to use.

So, what kind of variables are used by the stack, and what kind of variables are used by the heap? As I’ve said before, the stack is responsible for local variables. The stack is faster because all the program has to do is change the stack pointer to access new variables. The heap is different in this regard, because the programmer is responsible for allocating memory and freeing up memory.

How does one use the heap? Well, C has a pair of interesting functions relating to memory allocation, called malloc and free. The first allocates memory, taking the number of bytes to allocate as its parameter and returning a pointer to the new block. free releases memory previously allocated with malloc.

But when is malloc useful? Well, as I’ve said before, both the data and bss segments have a fixed size. This means you can’t have variables that have a dynamic size in these sections. Unfortunately, you can’t simply allocate an array like you can in Java using the following syntax (if the variable n is not known at compile time).

You may, however, use malloc to create your arrays during runtime. And don’t worry, it’s actually quite simple. I’m going to create a simple program that asks for a number of test cases, and squares each test case. I’ll show both the Java syntax and the C syntax to show what differs between each solution.

Java syntax

C syntax

The outputs, as you’d expect, are exactly the same. In Java, however, you just need to know a tiny bit less about how memory works.

So now you know how to dynamically allocate memory in C. Is this useful? From a programmer’s standpoint, it is very useful. There are many cases in which a programmer needs to do this sort of thing, especially when the program requires input from the outside world. This ranges from images, to songs, to even plain text documents. Malloc makes it possible to dynamically read in all of this information without knowing their size during runtime.

So, that’s the heap, but how do all of these segments tie in together? Well, in exploitation, there are many different techniques to get rogue code running. Unfortunately for attackers, the systems that potentially bad code runs on have techniques to stop exploits from happening. Operating systems use these very segments to set permissions on parts of memory. For instance, the text segment can only be read and executed, and cannot be written to. Therefore, attackers can’t make programs execute their own code unless their code is very clever (one special technique is called ROP, but I’ll talk about that in a later post).

Week 0x02: Variables, Pointers and Memory

I have briefly discussed memory a bit, but I haven’t really discussed what goes on under the hood with memory. To understand a bit more of what memory does, let’s first understand variables and pointers.

I will first delve into variable sizes. When you compile a program, every single variable you wrote in your code has a specific size. The sizes are primarily determined by the compiler settings (and ultimately, the CPU). Why do we need different variable sizes? Well, the CPU must know how many bytes of information should be stored in memory. Computers can only hold a finite amount of information, unfortunately, so choosing appropriate variable sizes helps decrease memory usage.

One question that I asked myself in researching this topic is why we can’t just use the biggest possible variable type to hold all information. Also, many languages (like Ruby) don’t require type definitions, so how do they represent information in RAM?

To understand the first question, I will present an extreme hypothetical situation. Say we were working on a computer science problem that requires an array of integers. The problem uses recursion to exponentially increase the array’s size with new values. The problem also only requires integers ranging from 0-9. This is great and all, but as the array grows with more and more elements, more and more memory is taken up by integers. Why not represent the data in a byte format instead? We can see the difference below.

An array of integers with 1,000,000 elements takes 4,000,000 bytes (4 megabytes) of memory. An array of bytes with 1,000,000 elements takes 1,000,000 bytes (1 megabyte). We’ve cut the array to a quarter of its size simply by using a different data type!

To answer the second question, there is an interesting post regarding Ruby numbers at the following link: http://rubylearning.com/satishtalim/numbers_in_ruby.html

Every variable size is compiler-specific, so you may not get the same size as I do below. But here are the sizes I got when comparing some of the common variable types. Here is a short program I wrote to print out the sizes of the variables.

Upon compiling it with gcc, and executing it, I was left with this output on my 64-bit Mac computer.

On a 32-bit computer, the output would undoubtedly differ — most notably, pointers (and typically long) are only 32 bits, or 4 bytes, there.

So now we know how many bytes are allocated in memory for certain types of variables. This will help us with our explanation of C pointers, a critical part of understanding how C compiles and runs.

Pointers are just like conventional variables. They can be assigned values, and they have a compiler-specified size. They have one main job in C: storing the address of a specified variable. This helps when we need to pass a variable to another function (without copying the value) so that we can modify the variable both inside and outside of the function.

I suppose I can explain this a bit better by writing a program in C.

By running this program, the value 5 is printed. This may be expected for some people, but those who are new to programming may be left baffled. Why isn’t the variable changed in memory? Well, when this program runs, the value five is passed by value, not by reference. This means that the memory address where five is stored never reaches the function multiplyByThree — only a copy of the value does. How do we solve this? We can use pointers.

Note: This process is different in C++. In C, you can only pass by value. In C++, you are also able to pass by reference using the & symbol.

When executing this program, the value of 15 is printed. How does this happen, and what are those fancy asterisks in my program? Well, those asterisks create pointers. Let me explain.

The first thing we do is set the variable x equal to five. Simple enough. Next, we set the integer pointer ptr equal to the address of the variable x. Just so you know, &x returns the address of x, and int *ptr declares a new pointer variable.

After that, we call the function multiplyByThree, with the pointer as the argument. Let’s examine that function, shall we?

You should notice that the argument to multiplyByThree is a pointer. (*x) performs an act called dereferencing, which follows the address stored in x to the value in memory. Then, we multiply that value in memory by three, and exit the function.

This way, we can directly modify the contents of the variable x without worrying about it not changing. Why is this important? Well, it gives us access to fundamental knowledge about how values are stored in C. All of this knowledge will build up to the fundamental ideas of how exploitation works.

Until next week!

Minds on Physics (MOP) Gold Code Generator

Physics is one of my favorite areas of interest (other than Computer Science, of course). I’ve elected to take a high school physics course, and I’ve taken more of an interest in mathematical concepts, such as trigonometry, since then. There is one thing that intrigued the Computer Science nerd in me: a website called Minds on Physics, an online interactive quiz that asks physics-related questions. After each missed question, a significant percentage of the user’s health decreases, and the student may end up retrying that specific module many times.

This is a good method to ensure that students grasp certain physics topics, but I have heard tales of students retrying modules many times (without even looking at the hints and help), leading many students to give up. To top it off, Minds on Physics runs on the Shockwave player, an outdated browser add-on that I’d like to see disappear from the Internet.

Don’t get me wrong, I think that MOPs are an excellent way to learn physics. I just think that they should be implemented a tiny bit better (see end of post).

Each time a student completes a specified assignment, they receive a success code. The code may be categorized as either gold or silver, depending on whether or not they answered the last question correctly. The Minds on Physics website claims that the code is encrypted (which we will later find to be false) and that no two students can have the same success code.

When a student completes a MOP assignment, a success code is rewarded. The success code is an encrypted 8-letter code that is unique to the student, the teacher and the assignment. No two students will ever have the same success code.

(http://www.physicsclassroom.com/mop/usage#codes)

At the beginning of Physics class, I vowed to my peers that I would figure out how gold codes are generated. And after months of procrastination, I finally got around to doing it after a friend of mine told me that there was a MOP Android app.

You may be wondering what the significance of it being an Android app is. Well, for starters, Android apps are much simpler to decompile because there are more resources out there for doing so. I don’t know of a single resource for decompiling Shockwave applications, so I resorted to using the Android APK. The only downside is that the Android version costs a dollar. However, cracking the algorithm is worth much more than a measly dollar to me.

As my first step, I simply downloaded the Minds on Physics app from the Google Play Store after purchasing it. Then, I extracted the APK bundle from my phone and sent the file to my computer. The tool I used to extract APKs is on the play store, and is called APK Extractor.

Once I had downloaded the APK file from my phone, I extracted the archive to view its contents. On Windows, one can simply use the 7-Zip archive manager to extract the APK. Because I am on a Mac, I used the command line to extract the archive.
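An APK is just a renamed ZIP archive, so the extraction step probably looked something like this (the filename is my assumption):

```
unzip MindsOnPhysics.apk -d mop/
```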

After a quick search through the directories of the APK, I noticed something strange. In the assets directory, there is a bundle of files with the extension .livecode. Using a text editor, I found that the files contained very readable code, and that I probably would not need to decompile the MOP app at all!


By Googling LiveCode, I learned that it is a scripting language designed to let developers easily build cross-platform apps. There is also a way to view LiveCode files in an interactive environment, but I decided not to download it until later.

With the files at hand, I was now able to figure out the algorithm involved in generating MOP codes. In the assets directory, I found a file called asstScriptLibrary.livecode which contains the algorithms needed to generate codes. Fortunately, the algorithm was quite easy to find by searching for GoldCode within the file, which yielded a function called createGoldCode.

Now, I won’t be posting the exact algorithm that generates the gold code because of copyright reasons, but I will give a few pointers in creating your own decryption function.

  • Learn the LiveCode API. LiveCode is very different from any other scripting language that I’ve seen. For one, it reads like English. Also, indices start at 1, which is different from conventional programming languages.
  • Don’t get frustrated at all of the code for the code generation. Although the code is easy to read, it may get a bit repetitive at times.
  • Try out the LiveCode debugger. The LiveCode environment comes with a code editor which lets you step through the LiveCode functions. After being frustrated that my code generator only worked for a few student IDs, I figured out the problem by stepping through the LiveCode debugger.

Also, I mentioned that the algorithm is not actually an encryption algorithm. Encryption algorithms are reversible, but this one isn’t. Instead, I’d call it a hashing algorithm, though I could be wrong. Feel free to correct me!

Here is my final product after writing the Java code to generate success codes.


And sure enough, when we complete the assignment KC1, we are presented with the same exact gold success code. Take that, physics classroom!


Here are some improvements that I will suggest to physics classroom.

  • Don’t store the algorithm on the client side. This makes it easily reversible, and easier to generate codes. Instead, I would track the progress of each question on the server side, and only generate success codes when the server knows for a fact that the client has been through all of the questions.
  • Don’t store your code in plain text on the client. This makes it laughably easy to view the original, uncompiled, code that is used during runtime. Please make sure to use a language that can compile your source code, or make it so that your code isn’t easily viewable by the client.
  • Stop using Shockwave Player. Please. You’ll regret it in the future when browsers won’t support it. Instead, take a server-side approach and use some sleek HTML5 and CSS to make it look pretty. You’ll thank me later.
  • Just ditch success codes entirely. If you follow the first bullet point, and track progress on the server, why not simply make it so that teachers can log into their account and monitor the progress of each student? That way, reverse engineers like me can’t exploit the algorithm.

The lesson I learned after making this program is that there are flaws in every computer program. It just takes a little patience to find them. I also learned that generating these codes will ultimately lead me to failing physics, so I won’t be doing that.

Week 0x01: GDB and Compiled C

When I last posted, I explained some ideas in assembly language and even posted some proof-of-concept code. As you probably already know, when you write code in a language such as C, C++, or Java, the source code you type eventually gets compiled into something called machine code. Java is a special exception; its source code is compiled into something called bytecode, which can run on any machine that has the JRE installed. C and C++, however, compile directly into machine code that can be executed on a machine.

Why can’t C and C++ run on different types of operating systems and processors without compiling again? Firstly, processors work in different ways. This means that opcodes and registers may be different on different architectures. Compilers ultimately translate source code into machine code, and the compiler needs to know how to translate it properly.

Why does Java work, though? Well, bytecode is executed in something called the Java Virtual Machine. The Java Virtual Machine is itself written in a language such as C or C++, so ultimately, Java bytecode is mapped down to low-level opcodes at some point. It just makes it significantly easier for software developers to ship Java code without worrying about every operating system and processor being different.

I won’t go too far into the specifics of how Java works, because our hacking endeavors won’t explore Java too much. Instead, I will focus on how C and C++ code maps down to low-level opcodes. We can see how this happens in real time using GDB.

First, you need a way to compile C and C++ code on your machine. I am going to assume that you already have such an environment (and that you know C and C++). If you don’t know how to program in C or C++, I suggest you pick up a book on it. This knowledge may be crucial in order to find and exploit flaws in a program. If you don’t have the build environment, you should be able to install it on Ubuntu Linux using the following command.
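On Ubuntu, the command was likely something like:

```
sudo apt-get install build-essential gdb
```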

To begin our understanding of the debugger, let us create a quick C program and fire it up in the debugger. Note that the process of creating, compiling and executing a program may vary from system to system, but it should be fairly easy. We’ll begin by writing a simple multiplication program. The program simply has a function called mult which takes two integers and multiplies them together.

Let’s save this code as multiply.c and execute it using the following commands.
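Those commands were probably along these lines (with no -o flag, gcc names the output a.out):

```
gcc -g multiply.c
./a.out
```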

Note: The -g flag tells gcc that we want to include debugging information in the executable, which lets GDB map machine instructions back to our source code.

The code should output 16, and all should be well. But what does the compiled code look like? To see what the compiler produced, we can use a tool called GDB to examine the assembly code first hand. This technique will come in handy in our future security studies.

Begin by executing the following commands.
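Those two commands were likely:

```
$ echo "set disassembly-flavor intel" > ~/.gdbinit
$ gdb ./a.out
```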

The first command simply sets the GDB debugger to use the Intel assembly language syntax. This syntax, as I’ve explained in the previous post, is easier to read than AT&T in my opinion (and it’s what I’ve learned assembly with). The second then actually opens the debugger on our program.

Once the debugger opens, you can actually do a few neat things with it. For one, you can see the assembly code that corresponds with a particular function. As an example, try looking at the assembly code for the function mult.
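Inside GDB, that’s just:

```
(gdb) disassemble mult
```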

The output for this command may look foreign to you, even after I’ve explained some assembly code in the previous tutorial, but I’ll try to explain to the best of my ability what the lines do.

The function begins by saving the old base pointer on the stack (push ebp). Then, it copies the current stack pointer into ebp (mov ebp, esp), establishing a new stack frame. This makes it easier to grab the local variables and parameters in our function. If you’d like to read more about ebp, there’s a great guide on it here: http://practicalmalwareanalysis.com/2012/04/03/all-about-ebp/

Next, we see a mov operation that takes a value at some memory location and moves it into the eax register. From our C code, we know that this value is going to be 4. Let’s look at the command thoroughly to make sure we understand how it gets the value four from memory.

We must first understand what a DWORD PTR is. DWORD indicates that the value is 32 bits long, and that’s how much should be moved into the eax register. It is short for “double word.”

The brackets in assembly indicate that the value stored at that memory address should be put into the register eax, rather than the address itself. Remember, if we removed the brackets, eax would simply receive a pointer, and that won’t do if we want to multiply the actual values.

How does the computer know that the value four is going to be at ebp+8? Simple. When mult is called, the stack holds four items that relate to this function: the saved ebp (at [ebp]), the return address (at [ebp+4]), and the two parameters. Each slot on the stack is 4 bytes (32 bits) wide, so the first parameter lives one slot past the return address, at [ebp+8], and the second parameter lives at [ebp+12].

The next command, imul, multiplies the value in memory at [ebp+12] by eax. We know the value at [ebp+12] is 4, and the value of eax is 4, so the result is 16, stored in the eax register once the instruction completes.

Once that instruction completes, the value of ebp is restored to what it was originally with the instruction pop ebp. Then, the ret instruction jumps back to the function caller. In our case, the returned value simply gets printed out in the main function.

I won’t stress over disassembling the main function, as it’s pretty self-explanatory. Now, let’s dive into some of the powerful things that GDB allows us to do.

First, GDB allows us to set breakpoints and view memory at various locations in code. While still in the GDB debugger, let us run these commands.
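Those commands would be:

```
(gdb) break main
(gdb) run
(gdb) x/i $eip
```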

These above commands tell the debugger to set a breakpoint at main, and then run the program until it hits that breakpoint. The program execution will pause, and then we can run the next command. The x/i function tells GDB to examine the instruction at eip. The eip register always contains the address of the next instruction. This means that GDB always can tell us the assembly instruction that will be executed next.

Here is the output of x/i $eip.

Look familiar? It seems the value of four is being moved into [esp+4] (which will later be addressed as ebp+12 inside the function call). We can execute the current instruction and view the next one with the simple command below.
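That command is nexti, which steps forward by exactly one machine instruction:

```
(gdb) nexti
```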

As we expect, the value of four will be put into the memory at [esp].

Exciting, huh? Well, GDB can do a lot more fun things. It will become a very powerful tool in the future. Let’s just go over a few more cool things in GDB so that we get used to it. Remember, if I don’t touch on something, you can always use the help command in order to answer your questions.

Let’s say we wanted to view the memory at a certain address. We can do that! Let’s test this out by going to the mult function.
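Reaching mult might look like this (assuming a breakpoint on the function):

```
(gdb) break mult
(gdb) run
```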

Alright, so GDB tells us where in our code we stopped, but what if we want to view the stack at this location? First, let’s figure out which instruction is set to be executed when we do nexti.

Our output is exactly the same to what we found while disassembling the mult function earlier!

What if we wanted to view all of the bytes that ebp currently points to? GDB makes it easy. First, we want to locate the memory address that ebp is pointing to so that we can get the location of the base pointer.
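One way to do that:

```
(gdb) info registers ebp
```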

The output of the command for me is the following, though yours may vary.

Now that we have the memory address, we can simply inspect the first 16 bytes at the location.

This command examines the 16 bytes starting at the base pointer. Here is our output.
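
The examine command and a sample of its output might look like the following; again, the exact bytes and addresses depend on your machine:

```
(gdb) x/16xb $ebp
0xbffff718:	0x38	0xf7	0xff	0xbf	0x16	0x84	0x04	0x08
0xbffff720:	0x04	0x00	0x00	0x00	0x04	0x00	0x00	0x00
```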

Just as we expected, the last bytes there equal four; these are the values we wish to multiply.

Now, I’ve explained many awesome things that the debugger can do. I suppose the only question left is what can’t the debugger do? Well, for starters, it can’t turn against the human race and build a droid army of super intelligent metallic beings… or can it?

Let’s create a recursive program. For those of you who’ve never written one before, well, you’re about to right now. Recursion is basically programming jargon for a function that calls itself. What’s the importance of this? Well, it allows us to write code that looks elegant and readable, without using conventional for loops and while loops. Allow me to present an example.

You can expect the output to be the familiar lyrics sung to us by our fathers on long car rides as children. My point isn’t to annoy you, however; it’s to show you another cool thing that GDB can do. The first thing we will discuss is a way to investigate the call stack. This is useful when figuring out what calls were made in a program.

Let me provide a quick example. First, let’s compile the program.

Just so you know, the -o parameter tells the compiler where to save the compiled file (for example, gcc recurse.c -o recurse).

Next, let’s run it in our debugger.

Our first step is to set a breakpoint in the program and then run.
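
Something along these lines, assuming the recursive function is named sing as in my example:

```
(gdb) break sing
(gdb) run
```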

Then, we can simply type continue a few times. Once we’ve done that, we can investigate the stack.
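
The command for this is backtrace (or bt). After a few continues, the output looks roughly like the following; the frame addresses and line numbers here are illustrative:

```
(gdb) backtrace
#0  sing (n=96) at recurse.c:8
#1  0x080484a2 in sing (n=97) at recurse.c:13
#2  0x080484a2 in sing (n=98) at recurse.c:13
#3  0x080484a2 in sing (n=99) at recurse.c:13
#4  0x080484c3 in main () at recurse.c:18
```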

If we were to keep executing the program and tracing the stack, we would notice the stack growing bigger each time the function executes. If the stack grows too big, we encounter something called a stack overflow, an issue many programmers run into when working with recursive functions. GDB is essentially showing us the addresses that each call will jump back to when it finishes.

So, how is this useful? Well, it gives us the opportunity to look at the function hierarchy, which can potentially help us with exploitation.

Now that you have a basic understanding of how GDB works, I advise you to create your own programs, disassemble them, step through them, and more. I also advise you to read up on things like watchpoints; they could be useful in further tutorials as well.

Week 0x00: The Assembler

To begin learning about exploitation, we must learn how the computer reads certain instructions. To do this, we are going to learn a language known as assembly language. Please note that not all of the information in this post may be entirely correct; it is merely what I picked up while learning assembly many months ago.

About a year ago, when I read my first assembly code snippet, the text intimidated me. I had no idea how to understand the snippet in front of me; it seemed impossible. The scary part about assembly, for me, is the amount of code one has to write to do a simple task. One also has to be familiar with different processor-specific commands; thus, there are multiple forms of assembly language.

When learning about hacking and exploitation, the same rule applies to all processors: get the computer to do something it wasn’t supposed to do. Thus, one only needs background knowledge of different exploitation techniques, as well as a way to compile code for specific processors in order to exploit it. In order to understand various exploits, we need to study the barebones of how processors work.

For these tutorials, I will mainly be using an x86 processor and will be writing in Intel syntax. If you do not have an x86 computer, an x64 processor will suffice, as 32-bit (x86) programs can run on a 64-bit (x64) computer.

Just a small side-note: There are many different processor types such as ARM, and these processors have different commands and registers. However, by learning x86, it is easy to use your exploitation knowledge on other hardware (as famously demonstrated by @smealum).

I will be using the Intel syntax for these tutorials because I personally find it easier to read than the AT&T syntax. You can, however, write your code in whichever syntax you find suitable.

If you do not understand assembly language, no worries. There are many tutorials on learning ASM online, and you can even download NASM to assemble your own code. I won’t provide an in-depth tutorial; I just hope this post explains some key concepts related to exploitation.

First, head over to the NASM website, click Download, and find the latest version of the software. Once you find the latest version, click on the folder, open the directory of the operating system that you are running on, and download the corresponding archive or installer.

The process may be different depending on your operating system, but stick with this general rule of thumb: extract the archive (or install) into a directory you can remember. If you’d like, add the folder where the NASM executable resides to your PATH environment variable to convenience yourself in the future. The process of setting these variables differs per computer, but if you’re reading these tutorials, you probably have the knowledge on how to Google something.

At this point, we’re ready to start learning how to code in assembly language! Before we start, I’d like to explain how assembly code works, how it assembles, and the significance of it all.

Before Grace Hopper invented the first compiler, there had to be another way to create programs for computers. Compilers basically take higher-level code, such as C or C++, and translate it into code a computer can understand. This code is called machine code, and it can be difficult for humans to read. Previously, programmers simply wrote the machine code by hand and fed it into the computer. Since machine code is a collection of many different bytes, programmers created a language called assembly language to map a human-readable command to each of those byte sequences.

Processors have a collection of registers that hold data for a short period of time. A good analogy for this would be a variable, but there are only a limited number of registers in a standard CPU. The 32-bit CPU derives its name from the size of each of its registers: 32 bits. This simply means that a register can hold 32 1’s and 0’s, giving 2^32 possible values, so it can hold an unsigned number from zero up to 4,294,967,295.

So, registers can hold data, but what’s the point of memory (RAM) then? Well, as I’ve said, register data is extremely short-lived. To keep a value around longer, it is pushed onto something called the stack. The register that points to the top of the stack is called esp. Consider the following command.
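
The command in question is a single push:

```nasm
push eax        ; copy the value in eax onto the top of the stack
```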

When it executes, esp is decremented and the value in the register eax is written to the memory address esp now points to, so esp always holds the address of the top of the stack.

To understand memory addresses, let’s think about RAM as a cubby system. Each cubby has a designated number (its address). Imagine that whenever someone puts something in the cubby system, they choose the next available box and write its number down on a piece of paper. The person now knows which cubby they chose, and can later retrieve their items. This is somewhat how RAM works. Once something is pushed onto the stack, the esp register holds the memory address (the cubby number) so the value can later be retrieved. Whenever something else gets pushed onto the stack, the value goes into the next cubby over, and esp is updated to point to it. I will do a more in-depth analysis of the stack later on in this tutorial.

There is one more thing I would like to make clear that makes everything in exploitation possible. It may seem obvious, but it’s a hacker’s best tool: the fact that when run, code is put into memory. This fact means that if it’s possible to load our own compiled assembly code into a program, and get the program to jump to that code, we’ve essentially compromised the control flow of the program!

Without further ado, let’s look at a snippet in Intel assembly and try to understand it.
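
Based on the line-by-line walkthrough that follows, the snippet looks like this (the argument order for multiplyNum is my best reconstruction):

```nasm
mov  ebx, 0xa      ; line 1: ebx = 10
add  ebx, 0x5      ; line 2: ebx = ebx + 5
push 0x5           ; line 3: second argument (pushed first)
push ebx           ; line 4: first argument
call multiplyNum   ; line 5: result lands in eax
add  eax, 0x3      ; line 6: eax = eax + 3
```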

Before we try to understand what the above text means, let’s look at the bytes (in hexadecimal) that the computer will read in. Hexadecimal, like the binary system, is a number system that simplifies many different constructs in computing, namely memory addresses.
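
Reconstructing from the opcodes discussed below, the assembled bytes (one instruction per line) look like this; the call target bytes depend on where multiplyNum ends up, so I’ve left them as placeholders:

```
BB 0A 00 00 00    ; mov ebx, 0xa
83 C3 05          ; add ebx, 0x5
6A 05             ; push 0x5
53                ; push ebx
E8 ?? ?? ?? ??    ; call multiplyNum
83 C0 03          ; add eax, 0x3
```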

I’ve added new lines to help you map the bytes to the corresponding assembly code. Let’s explain what each line means, step by step.

Line 1: The processor moves the decimal value 10 into the register ebx. In our code, you may see that I used 0xa instead of 10; 0xa happens to be the hexadecimal representation of the number ten. If you’d like, you can translate numbers from our traditional number system to hexadecimal by simply typing “convert 10 to hex” into Google, or something along those lines. Anyway, in short, the processor basically does this: int ebx = 10;

You may be wondering why there are five bytes corresponding to that simple command in the raw assembled code. The first byte is called an opcode, which is how the processor tells what sort of command it needs to run. 0xBB happens to be the variant of the mov command that works only with the ebx register. We can get a list of assembly language instructions and their corresponding bytes at the online x86 Instruction Chart. Now let’s examine the next four bytes. Remember, our code is assembled for a 32-bit machine, and we need to make sure that huge numbers can be stored in the register. Therefore, the next four bytes hold the value being moved, which can be up to four bytes long. If each byte can hold 256 possible values and we have four bytes, we get 256^4 possible values. This number is equivalent to 2^32, where 32 is the number of bits our processor supports (32-bit).

Therefore, the maximum integer that can be used in a raw calculation in assembly is 4,294,967,295 (2^32 - 1), as I’ve stated before. Why this value, and not 4,294,967,296 (2^32)? Well, remember, computers start their numbering at zero, which counts as one of the possible values.

Another question: is it possible to store negative numbers this way? Technically, yes; it’s just a tad harder. In binary, there is a concept known as the sign bit, where the first bit indicates whether the number is positive or negative. This cuts the range of positive numbers we can store in half, but gives us the opportunity to use negative numbers. I won’t go on a tangent about signed numbers, but there are many resources about negative numbers (including the ones’ complement and two’s complement representations of those numbers).

Line 2: The next command is pretty straightforward. It simply adds the value five to the register ebx. You may be wondering why there are only three bytes versus the five we’ve seen before. Well, each opcode is different, and I’ll try to explain it as best as I can. The opcode value is 0x83, which corresponds to an add command with a one-byte operand.

What, then, is the 0xC3 byte that we see? It is called the ModR/M byte: its top two bits select register-direct addressing, its middle three bits extend the opcode (selecting add), and its bottom three bits denote which register is being manipulated, ebx.

So, the last byte is obviously the number we’re adding to the register. But what if we want a much bigger value? Consider the following code snippet.
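
The snippet in question adds a larger constant, for example:

```nasm
add ebx, 0x80      ; 0x80 = 128 decimal
```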

This compiled code should look similar to what we had before, right? Incorrect. The 0x83 form of add stores its operand as a single sign-extended byte, so it can only encode values from -128 to 127. Any value of 128 and above must use a different form whose operand takes four bytes. Therefore, the above command would assemble to the following.
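
Assembled, the instruction above becomes six bytes:

```
81 C3 80 00 00 00  ; add ebx, 0x80 (opcode 0x81, four-byte operand)
```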

Notice how we gain an additional three bytes, as well as a change in our opcode (0x83 became 0x81). Why keep both forms when they do the same thing (adding a value to the ebx register)? The short answer is code size: the 0x83 form lets instructions with small constants take three bytes instead of six, which keeps programs smaller.

Lines 3-5: Since these lines all work together, I figured I would clump them into one category. Let me explain what this code is going to do before I explain how it works line-by-line. Basically, the code is going to execute a function called multiplyNum, which we assume is defined somewhere in the same executable as this assembly code. It is a hypothetical C-compiled function that can easily be called. This will be crucial in our exploitation endeavors; imagine being able to call any function with any parameters.

Before I explain the premise behind the push and pop commands, I’d like to further explain what they entail. Both commands relate to the program’s stack, a region of memory that the processor has access to. As I’ve said before, there is a register in the processor called esp which points to the current top of the stack. How does a stack work, though?

Let us consider a very simple example of a stack system: a stack of papers that a teacher must grade, formed by a class of students. Assuming the teacher does not flip the stack over, which paper will he or she grade first? Most likely the last paper added to the stack. This is known as a LIFO system (last in, first out).

Another example of a LIFO stack is the back button in your Internet browser. If you click the back button, which page does it go to? Of course, the browser goes to the last page visited. Click it again, and it goes to the one prior. And so on.

What application does this information have to computer science? Lots. In computers, a main premise is the fact that one function call can execute another. How does the computer know where in the code to return once it finishes with a function call? Simple: whenever it calls a function, it pushes the return address onto the call stack. Whenever the function finishes and wants to return to the point where the caller left off, it simply pops that memory address off the stack and starts reading the rest of the code from there. In a world where function calls can be nested virtually indefinitely, a LIFO system is the best approach.

We need to keep this idea in mind when we call our C-based function. Remember, the function is called multiplyNum, and it accepts two number parameters which are (behind the scenes) multiplied together. What is the first thing we need to do? Well, we need to somehow tell the function which arguments we wish to use.

We can do this by using the stack. We push the parameters from last to first because of our LIFO requirements. At first, this concept confused me because I had little knowledge about stacks. It makes much more sense to me now: inside multiplyNum, the first value visible on the stack is the last one pushed, which happens to be our first parameter.

We can apply this knowledge of stacks to our simple assembly code. The push commands simply take the value 0x5 and the contents of ebx and place them on the stack as parameters for the multiplyNum function. Notice how the bytes differ for the two lines. When pushing 0x5 onto the stack, 0x6a is the opcode used, followed by the one-byte value. push ebx translates to a single opcode, 0x53, which pushes the register ebx onto the stack; encoding the whole instruction in one byte means there is less for the CPU to fetch and decode.

Let’s look at the call instruction. It simply jumps to the location in memory of the multiplyNum function, and then executes code that, from the caller’s point of view, is unknown. Let’s assume the unknown code is below.
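
In C, the hypothetical function would be as simple as this:

```c
/* A hypothetical C function that multiplies its two arguments. */
int multiplyNum(int a, int b)
{
    return a * b;
}
```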

The assembly code is going to look similar to the code below, with some instructions omitted (mainly because I need to familiarize myself with the base pointer, the stack pointer, and the ret instruction a bit more).
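
A rough sketch of what that looks like, with the prologue and calling-convention details glossed over:

```nasm
multiplyNum:
    mov eax, [esp+4]   ; first argument (just above the return address)
    mov ebx, [esp+8]   ; second argument
    mul ebx            ; eax = eax * ebx
    ret                ; pop the saved return address and jump back to it
```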

This code lives at the memory location of the multiplyNum function itself. When we call the function, the address of the instruction after the call is saved on the stack, and an instruction in multiplyNum will eventually jump back to that saved location in our original function. The mul command above multiplies the values eax and ebx together and stores the result in the eax register. This register is also known as the return register, as it holds the result of the function.

Line 6: The last line in our assembly code above simply adds the value three to the eax register. Remember, this is the return register from our function call.

If you take one thing away from this short assembly explanation, let it be the stack. Remember, code is ultimately loaded into memory, and if we can load our own code into a vulnerable process, it’s pretty much compromised (assuming we can get the processor to jump to our code). So stay optimistic in learning assembly!

A Look Behind

As part of a 2016 New Year’s resolution, I created a blog to post weekly about my experiences in computer security, in hopes that I will learn more about exploitation and development. I will probably have a minimal fanbase, but nonetheless, it will definitely extend my knowledge (and perhaps yours) about the various vulnerable crevices of cyberspace.

I suppose there is nothing better to do than to spend a Friday evening writing a long introductory post about yourself on some random server on the Internet, so without further ado, I’d like to introduce myself. I am Christopher Cerne, currently a Junior in high school in Northern Virginia, and I love computer science. I am employed at my favorite fast food restaurant, Chick-fil-A, and the money helps keep this blog running. I just hope that none of my co-workers find this website.

There is a great feeling when you create something, a feeling that psychologists label as pride. But there is virtually no word to describe the great pleasure that comes when you first create something significant.

I remember having that emotion as early as the fifth grade. Ever since I can remember, I have been interested in the way technology works. I am very fortunate that at a young age I was privileged with Internet access, and my parents always supported my passions. This helped me, when I was in fifth grade, to learn my first computer language: HTML, possibly the easiest markup language to learn.

My technological endeavors began when I checked out a book about HTML at my local library branch. That menial task completely revolutionized the way I view technology today and caused me to create my first web page. Of course, it was no Facebook, but it still gave me that feeling. I even figured out how to send my work to my family and friends by compressing it and sending it via email.

Sooner or later, I bought my own web development bible dubbed HTML, XHTML and CSS For Dummies written by Andy Harris. The book encouraged me to learn PHP, my first programming language, and buy my own domain name. Learning PHP taught me the importance of Object Oriented Programming and eventually led me to Java development, and shortly thereafter, C and C++.

Why did ten-year-old me take a sudden interest in web development? Well, I simply have an urge to learn about how things work: everything from how Legos are made in factories to questions about my own existence. I have always wanted to learn about how computers work, so I thought that the logical step forward would be to determine how people made websites. Learning about web development didn’t immediately lead me into Java programming, however.

I had become obsessed with a computer game called Minecraft in 2011, and I recall many memories playing on random servers on the Internet. I became interested in the Minecraft server protocol, and even discovered a prominent piece of server software known as Bukkit. The software allowed developers to make their own plugins, or mods, for their Minecraft server. Soon enough, I was interested in learning Java to develop my own plugins for the software. I became so interested in Java that I took two computer science classes in high school to extend my knowledge of computer science and Java. I am currently enrolled in AP Computer Science, actually.

How does all of this relate to computer security? Well, it may not be relevant to your background, but the snowball effect of me being interested in web development eventually got me here, writing to a bunch of web crawlers, and potentially, people.

I suppose the thing that mainly got me into computer security was Jordan Rabet (@smealum), who developed a way to run unsigned code on the Nintendo 3DS. The Nintendo 3DS is a video game console that is completely locked down; Rabet basically circumvented the restrictions Nintendo had in place to disallow games programmed by regular people. This is what inspired me to learn C and C++. Thankfully, Java helped me learn C++, as both languages are object-oriented.

The skills that people like Rabet have are very valuable. They not only increase programming knowledge, but also open doors for future careers. Someone who works in the field of penetration testing is on track for a six-figure annual salary. That is not what intrigues me about security, however. I am simply intrigued by thought-provoking activities, and exploitation is a puzzle that can keep me engaged for a long time.

As I’ve said before, I’d like to make a weekly post about my learning experiences in computer security. I hope to post every Sunday (because Chick-fil-A is closed on Sunday, thankfully), and I will follow a book to keep me on track. The book is Hacking: The Art of Exploitation by Jon Erickson, and I am confident that it will lead me in the right direction.

I won’t always post about exploitation, however. I will use this blog to post anything about computer security and my thoughts about current events. I may also write about personal events in my life. I hope to particularly discuss computer algorithms in the acclaimed book Algorithms by Robert Sedgewick and Kevin Wayne. Perhaps by studying the Algorithms book, my school’s computer programming team may finally defeat Thomas Jefferson High School (you probably aren’t familiar with TJ, but they’re really good at programming; #1 at every competition we go to. It’s annoying).

Anyway, I hope that I engage people interested in computer security. I believe that my next blog post may be about assembly language, so stay tuned!