I decided to start writing a blog about programming, and I thought it would be interesting to present something less popular: how IT things work under the hood. In this post you will reach the very bottom of programming.
That is why I have chosen "binary" programming as the first topic. The following topics will be covered:
- memory mapping and memory protections,
- a general picture of how a processor works,
- a small dynamic compiler,
- a guide to binary assembly.
Big picture
The CPU is a black box which takes a stream of bytes (electric signals) and produces a stream of other electric signals: the output signals change the contents of RAM, what the graphics card displays, the sound from the speakers, and so on.
What is most interesting is that the input is a binary stream: there are no classes, attributes, properties, stacks, allocations or garbage collectors. There is only a huge number of operations for copying data, port I/O and arithmetic, plus a few registers (such as the instruction pointer) which control the position in the binary stream from which the processor executes. So everything you do in C, Java or JavaScript is in some way transformed into this binary stream.
Preparation
Please make sure you have GCC installed on a POSIX-compatible system running on the 64-bit Intel x86 architecture (Linux or macOS).
Please download and get familiar with the Intel® 64 and IA-32 Architectures Software Developer's Manuals. This roughly 3,000-page, 18 MB PDF is really helpful, but only chapters 1-4 are essential; the rest can be read and searched as needed.
Please pick your favourite IDE or editor (for such tasks I prefer Vim).
Step 1 - execution framework
This step creates a small execution environment in which code can be tested and executed. To keep things simple, the framework is based on memory mapping.
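1:  #include <strings.h>
2:  #include <sys/mman.h>
3:  #include <unistd.h>
4:  /* MAP_ANON must be paired with MAP_PRIVATE (or MAP_SHARED); fd is -1 because no file is mapped. */
5:  int main(int argc, char* argv[]) {
6:      void* codePtr = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE | PROT_EXEC, MAP_ANON | MAP_PRIVATE, -1, 0);
7:      if (codePtr == MAP_FAILED) {  /* mmap signals errors with MAP_FAILED, not NULL */
8:          return 1;
9:      }
10:
11:     unsigned char *codeBytes = (unsigned char*) codePtr;  /* generated machine code is written here */
12:     void (*code)(void) = (void (*)(void)) codePtr;        /* ...and executed by calling code() */
13:     return 0;
14: }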
In line 6 a new memory region is attached to the process's virtual memory (see man mmap). It works like malloc, but is slightly more complicated. The first argument (0, i.e. NULL), together with the absence of the MAP_FIXED flag, lets the system decide at which address the memory is mapped. From the CPU's point of view, mmap is a tool to manage the mappings between process virtual memory and physical memory.
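The second argument is the size of the area. Because the CPU maps process virtual memory to physical memory in units of pages, a mapping always consists of whole pages, and a size that is not a multiple of the page size is rounded up. Here exactly one page is requested with getpagesize(). On Intel x86 the minimum and standard page size is 4 KiB (4096 bytes).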
The next argument says that the memory can be read and written, so it behaves like a typical variable, but in addition it can be executed (PROT_EXEC), so the CPU can be pointed at this area to execute instructions from it. MAP_ANON says that the memory is anonymous: it is allocated privately to the process and not backed by a file descriptor, so the file descriptor and offset arguments are not used.
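The codeBytes and code variables will be used to generate and execute code: machine code bytes are written through codeBytes, and the CPU jumps to them when code is called as a function.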
To compile and run the program, the following command can be used (it compiles the source, runs the binary if compilation succeeded, and then prints the exit status): gcc -o binary-increment binary-increment.c && ./binary-increment; echo $?
The output should be 0.