Recon_upx: Analysis of a UPX-packed file
Aug 12 2024
File: recon_upx
In this blog post, I'll discuss how to use IDA to perform static and dynamic analysis of a malware file and to unpack it. For more information about IDA, visit https://hex-rays.com/ida-pro/. For an introductory IDA example, see https://www.hackers-arise.com/post/2017/06/22/Reverse-Engineering-Malware-Part-3-IDA-Pro-Introduction
First things first: we'll use the file command to get information about the file's architecture:
file ./recon_upx
The output shows that recon_upx is a 64-bit ELF executable (the executable format used on Linux and other UNIX-like systems). It is statically linked and has no section headers. A normal ELF binary has section headers such as .text, .bss, and .data, so their absence suggests that this file is packed: the malware unpacks itself at run time and then continues executing its malicious code.
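The same indicators can be checked programmatically. The sketch below is a minimal example, assuming a little-endian 64-bit ELF such as recon_upx: it reads the ELF header with Python's struct module, reports the entry point and the number of program and section headers, and looks for the "UPX!" marker that UPX normally leaves in its output. The function name inspect_elf_header is hypothetical, and a modified packer could strip that marker, so a missing marker proves nothing on its own.

```python
import struct

# Elf64_Ehdr fields after the 16-byte e_ident, little-endian (x86-64):
# e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
# e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
EHDR_TAIL = struct.Struct("<HHIQQQIHHHHHH")


def inspect_elf_header(data: bytes) -> dict:
    """Summarise ELF header fields that hint at a packed binary (hypothetical helper)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS 2 = 64-bit, EI_DATA 1 = little-endian
        raise ValueError("this sketch only handles little-endian 64-bit ELF")
    (_type, _machine, _version, e_entry, _phoff, e_shoff, _flags,
     _ehsize, _phentsize, e_phnum, _shentsize, e_shnum,
     _shstrndx) = EHDR_TAIL.unpack_from(data, 16)
    return {
        "entry": e_entry,
        "phnum": e_phnum,
        "shnum": e_shnum,
        "no_section_headers": e_shoff == 0 or e_shnum == 0,
        # UPX normally embeds a "UPX!" marker; modified packers may strip it
        "upx_marker": b"UPX!" in data,
    }
```

For example, calling inspect_elf_header(open("./recon_upx", "rb").read()) on a file like this one would be expected to report no section headers.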
Running readelf -a ./recon_upx also confirms the packed state described above.
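readelf -l prints the program headers, which are what the kernel actually uses to load the file. Below is a minimal sketch of the same listing restricted to PT_LOAD entries, assuming a little-endian Elf64 file (load_segments is a hypothetical helper). It returns each loadable segment's virtual address, memory size, file size, and r/w/x permissions. A memory size much larger than the file size is one possible hint that the segment will be filled at run time, although an ordinary .bss produces the same pattern.

```python
import struct

PT_LOAD = 1
# Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
PHDR = struct.Struct("<IIQQQQQQ")


def load_segments(data: bytes) -> list:
    """Return (vaddr, memsz, filesz, perms) for each PT_LOAD program header."""
    e_phoff = struct.unpack_from("<Q", data, 0x20)[0]
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    segments = []
    for i in range(e_phnum):
        p_type, p_flags, _off, vaddr, _paddr, filesz, memsz, _align = \
            PHDR.unpack_from(data, e_phoff + i * e_phentsize)
        if p_type != PT_LOAD:
            continue
        # PF_R = 4, PF_W = 2, PF_X = 1
        perms = "".join(ch if p_flags & bit else "-"
                        for ch, bit in (("r", 4), ("w", 2), ("x", 1)))
        segments.append((vaddr, memsz, filesz, perms))
    return segments
```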
Our task is first to unpack the file and then to analyze it. We'll use IDA for unpacking, so let's start by loading the program in IDA:
According to readelf's output, once the program runs, RIP will first point to the entry point address 0x403858 (the address of start).
Going through the instructions that follow, we can see the system calls sys_write (write to a file descriptor, such as a file or the screen), sys_open (open a file), sys_mmap (map virtual memory), and sys_mprotect (change the protection of a memory region, similar to the Windows VirtualProtect API). It seems that the program is decompressing itself into its own virtual memory.
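While stepping through the stub in the debugger, the syscall number is in RAX at each syscall instruction. The helper below is a small sketch that maps a handful of standard x86-64 Linux syscall numbers to names and, for mmap and mprotect, decodes the protection argument, which is the third argument and therefore passed in RDX. Only a few syscalls are included, and describe_syscall is a hypothetical name.

```python
# A few x86-64 Linux syscall numbers (the value in RAX when `syscall` executes)
X86_64_SYSCALLS = {
    0: "read", 1: "write", 2: "open", 3: "close", 9: "mmap",
    10: "mprotect", 11: "munmap", 60: "exit", 231: "exit_group",
}
# Protection bits passed to mmap/mprotect (values from <sys/mman.h>)
PROT_BITS = (("PROT_READ", 0x1), ("PROT_WRITE", 0x2), ("PROT_EXEC", 0x4))


def describe_syscall(rax: int, rdx: int = 0) -> str:
    """Name the syscall in RAX; for mmap/mprotect also decode prot from RDX."""
    name = X86_64_SYSCALLS.get(rax, f"syscall_{rax}")
    if name in ("mmap", "mprotect"):
        flags = [flag for flag, bit in PROT_BITS if rdx & bit] or ["PROT_NONE"]
        return f"{name}({' | '.join(flags)})"
    return name
```

An mmap with PROT_READ | PROT_WRITE followed by an mprotect to PROT_READ | PROT_EXEC on the same region is the kind of sequence one would expect from a stub that writes decompressed code and then makes it executable.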
Notice that the program continues with jmp r13. We'll set a breakpoint on this instruction, at 0x403AFC, to observe what happens next.
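Register tail jumps such as this jmp r13 (and the jmp r12 reached later) have fixed encodings: a REX.B prefix (0x41), opcode 0xFF with ModRM reg field 4, and the register's low three bits in ModRM, so jmp r13 is 41 FF E5 and jmp r12 is 41 FF E4. The sketch below searches a byte buffer for every jmp r8 to jmp r15 pattern. The function name is hypothetical, and a raw byte search can match bytes that are not on a real instruction boundary, so each hit should be confirmed in IDA's disassembly.

```python
# x86-64 encodings of `jmp r64` for r8-r15: REX.B prefix 0x41, opcode 0xFF /4,
# ModRM = 0b11_100_xxx where xxx is the low 3 bits of the register number.
REG_JUMPS = {bytes([0x41, 0xFF, 0xE0 | (n - 8)]): f"jmp r{n}" for n in range(8, 16)}


def find_register_jumps(code: bytes, base: int) -> list:
    """Return sorted (address, mnemonic) pairs for `jmp r8`..`jmp r15` patterns."""
    hits = []
    for pattern, mnemonic in REG_JUMPS.items():
        pos = code.find(pattern)
        while pos != -1:
            hits.append((base + pos, mnemonic))
            pos = code.find(pattern, pos + 1)
    return sorted(hits)
```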
Looking at the segments at this breakpoint, we can see that recon_upx has a 0x5000-byte region in its virtual memory space, from 0x7FFFF7FF4000 to 0x7FFFF7FF9000, containing DATA and CODE.
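On Linux, the same mapping information is also available outside IDA from /proc/<pid>/maps while the process is stopped. Below is a minimal parser for one line of that file (parse_maps_line is a hypothetical helper). It computes a region's size as end minus start, which for the region above gives 0x7FFFF7FF9000 - 0x7FFFF7FF4000 = 0x5000.

```python
def parse_maps_line(line: str) -> dict:
    """Parse one /proc/<pid>/maps line: 'start-end perms offset dev inode [path]'."""
    fields = line.split(maxsplit=5)
    start_s, end_s = fields[0].split("-")
    start, end = int(start_s, 16), int(end_s, 16)
    return {
        "start": start,
        "end": end,
        "size": end - start,
        "perms": fields[1],
        # Anonymous mappings have no path column
        "path": fields[5].strip() if len(fields) > 5 else "",
    }
```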
(Figure: hex dump at this breakpoint)
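A comparable hex view can be produced from raw bytes with a few lines of Python, which is handy for spotting markers such as "UPX!" in memory that has been read out of the process. The hexdump helper below is a hypothetical sketch that prints the address, the hex bytes, and a printable-ASCII column.

```python
def hexdump(data: bytes, base: int = 0, width: int = 16) -> str:
    """Render bytes as 'address  hex bytes  |ascii|' lines, one row per `width` bytes."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes are shown as '.'
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{base + off:016X}  {hex_part:<{width * 3 - 1}}  |{ascii_part}|")
    return "\n".join(lines)
```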
We can presume that the program will then unpack the code held in this region and write it into its base segments. Continuing execution, we reach 0x7FFFF7FF7C66, followed by the instruction at 0x408C01, where we find an exit syscall and a retn.
Stepping through until RIP reaches jmp r12, we get a segments table like this:
The segments of recon_upx have changed, so we can presume that the unpacking process has finished.
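One way to make "the segments have changed" concrete is to record the segment list at both breakpoints and compare them. The sketch below assumes each snapshot is a list of (start, end, perms) tuples. Inside IDA, such a list could be built from idautils.Segments() together with idc.get_segm_end(), but that collection step is left out here, and diff_segments is a hypothetical name.

```python
def diff_segments(before, after):
    """Compare two segment snapshots of (start, end, perms) tuples.

    Returns (added, removed), both sorted by start address.
    """
    before, after = set(before), set(after)
    return sorted(after - before), sorted(before - after)
```

Any non-empty result means the memory layout changed between the two breakpoints.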
Continuing to debug, we hit __libc_start_main; if we let the program keep running, the main malicious process will be deployed.
At start, we saw that recon_upx spans from 0x400000 to 0x609308.
When we hit jmp r12 as shown above, we can dump the memory with a short Python script.
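A minimal sketch of such a script, assuming IDAPython inside an active debugging session: the range observed at start (0x400000 to 0x609308, i.e. 0x209308 bytes) is read page by page with ida_bytes.get_bytes and written to disk. Pages that cannot be read are zero-filled so that each byte's file offset stays equal to its address minus 0x400000. The helper name dump_image and the output file name are hypothetical, and depending on the IDA version it may be necessary to refresh the debugger memory view (for example with ida_dbg.refresh_debugger_memory()) before reading.

```python
PAGE = 0x1000


def dump_image(read_bytes, start: int, end: int) -> bytes:
    """Read memory [start, end) page by page into one contiguous image.

    read_bytes(ea, size) should return bytes, or None if the page is unreadable;
    missing or short reads are zero-padded so offsets stay equal to ea - start.
    """
    image = bytearray()
    for ea in range(start, end, PAGE):
        size = min(PAGE, end - ea)
        chunk = read_bytes(ea, size) or b""
        image += chunk[:size].ljust(size, b"\x00")
    return bytes(image)


try:
    import ida_bytes  # only importable inside IDA
except ImportError:
    ida_bytes = None

if ida_bytes is not None:
    # Range taken from this post; the output file name is hypothetical
    image = dump_image(ida_bytes.get_bytes, 0x400000, 0x609308)
    with open("recon_upx_dump.bin", "wb") as f:
        f.write(image)
```

Passing the reader in as a function keeps the paging logic testable outside IDA.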
Once the file is dumped, we can fix the file header and segments. We then have a complete file that can be analyzed further in IDA, including with pseudocode.
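A memory dump is a flat image in which each loaded byte sits at file offset (virtual address - base), while the program headers still describe the on-disk layout. One common way to reconcile the two is sketched here. It is not necessarily the procedure used in this post, and it assumes that the dump starts with a valid Elf64 header at the base address. Each PT_LOAD segment's p_offset is set to vaddr - base, and its p_filesz is set to the part of the segment actually present in the dump. The section header fields are then cleared, because the dump has no section header table. fix_dumped_elf and its optional entry parameter (for patching in the original entry point, if known) are hypothetical.

```python
import struct

PT_LOAD = 1
PHDR = struct.Struct("<IIQQQQQQ")  # Elf64_Phdr


def fix_dumped_elf(image: bytes, base: int, entry: int = None) -> bytes:
    """Hypothetical fix-up: make a flat memory dump's program headers match its layout.

    Assumes image[0] is an Elf64 header that was mapped at `base`.
    """
    out = bytearray(image)
    e_phoff = struct.unpack_from("<Q", out, 0x20)[0]
    e_phentsize, e_phnum = struct.unpack_from("<HH", out, 0x36)
    for i in range(e_phnum):
        pos = e_phoff + i * e_phentsize
        fields = list(PHDR.unpack_from(out, pos))
        if fields[0] != PT_LOAD:
            continue
        vaddr, memsz = fields[3], fields[6]
        fields[2] = vaddr - base  # p_offset: flat dump, so offset = vaddr - base
        # p_filesz: only the part of the segment that the dump really contains
        fields[5] = max(0, min(memsz, len(out) - fields[2]))
        PHDR.pack_into(out, pos, *fields)
    if entry is not None:
        struct.pack_into("<Q", out, 0x18, entry)  # e_entry
    # No section header table in a dump: clear e_shoff, e_shnum, e_shstrndx
    struct.pack_into("<Q", out, 0x28, 0)
    struct.pack_into("<HH", out, 0x3C, 0, 0)
    return bytes(out)
```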
The memory dump can also be done in IDA Pro with the plugin https://github.com/WPeace-HcH/ElfDumper. This plugin dumps more efficiently and removes the need to fix the file headers and segments by hand.
After this, we can load the dumped file into IDA again and use the decompiler's pseudocode to analyze the functionality of recon_upx.