In one of the previous entries, we discussed the structure of binary files and the software used to analyze them. Viewed abstractly, a file is a reserved region of space on a storage medium, and other files can be placed within that region. So we’re dealing with files within files. Information placed this way is often referred to as embedded. This is not just about data archives, although they fit the description too. A memory dump in RAW format produces a file that, in addition to useful information, contains other files and much more. Firmware images also contain embedded files. Unfortunately, malware often uses the same technique to hide a file or a fragment of code inside another file.
Let’s look at a specific example of embedded files. We will use two files: `first_file` and `second_file`. To remind you why file headers matter, the analyzed files have no extensions. In this case, the easiest way to check the file type is the `file` command, available on virtually every Linux distribution. Additionally, you can use the `xxd` command with the `-l` switch to display, say, the first 512 bytes of the file in hex.
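If `xxd` is not at hand, the same kind of view can be approximated in a few lines of Python. This is only a minimal sketch: the `hexdump` and `read_head` helpers are names made up for this example, the column layout merely resembles that of `xxd`, and the sample bytes are invented rather than taken from the analyzed files.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex and printable-ASCII columns (xxd-like layout)."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as dots in the ASCII column.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}: {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)


def read_head(path: str, length: int = 512) -> bytes:
    """Read at most `length` bytes from the start of the file at `path`."""
    with open(path, "rb") as handle:
        return handle.read(length)


# Demo on invented in-memory bytes; a real file would be read with read_head().
sample = b"MZ\x90\x00\x03\x00\x00\x00" + bytes(24)
print(hexdump(sample))
```

For a real file, `print(hexdump(read_head("first_file")))` would show its first 512 bytes.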
According to the open-source file header database, a header value of `4D5A` (ASCII `MZ`) identifies DOS executables; Windows PE files, including DLLs, begin with the same two bytes. The `file` command gives the same information with additional details. The second file is a library in DLL format. In the introduction, we mentioned embedded files. So how do we check the contents of the space reserved by `first_file` and `second_file`? For small files, you can use a hex editor. Let’s see what we can find, using `second_file` as the example.
If you know the data structure of an EXE (PE) file, you can read the binary position of the first resource from the resource section; it is located in the so-called `freeformSection` area. If you have only just started working with binary files, however, finding anything will be difficult. At offset `0x1EC51` we can find a PNG image (the bytes there match the PNG header). Keeping in mind that a PNG file ends with the IEND chunk, we can extract it.
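The manual extraction described above can be sketched in Python. A PNG stream begins with a fixed 8-byte signature and is followed by chunks, each made of a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC, with IEND as the last chunk. The `carve_png` and `make_chunk` functions are hypothetical helpers written for this illustration, and the demo container is synthetic (a structural PNG skeleton without image data), not the `second_file` from the post.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def carve_png(blob: bytes, start: int = 0):
    """Return (offset, png_bytes) for the first complete PNG at or after `start`, else None."""
    offset = blob.find(PNG_SIGNATURE, start)
    if offset < 0:
        return None
    pos = offset + len(PNG_SIGNATURE)
    # Walk the chunks: 4-byte length, 4-byte type, data, 4-byte CRC.
    while pos + 8 <= len(blob):
        length, chunk_type = struct.unpack(">I4s", blob[pos:pos + 8])
        pos += 8 + length + 4
        if pos > len(blob):
            return None  # the chunk runs past the end of the data
        if chunk_type == b"IEND":
            return offset, blob[offset:pos]
    return None


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk (used only to create the synthetic demo data)."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


# Synthetic container: padding, a PNG skeleton (IHDR + IEND only), trailing bytes.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
fake_png = PNG_SIGNATURE + make_chunk(b"IHDR", ihdr) + make_chunk(b"IEND", b"")
container = b"MZ" + bytes(100) + fake_png + b"trailing data"

found = carve_png(container)
print(hex(found[0]), len(found[1]))
```

With a real file, the bytes read from disk would be passed in instead, for example with `start=0x1EC51` to begin at the offset found in the hex editor, and the returned bytes written to a new `.png` file. The sketch does not verify the chunk CRCs, so a corrupted image would still be carved.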
As you can see, we have indeed extracted the image contained within the space reserved on the storage medium for another file. `binwalk` was created to make this kind of work easier. With it, you can analyze files, disks, and other types of storage space. In this post, we focus only on the basics of the tool. The `binwalk first_file` and `binwalk second_file` commands give us the required information about the contents of the files.
Among other things, you can see that `binwalk` found only 3 files inside `first_file`, and that the offset of the image found manually in `second_file` is indeed `0x1EC51`. Of course, `binwalk` can also extract resources by known file type or by regular expression, which lets you extract any type of file from the located offsets. That is material for a separate post, however, because by default the binary content is unpacked without extensions. There’s only one thing left to mention: entropy.
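To get a feel for what a signature scan does, here is a deliberately simplified Python sketch. It is not how `binwalk` is implemented; it only searches for a handful of well-known magic byte sequences and reports their offsets. The `SIGNATURES` table and the `scan_signatures` function are assumptions made for this example, and short signatures such as the three-byte JPEG marker will produce false positives on real data.

```python
# A few well-known magic byte sequences (a tiny, illustrative subset).
SIGNATURES = {
    "PNG image": b"\x89PNG\r\n\x1a\n",
    "JPEG image": b"\xff\xd8\xff",
    "ZIP archive": b"PK\x03\x04",
    "gzip data": b"\x1f\x8b\x08",
    "PDF document": b"%PDF-",
}


def scan_signatures(blob: bytes):
    """Return a sorted list of (offset, description) for every signature match."""
    hits = []
    for name, magic in SIGNATURES.items():
        pos = blob.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = blob.find(magic, pos + 1)
    return sorted(hits)


# Synthetic data: zero padding with a PNG signature and a ZIP header inside.
sample = bytes(16) + b"\x89PNG\r\n\x1a\n" + bytes(8) + b"PK\x03\x04" + bytes(4)
for offset, name in scan_signatures(sample):
    print(f"{offset:<10} 0x{offset:<8X} {name}")
```

A real tool validates each hit against the structure that should follow the signature, which is why its results are far more reliable than this naive search.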
Entropy is a measure of the randomness, or density, of the data in a file at a given byte position (offset). It is used in many fields. It makes it possible to tell where data is located inside a binary file, and sometimes even to distinguish file types: among images, JPEG files, due to their compression, usually have higher entropy than PNG files, in which repetitive patterns are compressed. `binwalk` can report entropy in numerical form and visualize it as a graph. Just use the command `binwalk -E --save file_name`; in our case, `binwalk -E --save second_file`. Entropy stays at its highest continuous level between offsets `0xC800` (51200) and `0x3CC00` (248832), which is the area from which we extracted data. These features are used mainly in cybersecurity, but they are sometimes useful in software development and other areas.
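The idea behind such a graph can be reproduced with a short Python sketch that computes the Shannon entropy of consecutive blocks, in bits per byte (0 for a block of identical bytes, up to 8 for uniformly random data). The block size of 1024 bytes, the function names, and the synthetic sample are assumptions made for this illustration; the scale and block size that `binwalk` uses may differ.

```python
import math
import os
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of `block` in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def block_entropies(blob: bytes, block_size: int = 1024):
    """Return (offset, entropy) for each consecutive block of `blob`."""
    return [
        (offset, shannon_entropy(blob[offset:offset + block_size]))
        for offset in range(0, len(blob), block_size)
    ]


# Synthetic data: zero padding, a random (compressed-looking) region, repeated text.
sample = bytes(2048) + os.urandom(4096) + b"A" * 2048
for offset, value in block_entropies(sample):
    print(f"0x{offset:06X}  {value:.3f}")
```

In the output, the blocks covering the random region stand out with values close to 8, while the padding and the repeated letters stay at zero. A plateau like this is what marks embedded compressed or encrypted data in a real file.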