Embedded files, binwalk and entropy

In a recent post we discussed the structure of binary files and the software used to analyze them. Viewed abstractly, a file is simply a reserved region of a storage medium, and nothing prevents other files from being placed inside that region. So we’re dealing with files within files. Data placed this way is often referred to as embedded. This is not just about archives, although those fit the description too. A memory dump saved in RAW format is a single file on disk that, besides the useful information, also contains other files and much more. Firmware images contain embedded files as well. Unfortunately, the same technique is often used by malware, which hides a file or a piece of code inside another file.

Let’s look at a specific example of embedded files. We will use two files: first_file and second_file. To remind you why file headers matter, neither file has an extension. In this case the easiest way to check the file type is the file command, available on practically every Linux distribution. You can also use the xxd command with the -l switch to display, say, the first 512 bytes of the file in hex (xxd -l 512 first_file).
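If xxd is not at hand, the same kind of dump can be produced in a few lines of Python. This is only a minimal sketch: the function names hexdump and dump_head are our own, and the output layout merely resembles that of xxd rather than reproducing it exactly.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values and printable ASCII, one row per `width` bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as a dot, as hex viewers usually do.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}: {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


def dump_head(path: str, length: int = 512) -> str:
    """Read only the first `length` bytes of a file and return them as a hex dump."""
    with open(path, "rb") as fh:
        return hexdump(fh.read(length))
```

Calling print(dump_head("first_file")) would then show the first 512 bytes of the file, header included.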

According to the Open Source file header database, a header value of 4D5A (the ASCII characters MZ) identifies DOS executables. The file command returns the same information with additional details. The second file is a library in DLL format. In the introduction we mentioned embedded files. So how do we check what the space occupied by first_file and second_file actually contains? For small files, a hex editor is enough. Let’s see what we can find in second_file.
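The header check itself is easy to reproduce. The sketch below compares the first bytes of a file against a handful of well-known signatures. The table is deliberately tiny and the names KNOWN_HEADERS and identify are our own; the file command relies on a far larger database and on many additional rules.

```python
# A few well-known file signatures ("magic bytes"). Illustrative only.
KNOWN_HEADERS = [
    (b"MZ", "DOS/Windows executable (MZ header, 4D 5A)"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x7fELF", "ELF executable"),
]


def identify(header: bytes) -> str:
    """Return a description for the first matching signature, or 'unknown'."""
    for magic, description in KNOWN_HEADERS:
        if header.startswith(magic):
            return description
    return "unknown"


def identify_file(path: str) -> str:
    """Identify a file by reading only its first 16 bytes."""
    with open(path, "rb") as fh:
        return identify(fh.read(16))
```

Note that both an EXE and a DLL start with MZ, so a check this simple cannot tell them apart; that takes the additional details reported by file.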

If you know the layout of an EXE (PE) file, you can read the position of the first resource from the resource section; the resource itself is located in the so-called freeformSection area. However, if you have only just started working with binary files, finding anything this way will be difficult. At offset 0x1EC51 we can find a PNG image (the bytes there match the PNG header). Keeping in mind that a PNG file ends with the IEND chunk, we can extract it.
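The manual extraction can also be scripted. The sketch below finds the PNG signature and then walks the PNG chunks (4-byte big-endian length, 4-byte type, data, 4-byte CRC) until it reaches IEND. The name carve_png is our own, and the CRC values are not verified, so treat it as an illustration rather than a robust carver.

```python
import struct
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def carve_png(data: bytes, start: int = 0) -> Optional[bytes]:
    """Return the first complete PNG found at or after `start`, or None."""
    begin = data.find(PNG_SIGNATURE, start)
    if begin == -1:
        return None
    pos = begin + len(PNG_SIGNATURE)
    # Each chunk: length (4 bytes, big-endian), type (4 bytes), data, CRC (4 bytes).
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        pos += 8 + length + 4
        if pos > len(data):
            return None  # the chunk claims more bytes than are available
        if chunk_type == b"IEND":
            return data[begin:pos]
    return None  # IEND was never reached, so the image is truncated
```

For the file discussed here, one would read second_file in binary mode, call carve_png(data, 0x1EC51) and write the returned bytes to a file with a .png extension.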

We have indeed succeeded in extracting an image stored within the space reserved on the storage medium for another file. binwalk was created to make this kind of work easier. With it you can analyze files, disks and other kinds of binary data. In this post we focus only on the basics. The commands binwalk first_file and binwalk second_file give us the required information about the contents of both files.
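To get a feel for what a signature scan does, here is a naive sketch that searches a blob for a few known headers at every offset. It is not how binwalk is implemented; the SIGNATURES table and the scan function are our own illustration, and short signatures such as the JPEG one can easily produce false positives.

```python
# Hypothetical, very small signature table used only for this illustration.
SIGNATURES = {
    "PNG image": b"\x89PNG\r\n\x1a\n",
    "JPEG image": b"\xff\xd8\xff",
    "ZIP archive": b"PK\x03\x04",
}


def scan(data: bytes) -> list:
    """Return (offset, description) pairs for every signature hit, sorted by offset."""
    hits = []
    for name, magic in SIGNATURES.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = data.find(magic, pos + 1)
    return sorted(hits)


def scan_file(path: str) -> None:
    """Print the hits for a file, with offsets in decimal and in hex."""
    with open(path, "rb") as fh:
        for offset, name in scan(fh.read()):
            print(f"{offset:<10} 0x{offset:<10X} {name}")
```

A real tool also validates the structure that follows each header, which is what keeps the number of false hits manageable.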

In the binwalk output you can see, among other things, that binwalk found only 3 files inside first_file, and that the offset of the image we found manually in second_file is indeed 0x1EC51. binwalk can also extract resources by known file type or by regular expression, which makes it possible to extract any type of file from the located offsets. That is material for a separate post, though, because by default the binary content is unpacked without extensions. There’s only one thing left to mention, and that’s entropy.

Entropy is a measure of the randomness or density of data in a file, computed for successive positions (offsets) in it. It is used in many fields. It makes it possible to tell where inside a binary file there is actual data, or even to distinguish file types. Among images, for example, JPEG files usually show higher entropy because of their compression than PNG files, where repetitive patterns are compressed. binwalk can report entropy in numerical form and visualize it as a graph. Just run binwalk -E --save file_name; in our case, binwalk -E --save second_file. Entropy stays at its highest continuous level between offsets 0xC800 (51200) and 0x3CC00 (248832), which is the area from which we extracted data. These features are used mainly in cybersecurity, but are sometimes useful in software development and other areas.
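The idea behind such a graph can be sketched with Shannon entropy computed over fixed-size blocks. The block size of 1024 bytes and the function names below are our own assumptions, and the result is expressed in bits per byte (0 to 8); binwalk may use different parameters and scale its values differently.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a block in bits per byte: 0.0 (constant) to 8.0 (uniform)."""
    if not block:
        return 0.0
    total = len(block)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(block).values()
    )


def block_entropy(data: bytes, block_size: int = 1024) -> list:
    """Return (offset, entropy) pairs, one per block of `block_size` bytes."""
    return [
        (offset, shannon_entropy(data[offset:offset + block_size]))
        for offset in range(0, len(data), block_size)
    ]
```

A run of zero bytes scores 0, while compressed or encrypted data comes close to 8, so a continuous stretch of high values stands out against the rest of the file.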
