In general, a development team does not need to delve into the binary structure of files. There are cases, however, when it becomes necessary. One reason may be a project that works with a file format directly. Another is performance: the software may not be fully optimized, or the client may expect it to run faster. Applying design patterns and good practices is not always enough to achieve the desired result within the set budget. To optimize further, it is then often necessary to look inside the structure of the files that serve as the data store.
This field is not the easiest, but it is fundamental to the topic of operating systems. To understand what you’re dealing with, you must first get to know the file as an abstraction. A file’s icon, its extension, and the content displayed on screen certainly do not define it. Over many years of experience, we have noticed that many people in the industry (who know design patterns, SOLID principles, and techniques such as TDD or BDD) do not fully understand what a file is. Simply put, a file is an operating-system abstraction that allows you to store data on a storage medium. A file that follows a specific standard, called a type, has a defined header and sections for information.
Operating systems, depending on the implementation, recognize file types based on headers or extensions. The latter approach is popular in Windows operating systems. It is of course a simplification: the extension is just a string of characters appended to the file name after a dot. It has no effect on the actual file type or its contents; it only lets the system associate the file with default software. Linux systems, on the other hand, (mostly) respect file headers. This approach is much better for many reasons, but that’s a topic for another post.
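To make the difference between the two approaches concrete, here is a minimal Python sketch that compares what a file's extension claims with what its first bytes say. The function names `sniff_type` and `describe` and the short signature table are our own illustrative choices, not part of any operating system API; real tools recognize far more formats.

```python
from pathlib import Path
from typing import Optional, Tuple

# A few well-known leading signatures ("magic numbers").
# This table is illustrative only; real detectors know many more.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
}


def sniff_type(header: bytes) -> Optional[str]:
    """Return a type name if the leading bytes match a known signature."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def describe(path: str) -> Tuple[str, Optional[str]]:
    """Return (extension, type detected from content) for a file on disk."""
    p = Path(path)
    with p.open("rb") as fh:
        header = fh.read(16)  # enough for every signature in the table
    return p.suffix.lower().lstrip("."), sniff_type(header)
```

A PNG image that has been renamed to `photo.jpg` (a hypothetical file name) would be reported as `("jpg", "png")`: the extension changed, the content did not.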
To make the structure of files more tangible, we will use the example of a Portable Network Graphics (PNG) file with dimensions of 1024×683. To view the raw binary data, we will use HxD, a hex editor created by Maël Hörz. You can download it from the official website of the developer. According to the official documentation, a PNG file has an eight-byte signature in the header, metadata contained in a structure called IHDR, and IDAT data chunks terminated by an IEND chunk. In the image below, you can see the selected segment containing the signature included in the header.
At this stage, it must be admitted that without expert knowledge or documentation, it is difficult to find anything of interest. The first byte, 0x89, has its most significant bit set (in binary encoding), which places it outside the ASCII range, and is an integral part of the PNG file header. The next three bytes, with the values 0x50, 0x4E and 0x47, are the ASCII character string "PNG". This is the so-called direct file signature. The next two bytes are the MS-DOS line ending in its ASCII representation: 0x0D and 0x0A are, respectively, a carriage return (to the beginning of the line) and a line feed, often written as \r and \n. The last two bytes are 0x1A, the special End Of File (EOF) character under MS-DOS, and 0x0A, the end of line on Unix systems. Of course, such manual analysis is very slow. Therefore, more advanced software was created that applies file structure patterns based on official documentation. These solutions are still niche, but ImHex is one of the most popular.
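The byte-by-byte reading of the signature can also be expressed in a few lines of Python. This is only a sketch: the function name `split_signature` and the dictionary keys are hypothetical labels we chose for illustration.

```python
# The eight-byte PNG signature, written out byte by byte.
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def split_signature(data: bytes) -> dict:
    """Split the first eight bytes of a PNG file into their named parts."""
    sig = data[:8]
    if sig != PNG_SIGNATURE:
        raise ValueError("data does not start with the PNG signature")
    return {
        "high_bit_set": bool(sig[0] & 0x80),  # 0x89 = 0b10001001
        "name": sig[1:4].decode("ascii"),     # 0x50 0x4E 0x47
        "dos_line_ending": sig[4:6],          # 0x0D 0x0A, i.e. \r\n
        "dos_eof": sig[6:7],                  # 0x1A
        "unix_line_ending": sig[7:8],         # 0x0A, i.e. \n
    }
```

Passing the first bytes of any valid PNG file to `split_signature` returns the same five parts described above, while any other input raises `ValueError`.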
ImHex immediately maps the file onto the structure from the documentation (more precisely, using patterns written in a special language for describing file formats). In addition to the signature in the header visible in HxD, we have direct insight into the IHDR structure and the individual IDAT chunks. Besides highlighting specific elements, ImHex also lets you read the data type of each field and its value converted to decimal form. This way, you can easily check whether the actual dimensions of the image match those stored in the file header. This type of software allows you to better understand the structure of the processed information and to make optimizations that are often impossible from the level of ready-made frameworks.
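The same check can be sketched without a GUI. The Python below walks the chunks that follow the signature and reads the width and height from IHDR. It assumes the chunk layout given in the PNG specification (a 4-byte big-endian length, a 4-byte type, the payload, and a CRC-32 over type and payload); the function names `iter_chunks` and `read_dimensions` are our own.

```python
import struct
import zlib
from typing import Iterator, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (type, payload) for each chunk, verifying its CRC on the way."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    pos = 8
    while pos < len(data):
        # Chunk header: payload length and four-letter type.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(ctype + payload) != crc:
            raise ValueError(f"bad CRC in {ctype!r} chunk")
        yield ctype.decode("ascii"), payload
        pos += 12 + length
        if ctype == b"IEND":
            break


def read_dimensions(data: bytes) -> Tuple[int, int]:
    """Return (width, height) as stored in the IHDR chunk."""
    ctype, payload = next(iter_chunks(data))
    if ctype != "IHDR":
        raise ValueError("first chunk is not IHDR")
    # Width and height are the first two 4-byte big-endian integers.
    width, height = struct.unpack(">II", payload[:8])
    return width, height
```

For the example image from this post, `read_dimensions` applied to the raw bytes of the file should return the same 1024 and 683 that ImHex displays in decimal form, and `iter_chunks` lists the chunks in file order, from IHDR through the IDAT chunks to IEND.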