Serializing structures and graphs with LevelDB
In cyber security, especially when detecting threats such as port scans, many algorithms require quick serialization of relationships among objects. Imagine a graph where each node represents a server within a private network and each link represents a port-to-port connection between two of those servers. To illustrate the idea, let us write the following code snippet in C:
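A minimal sketch of such a server structure could look like this. The field name server_80 appears later in the text. The field server_8080, the fixed 64-byte name buffer (SERVER_NAME_LEN) and the helper server_init are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SERVER_NAME_LEN 64  /* assumed maximum name length, including '\0' */

/* One node of the graph: a server and its two outgoing port-to-port links. */
struct server {
    char name[SERVER_NAME_LEN];  /* identifies the server */
    struct server *server_80;    /* server reached on port 80, or NULL */
    struct server *server_8080;  /* server reached on port 8080, or NULL */
};

/* Hypothetical helper: give a server its name and no links yet. */
static void server_init(struct server *s, const char *name)
{
    assert(strlen(name) < SERVER_NAME_LEN);  /* name must fit with its '\0' */
    strcpy(s->name, name);
    s->server_80 = NULL;
    s->server_8080 = NULL;
}
```

Two servers can link to the same target simply by storing the same address, for example a.server_80 = &b; and c.server_80 = &b;.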
Here the server has a name, which serves as its identifier. Then there are two links: the first one connects to another server’s port 80 and the second one to some other server’s port 8080. It does not matter whether both pointers point to the same server or not. We just know that we have a list of structures called servers, with their names and links. This “list” is built in real time from incoming cyber security data, usually logs from the servers or from a firewall. The result is a graph that represents the server network infrastructure and changes only when a new connection between servers is opened or a new server is added to the network. The graph is therefore useful to any application that performs security analyses on servers within the same network.
We should therefore be able to export the graph automatically and allow other applications running on any of those servers to access it and reconstruct the structures in their own memory. It would be hard to use the MMAP functionality here [1]. Besides portability, there is the issue of pointers: you would have to replace every link (struct server * server_80) with a number indicating an offset in the MMAPed memory, which can also make reading and adding links slow. We want to move the time-consuming work of computing and jumping to the proper memory offsets out of normal operation and do it only during export or synchronization.
In this case, it does not matter whether we use plain malloc or MMAP, as long as we use pointers for building and accessing the server graph. However, when a new link or a new server is added, we want to run an export function that transforms the graph into serializable structures (i.e., structures without pointers). Let us take a look at the following snippet:
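A minimal sketch of this export step follows, under a few assumptions. The ID attributes are called server_80_id and server_8080_id, and a NULL link is encoded as the hypothetical NO_LINK value. The pointer-to-ID bookkeeping uses a small linear table (id_table, id_for_pointer); a hash map would be the usual choice for large graphs. Each record also stores the server's own ID in an extra id field, so that an import can later tell which record a link ID refers to.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SERVER_NAME_LEN 64
#define MAX_SERVERS 1024      /* hypothetical capacity of the ID table */
#define NO_LINK UINT64_MAX    /* hypothetical ID meaning "no link" (NULL) */

struct server {
    char name[SERVER_NAME_LEN];
    struct server *server_80;
    struct server *server_8080;
};

/* Pointer-free copy of a server: every pointer becomes an integer ID. */
struct server_transformed {
    char name[SERVER_NAME_LEN];
    uint64_t id;              /* assumed extra field: ID of this server itself */
    uint64_t server_80_id;    /* ID of the server behind port 80 */
    uint64_t server_8080_id;  /* ID of the server behind port 8080 */
};

/* Remembers which integer from the counter was given to which address. */
struct id_table {
    const struct server *ptrs[MAX_SERVERS];  /* ptrs[i] received ID i */
    uint64_t count;                          /* sequential counter, starts at 0 */
};

/* Return the ID already given to ptr, or hand out the next counter value. */
static uint64_t id_for_pointer(struct id_table *t, const struct server *ptr)
{
    if (ptr == NULL)
        return NO_LINK;
    for (uint64_t i = 0; i < t->count; i++)  /* linear scan keeps the sketch short */
        if (t->ptrs[i] == ptr)
            return i;
    assert(t->count < MAX_SERVERS);
    t->ptrs[t->count] = ptr;
    return t->count++;
}

/* Allocate and fill the transformed copy of one server; the caller frees it. */
static struct server_transformed *transform_server(struct id_table *t,
                                                   const struct server *s)
{
    struct server_transformed *st = calloc(1, sizeof *st);  /* zeroed bytes */
    if (st == NULL)
        return NULL;
    strcpy(st->name, s->name);
    st->server_80_id = id_for_pointer(t, s->server_80);
    st->server_8080_id = id_for_pointer(t, s->server_8080);
    st->id = id_for_pointer(t, s);
    return st;
}
```

Calling transform_server for every server in the list with one shared id_table gives two links to the same server the same ID, because id_for_pointer returns the existing number for an address it has already seen.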
We first iterate through the list of servers and, for each server structure, allocate memory for a server_transformed structure. Copying the name is easy with the standard strcpy function. Then, for each pointer, we create an ID from a sequential counter that starts at zero, and remember which counter value belongs to which pointer (address). For example, the address stored in server_1->server_80 becomes the number 0 in the corresponding ID attribute of server_transformed_1, the address in server_3->server_80 becomes 1, and so on. If server_4->server_80 holds the same address as server_3->server_80, both are replaced by the same number, 1, in their transformed structures.
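Once every pointer is an integer, server_transformed is a flat block of bytes. The sketch below uses an assumed layout (a 64-byte name plus three uint64_t IDs) and the hypothetical helpers to_bytes and from_bytes. It checks at compile time that the compiler added no padding and shows that a plain memcpy round trip preserves the record.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SERVER_NAME_LEN 64

struct server_transformed {
    char name[SERVER_NAME_LEN];  /* 64 one-byte chars */
    uint64_t id;                 /* three eight-byte integers */
    uint64_t server_80_id;
    uint64_t server_8080_id;
};

/* Compile-time check: no padding, so the struct is exactly its fields' bytes. */
_Static_assert(sizeof(struct server_transformed) ==
                   SERVER_NAME_LEN + 3 * sizeof(uint64_t),
               "server_transformed must not contain padding");

/* Copy the record into buf (at least sizeof *st bytes); returns bytes written. */
static size_t to_bytes(const struct server_transformed *st, unsigned char *buf)
{
    assert(st != NULL && buf != NULL);
    memcpy(buf, st, sizeof *st);
    return sizeof *st;
}

/* Rebuild a record from bytes produced by to_bytes(). */
static void from_bytes(const unsigned char *buf, struct server_transformed *st)
{
    assert(st != NULL && buf != NULL);
    memcpy(st, buf, sizeof *st);
}
```

The raw bytes follow the byte order of the machine that wrote them. If the exported files may be read on a machine with a different endianness, the integers would have to be encoded explicitly (for example, always little-endian) instead of being copied as is.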
As you can see, the attributes of server_transformed are serializable to bytes (a char is one byte, a uint64_t is eight bytes) and can thus be stored in any kind of database or file. Let us take a quick look at LevelDB, a fast key-value database from Google, which you can find on GitHub together with the library installation steps: https://github.com/google/leveldb
No separate service has to be installed: LevelDB is an embedded library that keeps the database purely in files and stores key-value pairs, which is exactly what we need for our use case. The key is the name of the structure instance (for example, the content of its name attribute), while the value is the serialized bytes of the corresponding server_transformed structure. First, let us open the database at database_path, a string holding the path of the directory where LevelDB should keep its files (such as ./my_port_sweep_graph):
Then, for each of the created server_transformed structures, we call the put operation, which writes the content of the structure into the database. Since server_transformed is directly convertible to bytes, we can pass it as is. If you allocated the server_transformed with malloc, do not forget to free the memory after a successful store; LevelDB keeps its own copy of the data:
That is it! After these steps, you should find the LevelDB files in the database_path directory you specified. These files can then be copied to other machines within the network and imported into the relevant security applications. The import is the reverse of the export: you call get on the LevelDB database to obtain the server_transformed structures, which you then use to create the actual server structures, replacing the integer IDs with pointers to the other recreated server structures. The get operation is very simple:
Of course, you will have to check for NULL pointers, error messages and the size stored in the read_len variable, but I left these checks out for the purposes of this article. And again, do not forget to free the server_transformed_1 memory once the actual server structure is recreated. There are many other ways to work with graphs and their memory representation in cyber security; however, in my experience you will always need fast storage, graph nodes with only the relevant simple attributes, and good performance. Serialization is a good exercise for thinking about all of these topics.
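The last import step, replacing the integer IDs with pointers, could be sketched as follows. It assumes that every record stores its own ID in an id field, that the IDs run from 0 to n - 1, and that NO_LINK marks a missing link. The names rebuild_servers, link_ok and link_to_ptr, and the choice to allocate all servers in one array indexed by ID, are illustrative assumptions. Because the address of every slot is known in advance, a single pass is enough: a link may point to a server whose record has not been processed yet.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SERVER_NAME_LEN 64
#define NO_LINK UINT64_MAX  /* assumed marker for a NULL link */

struct server {
    char name[SERVER_NAME_LEN];
    struct server *server_80;
    struct server *server_8080;
};

struct server_transformed {
    char name[SERVER_NAME_LEN];
    uint64_t id;  /* assumed: the server's own ID */
    uint64_t server_80_id;
    uint64_t server_8080_id;
};

/* Turn a link ID back into a pointer into the servers array (NULL for NO_LINK). */
static struct server *link_to_ptr(struct server *servers, uint64_t id)
{
    return id == NO_LINK ? NULL : &servers[id];
}

/* A link ID is valid if it is NO_LINK or refers to one of the n servers. */
static int link_ok(uint64_t id, size_t n)
{
    return id == NO_LINK || id < n;
}

/* Hypothetical helper: rebuild n servers, each stored at servers[id].
   Returns NULL if an ID is out of range (corrupted or incomplete export). */
static struct server *rebuild_servers(const struct server_transformed *st, size_t n)
{
    assert(st != NULL && n > 0);
    struct server *servers = calloc(n, sizeof *servers);
    if (servers == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        const struct server_transformed *t = &st[i];
        if (t->id >= n || !link_ok(t->server_80_id, n) || !link_ok(t->server_8080_id, n)) {
            free(servers);
            return NULL;
        }
        struct server *s = &servers[t->id];
        memcpy(s->name, t->name, SERVER_NAME_LEN);
        s->name[SERVER_NAME_LEN - 1] = '\0';  /* never trust a stored string */
        /* Every slot's address is known up front, so links may point
           to servers whose records come later in st. */
        s->server_80 = link_to_ptr(servers, t->server_80_id);
        s->server_8080 = link_to_ptr(servers, t->server_8080_id);
    }
    return servers;
}
```

After rebuild_servers returns, the server_transformed records can be freed, as recommended above; the rebuilt graph lives in one allocation and is released with a single free.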
Notes
[1] See the previous article: https://www.pagancoder.com/2022/05/31/read-and-write-operations-for-memory-heavy-cyber-security-data-using-mmap/