REG File Parser Using the Boost Spirit Parser Framework

Tue, Dec 1, 2009

Articles



I would like to thank the people who developed the following projects – they made the implementation of this project easier:

I want to say a personal thank you to Silviu Simen for his article “INI file reader using the Spirit library”.

There was a project in which I took part and we needed to test the working of a parser for Windows hive Registry files. These files are stored in binary representation and the structure of such a file is not documented by Microsoft. But, by means of research, my colleagues managed to clear out this structure, and after that, the question of verifying the parser work appeared.

To perform testing, I decided to use the functionality of exporting of Registry in two formats: hive and reg. Thus, I could obtain two different files for the same Registry key and after that check the working of the Windows hive Registry file parser.

The structure of the Registry file – I’ll give an example below – is very similar to the structure of the ini file, so you can use standard Windows functions for reading values in this file. But the problem is that, functions work very slow for big files, and that is why this parser was developed – a parser for reg files where I use the Boost Spirit Parser Framework. The reasons why standard Windows functions are slow will be considered below in this article.

Let’s consider the general view of a reg file structure first, and some special complicated cases will be considered as necessary.

I’ve taken the following material from here http://en.wikipedia.org/wiki/Windows_Registry.

.REG files (also known as Registration entries) are text-based human-readable files for storing portions of the Registry. On Windows 2000 and later NT-based Operating Systems, they contain the string Windows Registry Editor Version 5.00 at the beginning and are Unicode-based. On Windows 9x and NT 4.0 systems, they contain the string REGEDIT4 and are ANSI-based. Windows 9x format .REG files are compatible with Windows 2000 and later NT-based systems. The Registry Editor on Windows on these systems also supports exporting .REG files in Windows 9x/NT format. Data is stored in .REG files in the following syntax:

Example 1 (different types):

Example 2 (real):

Making a little digression, I want to stress that the number in the line ” hex(1000800c) ” is the type identifier and it can be anything. It’s often used as the data in the security branch [HKEY_LOCAL_MACHINESAM].

And now, let’s try to extend the information about the possible contents of the reg file. Here, I represent some facts obtained during our research process:

As it was mentioned, the structures of reg files and ini files are quite the same, so I started to search for methods for working with ini files. I found the standard Windows functions.

Windows gives a lot of functions to work with INI files; we are interested in two of them for our task:

So, we should call GetPrivateProfileSectionNames one time to obtain the list of keys and then call GetPrivateProfileSection to obtain the values inside the keys.

The problem is that if the file is quite big, i.e., there are a lot of keys in it, then we should call GetPrivateProfileSection several times to read from the file. Here is some row test data: file size: 30 MB, file includes about 30,000 keys, the parsing of this file takes about 20 minutes. And, I should say that reg files are often bigger than 100 MB.

Unjustified number of readings from the file.

It’s necessary to load the file to the memory at one time or in some parts and then parse its content using your own tools.

When parsing a reg file content using your our own tools, it’s good idea to use already developed work. Thus, I came across the article about the ini file reader using the Spirit library. Using the example of the ini file parser, I developed my reg file parser.

The Spirit Parser Framework is an object oriented recursive descent parser generator framework implemented using template metaprogramming techniques. Expression templates allow users to approximate the syntax of Extended Backus Naur Form (EBNF) completely in C++.

I was also attracted by these factors:

Useful links:

The main idea of using boost::spirit is in using the rules. Usually, several basic rules are defined, and then other rules are defined by means of overridden operators as a combinations of basic rules. The following example shows the creation of rules using “AND” and “NOT” operators:

This rule works for any symbol except for ‘A’ and ‘B’.

Below in this article, each rule will be described in detail, but now, I want to give some quick information about possible operators – to let you imagine what the possible operations with the rules are.

Related Posts

Related Websites

  • How to Get Help Preparing & Filing Your 2010 Income Taxes - for Free! The process of filing annual income taxes is overwhelming enough without having to worry about costly tax preparation fees and frequently changing IRS guidelines. If denial and procrastination are your coping mechanisms of choice, please be aware that filing taxes late or not at all can be far more expensive......
  • Compounding and the Rule of 72 The reason why it is so important for you to start saving early is the magic behind the concept of compounding and the rule of 72. People who wait until they are later to begin saving are going to have to save much more and much more quickly in order......
  • Review of Windows Live Writer When you find a tool that makes life easier, there is nothing more exciting. The need for corporations to simplify and systematize their processes has to do with working smart and taking advantage of things that allow workers to reach their goals without having to work quite as hard. One......
  • Google Work at Home Scam I don't know if it's because I wrote about what I considered a MonaVie scam in the past, but yesterday I had a new scam knock on my door twice. It would be understandable if it was related to juice or multi-level marketing... but it's not. And sadly, I feel......
  • More on the Windows WMF zero-day exploit There seems to be quite a bit developing on the Windows Meta File (WMF) zero-day (0-day) exploit which was first reported yesterday. Sans has raised their alert level to yellow in an effort to get attention to this problem. It looks like the original site serving the exploit is down,......
  • Netscape 7.1 download file could not be saved I've run across this a couple of times and to save me a few more Google searches.... Under Netscape 7.1, trying to download a file, there is a long message (with a temporary file address), and the following... (long path to file like c:\windows\temp\..... ) could not be saved because......
  • Roth IRA Rules for 2011 The Roth IRA rules are pretty basic to understand, yet, lots of people have trouble working within their limits. We all want to save money for our retirement, and Roth IRAs are a great alternative other than that of retirement plans sponsored by employers. Unlike in Traditional IRAs where contributions......
  • 9 Ways to Reduce Heating Costs There are a number of techniques that you can do to reduce your heating costs. Here is a list of tips and techniques to give you the ability to reduce your heating bill. 1 - Find a warm pair of socks. Sounds silly, but feet that are cold will send......
, , , , ,

Leave a Reply


PHVsPjxsaT48c3Ryb25nPndvb19hZHNfcm90YXRlPC9zdHJvbmc+IC0gdHJ1ZTwvbGk+PGxpPjxzdHJvbmc+d29vX2FkXzMwMF9hZHNlbnNlPC9zdHJvbmc+IC0gPC9saT48bGk+PHN0cm9uZz53b29fYWRfMzAwX2ltYWdlPC9zdHJvbmc+IC0gaHR0cDovL3d3dy53b290aGVtZXMuY29tL2Fkcy93b290aGVtZXMtMzAweDI1MC0yLmdpZjwvbGk+PGxpPjxzdHJvbmc+d29vX2FkXzMwMF91cmw8L3N0cm9uZz4gLSBodHRwOi8vd3d3Lndvb3RoZW1lcy5jb208L2xpPjxsaT48c3Ryb25nPndvb19hZF9pbWFnZV8xPC9zdHJvbmc+IC0gaHR0cDovL3d3dy53b290aGVtZXMuY29tL2Fkcy93b290aGVtZXMtMTI1eDEyNS0xLmdpZjwvbGk+PGxpPjxzdHJvbmc+d29vX2FkX2ltYWdlXzI8L3N0cm9uZz4gLSBodHRwOi8vd3d3Lndvb3RoZW1lcy5jb20vYWRzL3dvb3RoZW1lcy0xMjV4MTI1LTIuZ2lmPC9saT48bGk+PHN0cm9uZz53b29fYWRfaW1hZ2VfMzwvc3Ryb25nPiAtIGh0dHA6Ly93d3cud29vdGhlbWVzLmNvbS9hZHMvd29vdGhlbWVzLTEyNXgxMjUtMy5naWY8L2xpPjxsaT48c3Ryb25nPndvb19hZF9pbWFnZV80PC9zdHJvbmc+IC0gaHR0cDovL3d3dy53b290aGVtZXMuY29tL2Fkcy93b290aGVtZXMtMTI1eDEyNS00LmdpZjwvbGk+PGxpPjxzdHJvbmc+d29vX2FkX2ltYWdlXzU8L3N0cm9uZz4gLSBodHRwOi8vd3d3Lndvb3RoZW1lcy5jb20vYWRzL3dvb3RoZW1lcy0xMjV4MTI1LTQuZ2lmPC9saT48bGk+PHN0cm9uZz53b29fYWRfaW1hZ2VfNjwvc3Ryb25nPiAtIGh0dHA6Ly93d3cud29vdGhlbWVzLmNvbS9hZHMvd29vdGhlbWVzLTEyNXgxMjUtNC5naWY8L2xpPjxsaT48c3Ryb25nPndvb19hZF91cmxfMTwvc3Ryb25nPiAtIGh0dHA6Ly93d3cud29vdGhlbWVzLmNvbTwvbGk+PGxpPjxzdHJvbmc+d29vX2FkX3VybF8yPC9zdHJvbmc+IC0gaHR0cDovL3d3dy53b290aGVtZXMuY29tPC9saT48bGk+PHN0cm9uZz53b29fYWRfdXJsXzM8L3N0cm9uZz4gLSBodHRwOi8vd3d3Lndvb3RoZW1lcy5jb208L2xpPjxsaT48c3Ryb25nPndvb19hZF91cmxfNDwvc3Ryb25nPiAtIGh0dHA6Ly93d3cud29vdGhlbWVzLmNvbTwvbGk+PGxpPjxzdHJvbmc+d29vX2FkX3VybF81PC9zdHJvbmc+IC0gaHR0cDovL3d3dy53b290aGVtZXMuY29tPC9saT48bGk+PHN0cm9uZz53b29fYWRfdXJsXzY8L3N0cm9uZz4gLSBodHRwOi8vd3d3Lndvb3RoZW1lcy5jb208L2xpPjxsaT48c3Ryb25nPndvb19hbHRfc3R5bGVzaGVldDwvc3Ryb25nPiAtIDgtYmxhY2tuYmx1ZS5jc3M8L2xpPjxsaT48c3Ryb25nPndvb19hc2lkZXNfY2F0ZWdvcnk8L3N0cm9uZz4gLSBTZWxlY3QgYSBjYXRlZ29yeTo8L2xpPjxsaT48c3Ryb25nPndvb19hdXRob3I8L3N0cm9uZz4gLSBmYWxzZTwvbGk+PGxpPjxzdHJvbmc+d29vX2F1dG9faW1nPC9zdHJvbmc+IC0gZmFsc2U8L2xpPjxsaT48c3Ryb25nPndvb19jb250ZW50PC9zdHJvbmc+IC0gZmFsc2U8L2xpPjxsaT48c3Ryb25nPndvb19jb250ZW50X2ZlYXQ8L3N0cm9uZz4gLSBmYWxzZTwvbGk+PGxpPjxzdHJvbmc+d29vX2N1c3RvbV9mYXZpY29uPC9zdHJvbmc+IC0gPC9saT48bGk+PHN0cm9uZz53b29fZmVhdHVyZWRfcG9zdHM8L3N0cm9uZz4gLSBTZWxlY3QgYSBudW1iZXI6PC9saT48bGk+PHN0cm9uZz53b29fZmVhdF9pbWFnZV9oZWlnaHQ8L3N0cm9uZz4gLSAxOTU8L2xpPjxsaT48c3Ryb25nPndvb19mZWF0X2ltYWdlX3dpZHRoPC9zdHJvbmc+IC0gNTQwPC9saT48bGk+PHN0cm9uZz53b29fZmVlZGJ1cm5lcl9pZDwvc3Ryb25nPiAtIDwvbGk+PGxpPjxzdHJvbmc+d29vX2ZlZWRidXJuZXJfdXJsPC9zdHJvbmc+IC0gPC9saT48bGk+PHN0cm9uZz53b29fZ29vZ2xlX2FuYWx5dGljczwvc3Ryb25nPiAtIDwvbGk+PGxpPjxzdHJvbmc+d29vX2hvbWVfb25lX2NvbDwvc3Ryb25nPiAtIGZhbHNlPC9saT48bGk+PHN0cm9uZz53b29faW1hZ2Vfc2luZ2xlPC9zdHJvbmc+IC0gZmFsc2U8L2xpPjxsaT48c3Ryb25nPndvb19sb2dvPC9zdHJvbmc+IC0gPC9saT48bGk+PHN0cm9uZz53b29fbWFudWFsPC9zdHJvbmc+IC0gaHR0cDovL3d3dy53b290aGVtZXMuY29tL3N1cHBvcnQvdGhlbWUtZG9jdW1lbnRhdGlvbi9mcmVzaC1uZXdzLzwvbGk+PGxpPjxzdHJvbmc+d29vX3Jlc2l6ZTwvc3Ryb25nPiAtIHRydWU8L2xpPjxsaT48c3Ryb25nPndvb19zaG9ydG5hbWU8L3N0cm9uZz4gLSB3b288L2xpPjxsaT48c3Ryb25nPndvb19zaW5nbGVfaW1hZ2VfaGVpZ2h0PC9zdHJvbmc+IC0gMTAwPC9saT48bGk+PHN0cm9uZz53b29fc2luZ2xlX2ltYWdlX3dpZHRoPC9zdHJvbmc+IC0gMTAwPC9saT48bGk+PHN0cm9uZz53b29fdGFiczwvc3Ryb25nPiAtIGZhbHNlPC9saT48bGk+PHN0cm9uZz53b29fdGhlbWVuYW1lPC9zdHJvbmc+IC0gRnJlc2ggTmV3czwvbGk+PGxpPjxzdHJvbmc+d29vX3RodW1iX2ltYWdlX2hlaWdodDwvc3Ryb25nPiAtIDc1PC9saT48bGk+PHN0cm9uZz53b29fdGh1bWJfaW1hZ2Vfd2lkdGg8L3N0cm9uZz4gLSA3NTwvbGk+PGxpPjxzdHJvbmc+d29vX3ZpZGVvX2NhdGVnb3J5PC9zdHJvbmc+IC0gU2VsZWN0IGEgY2F0ZWdvcnk6PC9saT48L3VsPg==