A new parser for Rapache

Writing a parser for Apache configuration files presented many interesting challenges. With the third rewrite of the parser (which shouldn’t be considered ‘stable’ yet, anyway) we tried to fulfill our need for a more powerful API, and managed to make the new API quite pythonic.

I thought I’d post a little tutorial I wrote for it, in case someone is interested in that sort of thing.

Warning: lots of grammatical errors down there. Late night writing :-)

Warning !!: long and boring post.

Rapache Parser

The parser is currently in RapacheCore.LineElement. A rename will happen soon.

Loading a file

In this tutorial we’ll work mostly on this file:

    ServerAlias www.example.com
    ServerAlias www.example.net
    ErrorDocument 400 /error/HTTP_BAD_REQUEST.html.var
    ErrorDocument 401 /error/HTTP_UNAUTHORIZED.html.var
    ErrorDocument 403 /error/HTTP_FORBIDDEN.html.var
    ErrorDocument 404 /error/HTTP_NOT_FOUND.html.var
    ErrorDocument 405 /error/HTTP_METHOD_NOT_ALLOWED.html.var
    ErrorDocument 408 /error/HTTP_REQUEST_TIME_OUT.html.var
    ErrorDocument 410 /error/HTTP_GONE.html.var
    ErrorDocument 411 /error/HTTP_LENGTH_REQUIRED.html.var
    ErrorDocument 412 /error/HTTP_PRECONDITION_FAILED.html.var
    ErrorDocument 413 /error/HTTP_REQUEST_ENTITY_TOO_LARGE.html.var
    ErrorDocument 414 /error/HTTP_REQUEST_URI_TOO_LARGE.html.var
    ErrorDocument 415 /error/HTTP_UNSUPPORTED_MEDIA_TYPE.html.var
    ErrorDocument 500 /error/HTTP_INTERNAL_SERVER_ERROR.html.var
    ErrorDocument 501 /error/HTTP_NOT_IMPLEMENTED.html.var
    ErrorDocument 502 /error/HTTP_BAD_GATEWAY.html.var
    ErrorDocument 503 /error/HTTP_SERVICE_UNAVAILABLE.html.var
    ErrorDocument 506 /error/HTTP_VARIANT_ALSO_VARIES.html.var
    <VirtualHost *>
        ServerName example.org
        DocumentRoot /var/www/example.org/httpdocs
        ErrorDocument 400 /error/HTTP_BAD_REQUEST.html.var
        ErrorDocument 401 /error/HTTP_UNAUTHORIZED.html.var
        ErrorDocument 403 /error/HTTP_FORBIDDEN.html.var
        ErrorDocument 404 /error/HTTP_NOT_FOUND.html.var
        ErrorDocument 666 /error/HTTP_FORBIDDEN.html.var
        ErrorDocument 666 /error/HTTP_NOT_FOUND.html.var
    </VirtualHost>

Let’s instantiate the parser and load the file:

>>> from RapacheCore.LineElement import Parser
>>> p = Parser()
>>> p.load ( 'tests/datafiles/errordocuments.conf' )

Basics

The parser instance:

>>> print p
<RapacheCore.LineElement.Parser object at 0x822256c>

The parser allows searching for directives and sections through its attributes. Every attribute (except .lines, .sections, .value and .opts) returns a selection object.

>>> print p.ErrorDocument
<RapacheCore.LineElement.PlainSelection object at 0x825ee2c>

Attribute lookup is case-insensitive, by the way:

>>> print p.errordocument
<RapacheCore.LineElement.PlainSelection object at 0x826faec>
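For the curious, here is a rough idea of how case-insensitive attribute lookup like this can be done in Python. It is only a minimal sketch: MiniParser and this Line class are made-up stand-ins, not Rapache’s actual code, and the "selection" here is just a plain list.

```python
class Line(object):
    """One directive: a key plus its raw value string (hypothetical)."""
    def __init__(self, key, value):
        self.key = key
        self.value = value


class MiniParser(object):
    """Hypothetical stand-in for Parser; not Rapache's real code."""
    def __init__(self, lines):
        self.lines = list(lines)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so .lines
        # keeps working; any other name is treated as a directive name.
        if name.startswith('_'):
            raise AttributeError(name)
        wanted = name.lower()
        return [l for l in self.lines if l.key.lower() == wanted]


p = MiniParser([Line('ServerAlias', 'www.example.com'),
                Line('ErrorDocument', '404 /error/HTTP_NOT_FOUND.html.var')])
# Both spellings resolve to the same directives.
same = [l.value for l in p.errordocument] == [l.value for l in p.ErrorDocument]
```

The trick is that __getattr__ is only consulted for names Python cannot find otherwise, so real attributes are never shadowed.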

Two specialized selections also exist: .lines and .sections. Each of them contains all the lines/sections found in the global scope of the loaded file.

>>> len ( p.lines )
21
>>> len ( p.sections )
1

Selections

A selection is an iterable object that allows iteration over the group of lines/sections it represents.
For example, p.ErrorDocument returns a selection of all the ErrorDocument directives in the global scope of the configuration file.

>>> print len (p.ErrorDocument)
17

Direct access is also allowed.

>>> print p.ErrorDocument[0]
<RapacheCore.LineElement.Line object at 0x8273aac>
>>> for line in p.ErrorDocument[0:3]: print line
<RapacheCore.LineElement.Line object at 0x82789ac>
<RapacheCore.LineElement.Line object at 0x82789cc>
<RapacheCore.LineElement.Line object at 0x827888c>

Lines

Every directive is represented by a Line object.

>>> line = p.ServerAlias
>>> line.value
'www.example.net'
>>> print line.key
ServerAlias
>>> print line.opts
<RapacheCore.LineElement.Options object at 0x8273d2c>

The opts attribute treats the value as a list of sub-values separated by spaces. It’s an iterable object: you can easily convert it to a list, and you can set it from a list or a tuple.

>>> print list(line.opts)
['www.example.net']
>>> line.opts = "test.example.net", "beta.example.net", "www.example.net"
>>> print line.opts
<RapacheCore.LineElement.Options object at 0x827366c>
>>> print list(line.opts)
['test.example.net', 'beta.example.net', 'www.example.net']
>>> print line.value
test.example.net beta.example.net www.example.net
>>> print line.opts[0]
test.example.net

You can also delete elements from .opts as you would with a normal list, and so on.
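The value/opts relationship can be pictured with a small property. This is a simplified sketch, assuming a made-up Line class that stores only the raw value; Rapache’s own Options object is richer than this.

```python
class Line(object):
    """Hypothetical stand-in: stores the raw value, derives opts from it."""
    def __init__(self, key, value):
        self.key = key
        self.value = value

    @property
    def opts(self):
        # Split on whitespace; a real parser would also honour quoting.
        return self.value.split()

    @opts.setter
    def opts(self, items):
        # Accept any list or tuple and join it back into one string.
        self.value = ' '.join(str(item) for item in items)


line = Line('ServerAlias', 'www.example.net')
line.opts = 'test.example.net', 'beta.example.net', 'www.example.net'
first = line.opts[0]
```

Note that in this sketch opts hands back a fresh list every time, so deleting an element from it would not write back to the value. Supporting that needs a dedicated object that knows which line it belongs to.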

Selections meet lines

Every selection object also supports the Line interface: .value, .key and .opts all work, and refer to the last line in the selection (given that the last line wins in Apache configuration files, this seems the best policy).

>>> print p.ErrorDocument.value
506 /error/HTTP_VARIANT_ALSO_VARIES.html.var

You can still access the other lines as you would list items:

>>> print p.ErrorDocument[0].value
400 /error/HTTP_BAD_REQUEST.html.var
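If you wonder how a selection can act both as a list and as a single line, here is a minimal sketch of the "last line wins" idea. The Selection and Line classes below are illustrative assumptions, not the real Rapache classes.

```python
class Line(object):
    """Hypothetical stand-in for a directive line."""
    def __init__(self, key, value):
        self.key = key
        self.value = value


class Selection(object):
    """Hypothetical sketch of a selection that proxies its last line."""
    def __init__(self, lines):
        self._lines = list(lines)

    def __len__(self):
        return len(self._lines)

    def __iter__(self):
        return iter(self._lines)

    def __getitem__(self, index):
        # Works for both integers and slices, like a list.
        return self._lines[index]

    @property
    def value(self):
        # Reads go to the last line in the selection.
        return self._lines[-1].value

    @value.setter
    def value(self, new_value):
        # Writes overwrite the last line as well.
        self._lines[-1].value = new_value


sel = Selection([
    Line('ErrorDocument', '400 /error/HTTP_BAD_REQUEST.html.var'),
    Line('ErrorDocument', '506 /error/HTTP_VARIANT_ALSO_VARIES.html.var'),
])
last_value = sel.value
```

This sketch simply fails on an empty selection, whereas the real parser creates a new line in that case.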

Creating a new line is as easy as assigning a value to a non-existent directive (if the directive already exists, its last line is simply overwritten).

>>> len(p.lines)
21
>>> p.fakeline.value = 'sdoij'
>>> p.fakeline.value
'sdoij'
>>> len(p.lines)
22

As stated above, trying to create a directive that already exists just overwrites the last existing line.

>>> len(p.lines)
22
>>> p.ServerAlias.value = "www.example.org"
>>> len(p.lines)
22

As an exception to the ‘whatever you do on a selection affects the last line in that selection’ rule, deleting a selection erases all the lines belonging to it.

>>> len( p.ServerAlias )
2
>>> del p.ServerAlias
>>> len( p.ServerAlias )
0

To delete individual lines, just specify their index:

>>> len ( p.ErrorDocument )
17
>>> p.ErrorDocument[-1].value
'506 /error/HTTP_VARIANT_ALSO_VARIES.html.var'
>>> del p.ErrorDocument[-1]
>>> len ( p.ErrorDocument )
16
>>> p.ErrorDocument[-1].value
'503 /error/HTTP_SERVICE_UNAVAILABLE.html.var'

Searching

Since not every directive in Apache configuration files is meant to be unique (ErrorDocument, for example), searching may be necessary.

You can search with the .search() method, passing the list of options to look for as its parameter.
The search returns a Selection, so nearly everything valid for selections (iterating, last line wins, etc.) is also valid for search results.

>>> len( p.ErrorDocument.search([404]) )
1
>>> p.ErrorDocument.search([404]).value
'404 /error/HTTP_NOT_FOUND.html.var'

It’s possible to search on just the second option: specify None as the first option.

>>> p.ErrorDocument.search([None, '/error/HTTP_NOT_FOUND.html.var']).value
'404 /error/HTTP_NOT_FOUND.html.var'

You can modify the value of the searched lines easily:

>>> p.ErrorDocument.search([404]).opts = [404, '/error/NEW_ERROR.html.var']
>>> p.ErrorDocument.search([404]).value
'404 /error/NEW_ERROR.html.var'

As an exception, deleting all the found lines requires the use of the delete() method.

>>> p.ErrorDocument.search([404]).delete()
>>> len( p.ErrorDocument.search([404]) )
0
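The matching rule behind this kind of search can be sketched in a few lines. This is my own simplified take, working on plain value strings. The matches and search functions are hypothetical helpers, not the parser’s internals, and comparing options as strings (so that 404 matches '404') is an assumption.

```python
def matches(opts, pattern):
    """Return True if every non-None pattern entry equals the option
    at the same position (compared as strings, so 404 matches '404')."""
    if len(pattern) > len(opts):
        return False
    return all(wanted is None or str(wanted) == opt
               for wanted, opt in zip(pattern, opts))


def search(values, pattern):
    # values: raw directive values such as '404 /error/x.html.var'
    return [v for v in values if matches(v.split(), pattern)]


values = ['403 /error/HTTP_FORBIDDEN.html.var',
          '404 /error/HTTP_NOT_FOUND.html.var']
by_code = search(values, [404])
by_path = search(values, [None, '/error/HTTP_NOT_FOUND.html.var'])
```

None acts as a wildcard for its position, which is what lets you search on the second option alone.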

Sections

A section is a part of the config file enclosed in a <TAG></TAG> pair. Directives and sub-sections inside a section are not accessible from the outer-scope selections (i.e. p.ErrorDocument won’t return the entries inside a <VirtualHost>).
You can get a selection of sections in the very same way you access lines. Also, sections behave in precisely the same way as the Parser class.

>>> len( p.VirtualHost )
1
>>> len( p.VirtualHost.ErrorDocument )
6
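To make the scoping rule concrete, here is a tiny scanner that only reports directives sitting outside any section. It is a rough sketch that assumes well-formed input with one directive or tag per line. The global_scope_keys function is a hypothetical helper and has nothing to do with how Rapache actually parses.

```python
def global_scope_keys(text):
    """Yield directive names that sit outside any <Section> block."""
    depth = 0
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith('#'):
            continue  # skip blank lines and comments
        if stripped.startswith('</'):
            depth -= 1  # closing tag: leave the section
        elif stripped.startswith('<'):
            depth += 1  # opening tag: enter a section
        elif depth == 0:
            yield stripped.split(None, 1)[0]


conf = """\
ServerAlias www.example.com
ErrorDocument 404 /error/HTTP_NOT_FOUND.html.var
<VirtualHost *>
    ServerName example.org
    ErrorDocument 404 /error/HTTP_NOT_FOUND.html.var
</VirtualHost>
"""
keys = list(global_scope_keys(conf))
```

Here keys holds only the two global directives. The ServerName and ErrorDocument lines inside the <VirtualHost> are skipped because the depth counter is non-zero when they are read.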

Sections also implement the Line interface, which means they expose .key, .value and .opts attributes that you can manipulate:

>>> print p.VirtualHost.key
VirtualHost
>>> print p.VirtualHost.value
*

While sections can be deleted in exactly the same ways as lines, you can’t create them the way you create lines.
p.Directory.value = "/var/www", for example, would create a line “Directory /var/www” rather than a full section, and that will cause Apache to complain on the next restart. That’s because the parser has no way to know that you want to create a section.

To create a section, you should use the following code:

>>> p.sections.create( 'VirtualHost',  '*:80')
<RapacheCore.LineElement.Section object at 0x8260f2c>
>>> p.VirtualHost.get_as_str()
'<VirtualHost *:80>\n</section>\n'

Getting/Setting the content

You can load a configuration into the parser not only via load() but also by passing in a list or a string, and you can read it back in either form.

  • p.set_from_str( string ) : sets the content from a string
  • p.set_from_list( list ) : sets the content from a list of individual lines
  • p.get_as_str() : returns content as a string
  • p.get_as_list() : returns content as a list containing individual lines


6 responses to “A new parser for Rapache”

  1. Michael Noam

    I’d appreciate some more info about your parser.
    e.g. does it handle overriding a value / default values etc. or any other logical details about the parsing or is it a simple text parser ??

  2. Stefano Forenza

    I am not sure I get what you mean. Feel free to contact me via mail with a more detailed question.

  3. Michael

    I would like to check out your apache config parser. Is it available for download? Please drop me a note at my email address if so.

    Thanks!

  4. madhavi

    Hi,

    I would like to try the parser. Where can I download it from ?

    Thanks.

  5. shamantao

    Hi,
    I’m interested in your parser too.
    Did you make it opensource ?
    Where can I test it ?

    Thanks

  6. Stefano Forenza

    @Shamantao: many asked, and never got back :-) .

    The parser is GPL3, like every other part of rapache (run apt-get source rapache), and can be found in the source code, in the RapacheCore/LineElement.py file. A bunch of unit tests are also present in the tests/ directory.

    Currently there is some mess in the tests/ directory, and some unnecessary internal dependencies are still in place. Feel free to try it anyway, and be sure to drop by #rapache-devel if you have any doubts.
