XML Database Migration using ElementTree & Decorators

Posted by on May 21st, 2009.

I have been working on a website conversion recently. Part of the work involves migrating an existing database schema into a slightly different schema. Exactly why this is required is not important. In addition, I don’t really want to talk about it in case this is work I’ve created for myself out of ignorance.

I’ve been fortunate to be back working with one my favourite langugages—- Python. However, things have changed with Python since I was in my py-programming peak; both in terms of language features and some of the standard libraries. I’m also happy to add that all of the changes seem to be for the better.

My starting point was a mySQL database. I extracted the data using a simple mysqldump with the XML feature turned on:

$ mysqldump -X ---user=<db_user> --password <database_name> > dump.xml

The result was a massive XML file. That is actually a remarkable side-note. Emacs warned me about loading a file this large, and took a bit of time to bring it up, but afterwards, it was incredibly responsive and searchable. I wouldn’t even attempt something like this with my other favourite editor Textmate.

A typical record from the database looks something like this:

  <table_data name="user">
      <field name="user_id">59</field>
      <field name="user_name">Kent Barber</field>
      <field name="user_username">shamal</field>

The last time I did XML processing in Python, things were very different. I tended to simple SAX-based parsers back then. Nowadays, there are fancy alternative like ElementTree which make many things easier.

My end goal was to be able to write simple little functions which know how to construct the objects in my shiny new Django database:

def process_user(fields):
  u = User(name=fields['user_name'], username=fields['user_username'])

To accomplish this, I wrote a little decorator class that takes the name of a table from the database, collects all of the fields, and then calls the decorated function. I know there are other ways to accomplish this same thing that are arguably better, but I really like decorators. I like the way they make the code look and read.

class with_rows_from(object):
  def __init__(self, table_name):
    self.table = table_name
  def __call__(self, fn):
    def wrapped(records):
      for data in records.findall("//database/table_data"):
        if data.get('name') == self.table:
          for row in data.findall(".//row"):
            fields = {}
            for f in row.findall(".//field"):
              fields[f.get('name')] = f.text
    return wrapped

At the top-level, I can extract all the user objects with something like this:

from xml.etree import ElementTree as ET
records = ET.parse('dump.xml')

I love this language. Now, back to my migration. There are obviously some other complications that I’ll have to deal with, but maybe that will be another post.

Posted in Reid van Melle | Tagged in ,, | Print | Email