chefweil

Reputation: 23

Using an XML "map" to import data to MySQL via Python?

I have a number of different CSV files that need to be processed hourly into different tables of a MySQL database. I am writing an ingestion engine in Python that takes an XML mapping file that looks like:

<table name="example_table">
    <column name="test">
        <source type="sql">schema.table_name.column_name</source>
    </column>
    <column name="test_2">
        <source type="csv">column_name</source>
    </column>
</table>

The engine uses this to determine where to insert the data into the MySQL database. In this example, the test column would be populated from an existing MySQL column (schema.table_name.column_name), while test_2 would be populated from the column_name field of the incoming CSV file.

Am I way off base here? Does this seem like a reasonable way to do it? The goal is to have one Python engine and multiple XML mapping files so I can process each group of inserts effectively.

Is there a better method to go about it?

Upvotes: 1

Views: 1302

Answers (1)

Jonathan Eunice

Reputation: 22473

There's nothing inherently wrong with this scheme. XML is a little verbose and a bit overkill for such simple mappings, but it's highly standardized, easily edited, and will work just fine. You should be able to loop through this structure easily:

from lxml import objectify

# xml_source holds the mapping shown in the question.
table = objectify.fromstring(xml_source)
print("table name:", table.attrib['name'])
for col in table.column:
    print("column name:", col.attrib['name'])
    print("source type:", col.source.attrib['type'])
    print("source col:", col.source.text)

Many developers now prefer JSON or YAML configuration files to XML. The same mapping in JSON, for example:

{
   "table":"example_table",
   "columns":[
      {
         "name":"test",
         "source_type":"sql",
         "source_col":"schema.table_name.column_name"
      },
      {
         "name":"test_2",
         "source_type":"csv",
         "source_col":"column_name"
      }
   ]
}

You can iterate through it just as easily:

import json

# json.loads gives plain dicts and lists, so iteration is direct.
j = json.loads(json_source)
print("table name:", j['table'])
for col in j['columns']:
    print("column name:", col['name'])
    print("source type:", col['source_type'])
    print("source col:", col['source_col'])

Whatever the specific format chosen, using data-driven recipes is a flexible way to feed your ingestion engine.
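
To make that concrete, here is a rough sketch of how a recipe could drive the inserts themselves. Everything beyond the mapping format is an assumption for illustration: the mysql-connector-python driver, the ingest function, and the simplification that only CSV-sourced columns are handled (SQL-sourced columns would need a lookup or subquery of their own):

import csv
import json

import mysql.connector  # assumption: mysql-connector-python is installed

def ingest(mapping_path, csv_path, conn):
    """Insert rows from csv_path into the table described by the mapping."""
    with open(mapping_path) as f:
        mapping = json.load(f)

    # This sketch only handles columns fed straight from the CSV file;
    # 'sql'-sourced columns would need their own lookup logic.
    csv_cols = [c for c in mapping['columns'] if c['source_type'] == 'csv']
    sql = "INSERT INTO {0} ({1}) VALUES ({2})".format(
        mapping['table'],
        ", ".join(c['name'] for c in csv_cols),
        ", ".join(["%s"] * len(csv_cols)))

    cur = conn.cursor()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            cur.execute(sql, [row[c['source_col']] for c in csv_cols])
    conn.commit()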

Upvotes: 1
