Reputation: 23
I have a number of different CSV files that need to be processed hourly into different tables of a MySQL database. I am writing an ingestion engine in Python that takes an XML mapping file that looks like:
<table name="example_table">
    <column name="test">
        <source type="sql">schema.table_name.column_name</source>
    </column>
    <column name="test_2">
        <source type="csv">column_name</source>
    </column>
</table>
And uses this to determine where to insert the data into the MySQL database. In this example:
- Find column "test" in table "example_table" and fill it with data from another SQL table, "schema.table_name.column_name" (this is usually a primary key of another table).
- Find column "test_2" and fill it with data from the CSV file, keyed by "column_name".
Am I way off base here? Does this seem like a reasonable way to do it? The goal is to have one python engine and multiple xml mapping files so I can process each group of inserts effectively.
Is there a better method to go about it?
Upvotes: 1
Views: 1302
Reputation: 22473
There's nothing inherently wrong with this scheme. XML is a little verbose and a bit overkill for such simple mappings, but it's highly standardized, easily edited, and will work just fine. You should be able to loop through this structure easily:
from lxml import objectify

table = objectify.fromstring(xml_source)
print("table name:", table.attrib['name'])
for col in table.column:
    print("column name:", col.attrib['name'])
    print("source type:", col.source.attrib['type'])
    print("source col:", col.source.text)
Many developers now prefer JSON or YAML configuration files to XML. For example, a similar mapping in JSON:
{
    "table": "example_table",
    "columns": [
        {
            "name": "test",
            "source_type": "sql",
            "source_col": "schema.table_name.column_name"
        },
        {
            "name": "test_2",
            "source_type": "csv",
            "source_col": "column_name"
        }
    ]
}
Which you could also iterate through easily:
import json

j = json.loads(json_source)
print("table name:", j['table'])
for col in j['columns']:
    print("column name:", col['name'])
    print("source type:", col['source_type'])
    print("source col:", col['source_col'])
Whichever format you choose, data-driven recipes like this are a flexible way to feed your ingestion engine.
Upvotes: 1