Joe Mornin
Joe Mornin

Reputation: 9134

Creating MySQL tables from XML data

I'm working with a set of lobbying disclosure records. The Secretary of the Senate publishes these records as XML files, which look like this:

<Filing ID="1ED696B6-B096-4591-9181-DA083921CD19" Year="2010" Received="2011-01-01T11:33:29.330" Amount="" Type="LD-203 YEAR-END REPORT" Period="Year-End (July 1 - Dec 31)">
<Registrant xmlns="" RegistrantID="8772" RegistrantName="CERIDIAN CORPORATION" Address="4524 Cheltenham Drive&#13;&#10;Bethesda, MD 20814" RegistrantCountry="USA"/>
    <Lobbyist xmlns="" LobbyistName="O'CONNELL, JAMES"/>
</Filing>

<Filing ID="179345CF-8D41-4C71-9C19-F41EB88254B5" Year="2010" Received="2011-01-01T13:48:31.543" Amount="" Type="LD-203 YEAR-END AMENDMENT" Period="Year-End (July 1 - Dec 31)">
    <Registrant xmlns="" RegistrantID="400447142" RegistrantName="Edward Merlis" Address="8202 Hunting Hill Lane&#13;&#10;McLean, VA 22102" RegistrantCountry="USA"/>
    <Lobbyist xmlns="" LobbyistName="Merlis, Edward A"/>
    <Contributions>
        <Contribution xmlns="" Contributor="Merlis, Edward A" ContributionType="FECA" Payee="DeFazio for Congress" Honoree="Cong. Peter DeFazio" Amount="250.0000" ContributionDate="2010-09-05T00:00:00"/>
        <Contribution xmlns="" Contributor="Merlis, Edward A" ContributionType="FECA" Payee="Friends of Jim Oberstar" Honoree="Cong. Jim Oberstar" Amount="1000.0000" ContributionDate="2010-09-01T00:00:00"/>
        <Contribution xmlns="" Contributor="Merlis, Edward A" ContributionType="FECA" Payee="McCaskill for Missouri 2012" Honoree="Senator Claire McCaskill" Amount="1000.0000" ContributionDate="2010-09-18T00:00:00"/>
        <Contribution xmlns="" Contributor="Merlis, Edward A" ContributionType="FECA" Payee="Mesabi Fund" Honoree="Cong. Jim Oberstar" Amount="500.0000" ContributionDate="2010-07-13T00:00:00"/>
    </Contributions>
</Filing>

As you can see, some <Filing> tags also contain <Contribution> tags, but others do not.

I see two objects here: contributors (i.e., lobbyists) and contributions (i.e., a transaction between a lobbyist and a member of Congress).

I'd like to load these records into a MySQL database. To me, the logical structure would include two tables: one for contributors (with fields for name, ID, address, etc.) and one for contributions (with amount, recipient, etc., and a relational link to the list of contributors).

My question: am I approaching this problem correctly? If so, does this data schema make sense? Finally, how am I to parse the XML to load it into the MySQL tables as I've structured them?

Upvotes: 0

Views: 4845

Answers (3)

Joe Mornin
Joe Mornin

Reputation: 9134

Solved: I'm using a Python SAX parser to process the XML file.

Upvotes: 1

Shamit Verma
Shamit Verma

Reputation: 3827

Tradiional approach for these kind of problems is to use an ETL tool.

Do you already have such tool (E.g. Informatica / Talend) in your organization?

Another approach is to write a small utility to parse these XMLs and load this data by creation of master detail relationships in MySQL.

Upvotes: 0

Ike Walker
Ike Walker

Reputation: 65527

If you are using MySQL version 5.5 you may find the LOAD XML command useful.

That being said, LOAD XML appears to be geared towards loading data into a single table for a given XML file, so it may not work for your specific files.

Upvotes: 0

Related Questions