ben
ben

Reputation: 207

Parsing text file to sql database using python

extreme python/sql beginner here. I've looked around for some help with this but wasn't able to find exactly what I need- would really appreciate any assistance.

As the title indicates, I have a very large text file that I want to parse into a sql database preferably using python. The text file is set up as so:

#Parent field 1.1
child 1.1
child 1.1 continued

# Parent field 1.2
child 1.2

# Parent field 1.3
child 1.3 text
child 1.3 text
more child 1.3 text

...

# Parent field 1.88
child 1.88

#Parent field 2.1
child 2.1

etc...

Some key points about the list:

Now, I'd like each parent field (1.1, 1.2, 1.3 --> .88) to be a column and the rows populated by subsequent numbers (2.1, 3.1 -->100s)

Could someone help me set up a python script and give me some direction of how to begin parsing? Let me know if I haven't explained the task properly and I'll promptly provide more details.

Thanks so much!

Ben

EDIT: I just realized that the # of columns is NOT constant 88, it is variable

Upvotes: 0

Views: 5158

Answers (2)

LetMeSOThat4U
LetMeSOThat4U

Reputation: 6796

A few points:

  1. From the description it seems like you aim at your data being denormalized in one table. This is generally not a good idea. Split your data into two tables: PARENT and CHILDREN. PARENT should contain ID and CHILDREN should have at least two columns: PARENT_ID and CHILD_VALUE (or smth like it) with PARENT_ID being ID of a parent, whether linked explicitly as foreign key DB construct or not (depending on database). Then, while parsing, INSERT into table CHILDREN relevant record with VALUES("1.1", "1.1childA"), VALUES("1.1", "1.1childB") and so on.

  2. parsing should be trivial: iterate line by line and on "parent" line change parent_id and INSERT into PARENT and read child rows as it goes and INSERT those into CHILDREN table. You could also do it in two passes.

Smth like this:

#!/usr/bin/python

parent=''
child=''

for line in open('input.txt'):
        if line.find('#Parent') > -1 or line.find('# Parent') > -1:
                parent = field_extract(line) # fun where you extract parent value
                parent_id = ... # write it down or generate
                # INSERT into PARENT
        elif line:
                child = field_extract(line)
                # INSERT into CHILDREN with parent_id and child values

Although... I shudder when I see smth so primitive. I'd urge you to learn Pyparsing module, absolutely great for this kind of work.

Upvotes: 2

user1416451
user1416451

Reputation:

you should look into file handling in python.

the open() , .readlines() methods and lists will help you alot.

for example:

f = open("NAMEOFTXTFILE.TXT","r") #r for read, w for write, a for append.
cell = f.readlines() # Displays the content in a list
f.seek(0) # Just takes the cursor to the first cell (start of document)
print cell[2] # Prints the word or letter in the second cell.

then from there, you can send cell[2] with sql statements.

Upvotes: 1

Related Questions