Clayton
Clayton

Reputation: 13

loading a text file into a database using python

I am trying to load a text file into a database. My text file is about 1.6GB. I need to write a python script to load the text file with all the headers into a database

Any guidelines on how I go about doing this?

thanks

Upvotes: 1

Views: 3635

Answers (1)

Adam Morris
Adam Morris

Reputation: 8545

Using python is certainly possible. If you're reading into mysql, you might check out mysql-python. Reading the text file you can use file = open('filename','r'), and file.readline() to get each line and parse it.

However... there is an added overhead of using python. If the text file is orderly (that is, one reacord per row, each row having the same number of columns with a consistent delimiter such as a comma, tab, semi-colon, etc), then the most efficient way is to load it directly. In mysql, you'd do this something like:

LOAD DATA INFILE '/folder/input.dat' INTO TABLE destination_table
FIELDS TERMINATED BY ',' (id,field_1,field_2,field_3);

If you need some minor modifications to the file, such as changing commas or things at the beginning or end of the line you might use a command line sed (if you're on *nix or osx... you'll have to install if you're on windows).

Update

LOAD DATA INFILE will be quickest: http://dev.mysql.com/doc/refman/5.5/en/load-data.html

When you say "start of article 1. some text 2. some text 3. some text MAINO", are 1., 2., 3. and MAINO DIFFERENT FIELDS? If you had 2 fields such as a header and article, you might format your text document to look something like:

This is my header of the article.;;And here is my article, it's a nice
story about programming.
My article goes on for many lines.
And many more.
<==============================================================================>
This is my second article header.;;And here is article 2. It is.
And is.
Etc.
<==============================================================================>

Then, you could use LOAD DATA INFILE like:

LOAD DATA INFILE '/folder/filename.txt' INTO TABLE destination_table
LINES TERMINATED BY '<==============================================================================>'
FIELDS TERMINATED BY ';;' (header,article);

Upvotes: 2

Related Questions