T. Brian Jones

Reputation: 13552

Processing structured data into a database from a giant text file using PHP?

I have text files containing structured data in a proprietary format (not something simple or common like CSV), and I'm trying to put this data into a database. The files are as large as 50 GB, so I can't read an entire file into memory, extract it into an array, and then process it into the database.

The text files are structured so that the data for a particular "item" (a specific id in the database) can span multiple lines. An item always starts with a line that begins with '01' and can be followed by any number of consecutive lines, each beginning with a code from 02 to 08. A new item begins at the next line that starts with 01. For example:

01some_data_about_the_first_item
02some_more_data_about_the_first_item
05more_data_about_the_first_item
01the_first_line_of_the_second_item
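As a minimal sketch, assuming every line starts with exactly two digits (01 to 08) immediately followed by its data, as in the sample above, a single line can be split into its record code and payload like this. splitLine() is a hypothetical helper name, not part of PHP or any library.

```php
<?php
/**
 * Split one line into its two-character record code and its payload.
 * Assumption: the code is "01" to "08" and the data follows it directly.
 * Returns null for lines that do not match that pattern.
 */
function splitLine(string $line): ?array
{
    $line = rtrim($line, "\r\n");              // drop the newline left by fgets()
    if (!preg_match('/^(0[1-8])(.*)$/', $line, $m)) {
        return null;                            // not a recognised record line
    }
    return ['code' => $m[1], 'data' => $m[2]];  // e.g. code "01" starts a new item
}
```

A line whose code is "01" marks the start of a new item; any other valid code belongs to the item currently being collected.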

I'd like to use PHP to process this data.

How can I load a piece of this text file into memory where I can analyze it, get all the lines for an item, and then process it? Is there a way to load all lines up to the next line that starts with 01, process that data, then begin the next scan of the text file at the end of the last scan?

Processing the data once I've loaded pieces of it into memory is not the problem.
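One way to do this in PHP is a generator that reads the file with fgets() and yields one item (all of its lines) at a time. Only the current item is held in memory, and each read simply continues from where the previous item ended. This is a sketch: readItems() is a hypothetical name, and it assumes every item starts with a line beginning with 01, as described above.

```php
<?php
/**
 * Yield one item at a time from a large file as an array of its lines.
 * Only the current item is kept in memory, so file size does not matter.
 * Assumption: an item starts with a line beginning with "01" and runs
 * until the next such line. Blank lines are skipped.
 */
function readItems(string $path): Generator
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $item = [];
    try {
        while (($line = fgets($fp)) !== false) {
            $line = rtrim($line, "\r\n");
            if ($line === '') {
                continue;                  // ignore blank lines
            }
            if (strncmp($line, '01', 2) === 0 && $item) {
                yield $item;               // a new 01 line closes the previous item
                $item = [];
            }
            $item[] = $line;
        }
        if ($item) {
            yield $item;                   // flush the last item at end of file
        }
    } finally {
        fclose($fp);                       // also runs if the caller stops early
    }
}
```

Usage: foreach (readItems('big.txt') as $lines) { ... }, where $lines[0] is the item's 01 line and the remaining entries are its 02 to 08 lines. Lines that appear before the first 01 line would be yielded as an item of their own, so add a check if the file can contain them.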

Upvotes: 0

Views: 385

Answers (2)

deceze

Reputation: 522626

Sure. Since you tagged the question with csv, I'll assume you have a CSV file. In that case, fgetcsv is a good function to use: it reads one line from the file at a time. Using it, you can read as many lines as you need for one record, process that record, and then continue with the next one. Rough mockup:

$fh = fopen('file.csv', 'r');
$record = array();

do {
    $line = fgetcsv($fh);

    if ($line && $line[0] !== '01') {
        // any line that does not start with 01 is part of the current record,
        // adjust condition as necessary
        $record[] = $line;
    } else {
        if ($record) {
            /* put current $record into database */
        }

        // start the next record with this 01 line
        // (at end of file $line is false, so nothing is left over)
        $record = $line ? array($line) : array();
    }
} while ($line);

fclose($fh);

Upvotes: 3

stewe

Reputation: 42654

Here is a start:

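<?php
$fp = fopen('big.txt', 'r');

// fgets() reads one line at a time, so memory use stays small even for a huge file;
// compare with false so a line like "0" cannot end the loop early
while (($line = fgets($fp)) !== false) {
    $number = substr($line, 0, 2);  // record code: 01 ... 08
    $data   = substr($line, 2);     // rest of the line (still includes the newline)

    // process each line
    echo $number . ' - ' . $data;
}
fclose($fp);
?>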
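Building on this fgets() loop, ftell() and fseek() can make the "begin the next scan at the end of the last scan" part explicit. The sketch below reads one item, and when it reaches the next 01 line it rewinds to the start of that line, so the next call begins exactly there. Saving ftell() after each committed item would also let an interrupted import restart mid-file. readNextItem() is a hypothetical helper, and the sketch assumes the same "01 starts an item" rule as the question.

```php
<?php
/**
 * Read one item starting at the current position of $fp and leave the
 * file pointer at the start of the following item's 01 line.
 * Returns an empty array once the end of the file is reached.
 */
function readNextItem($fp): array
{
    $item = [];
    while (true) {
        $pos  = ftell($fp);                // byte offset where this line begins
        $line = fgets($fp);
        if ($line === false) {
            break;                         // end of file
        }
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue;                      // skip blank lines
        }
        if (strncmp($line, '01', 2) === 0 && $item) {
            fseek($fp, $pos);              // rewind so this 01 line starts the next item
            break;
        }
        $item[] = $line;
    }
    return $item;
}
```

Call it in a loop, for example while ($item = readNextItem($fp)) { ... }, until it returns an empty array. To resume a previous run, fseek() to the saved offset before the first call.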

Upvotes: 1
