syrkull

Reputation: 2344

Load data from a flat file using PHP

I have a text file that serves as a database and has the following data format:

*NEW RECORD
NM = Stackoverflow
DT = 9/15/2006
DS = Overflow
DS = Stack
DS = stackoverflow.com
DS = FAQ

*NEW RECORD
NM = Google
DT = 9/4/1998
DS = G+
DS = Google
DS = Search engine
DS = Search

You get the idea.

The problem is that I do not know how to load specific data from a specific record using PHP, especially when the data is not in an array format. Do I need to convert the data to an array first, or is there a way to retrieve information from the current format?

For example, what is the equivalent PHP code for this MySQL query:

SELECT DT FROM MY_TXT WHERE DS = "Google"

Upvotes: 2

Views: 191

Answers (2)

Andre

Reputation: 385

A quick version, without any input validation:

<?php

$filename = "test.txt"; // Your filename

$t = new FlatDbSearch($filename);

var_dump($t->select('DT', 'DS = "Google"'));

class FlatDbSearch {

    protected $lines;

    public function __construct($filename) {
        $this->lines = file($filename, FILE_IGNORE_NEW_LINES);
    }

    public function select($column, $where) {
        // Split on the first "=" only, so values may contain "=" themselves.
        $parts = explode("=", $where, 2);
        $searchKey = trim(str_replace('"', '', $parts[0]));
        $searchValue = trim(str_replace('"', '', $parts[1]));
        $column = trim(str_replace('"', '', $column));

        $lines = $this->searchForward($searchKey, $searchValue);
        if (count($lines) !== 0) {
            $results = $this->searchBackward($column, $lines);
            return $results;
        }
        return array();
    }

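    // Walk backward from each matching line to the nearest $column line,
    // stopping at the "*NEW RECORD" marker so the search stays inside
    // the matching record instead of leaking into the previous one.
    protected function searchBackward($column, $lines) {
        $results = array();
        foreach ($lines as $key) {
            for ($i = $key; $i > -1; $i--) {
                if (trim($this->lines[$i]) == '*NEW RECORD') {
                    break;
                }
                $parts = explode("=", $this->lines[$i], 2);
                if (count($parts) == 2 && $column == trim(str_replace('"', '', $parts[0]))) {
                    $results[] = trim(str_replace('"', '', $parts[1]));
                    break;
                }
            }
        }
        return $results;
    }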

    protected function searchForward($searchKey, $searchValue) {
        $result = array();
        for ($i = 0; $i < count($this->lines); $i++) {
            // Lines without "=" (blank lines, record markers) are skipped.
            $parts = explode("=", $this->lines[$i], 2);
            if (count($parts) == 2 && trim(str_replace('"', '', $parts[0])) == $searchKey) {
                if (trim(str_replace('"', '', $parts[1])) == $searchValue) {
                    $result[] = $i;
                }
            }
        }
        return $result;
    }
}

Upvotes: 1

mlg

Reputation: 1511

If you're stuck with this format, you need a custom deserialisation mechanism. Here's one that works for your sample data:

<?php

date_default_timezone_set("UTC");

class Record {
    public $nm = null;
    public $dt = null;
    public $ds = [];

    function isValid() {
        return $this->nm !== null && $this->dt !== null && count($this->ds) > 0;
    }

    function isEmpty() {
        return $this->nm == null && $this->dt == null && count($this->ds) == 0;
    }
}

function deserialise($filename, $newLineSeparator = "\n") {
    $incompleteRecords = 0;
    $records = [];
    $lines = explode($newLineSeparator, file_get_contents($filename));

    // Append a sentinel marker so the loop below also flushes the last record.
    if ($lines)
        $lines[] = "*NEW RECORD";

    $record = new Record();
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line == "*NEW RECORD") {
            if ($record->isValid())
                $records[] = $record;
            else if (!$record->isEmpty())
                $incompleteRecords++;

            $record = new Record();
        } else if (substr($line, 0, 5) == "NM = ") {
            $record->nm = substr($line, 5);
        } else if (substr($line, 0, 5) == "DT = ") {
            $record->dt = strtotime(substr($line, 5));
        } else if (substr($line, 0, 5) == "DS = ") {
            $record->ds[] = substr($line, 5);
        }
    }

    echo "Found $incompleteRecords incomplete records.\n";

    return $records;
}

I tried it with your data, passing the result of deserialise to print_r, and I get this output:

Found 0 incomplete records.
Array
(
    [0] => Record Object
        (
            [nm] => Stackoverflow
            [dt] => 1158278400
            [ds] => Array
                (
                    [0] => Overflow
                    [1] => Stack
                    [2] => stackoverflow.com
                    [3] => FAQ
                )

        )

    [1] => Record Object
        (
            [nm] => Google
            [dt] => 904867200
            [ds] => Array
                (
                    [0] => G+
                    [1] => Google
                    [2] => Search engine
                    [3] => Search
                )

        )

)

Is this what you want?
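
To get back to the query in the question (SELECT DT ... WHERE DS = "Google"), filtering the loaded records could look roughly like this. Treat it as a sketch: selectDtWhereDs is a name I made up, the match is exact and case-sensitive, and the sample records are hand-built stand-ins with the same nm, dt and ds properties as the Record objects above.

```php
<?php

// Hypothetical helper: returns the DT timestamp of every record whose
// DS list contains $needle (exact, case-sensitive match).
function selectDtWhereDs(array $records, string $needle): array
{
    $matches = array_filter($records, function ($record) use ($needle) {
        return in_array($needle, $record->ds, true);
    });

    // array_filter() keeps the original keys, so reindex from 0.
    return array_values(array_map(function ($record) {
        return $record->dt;
    }, $matches));
}

// Stand-in records shaped like the Record objects in the output above.
$records = [
    (object) ['nm' => 'Stackoverflow', 'dt' => 1158278400,
              'ds' => ['Overflow', 'Stack', 'stackoverflow.com', 'FAQ']],
    (object) ['nm' => 'Google', 'dt' => 904867200,
              'ds' => ['G+', 'Google', 'Search engine', 'Search']],
];

$dates = selectDtWhereDs($records, 'Google');
echo gmdate('n/j/Y', $dates[0]), "\n"; // prints 9/4/1998
```

With real data you would pass the array returned by deserialise instead of the stand-in records.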

Some considerations

  • Loads everything in memory at once; no batching
  • Uses strtotime to parse dates into timestamps; you might want to just load them as strings (easier), or use the DateTime class. If you're using strtotime, please set the appropriate timezone first, as in the example (date_default_timezone_set).
  • Assumes that a record is invalid if no NM is set, or no DT is set, or no DS entries exist. You can modify this constraint by adapting the isValid method on the Record class.
  • No error handling for malformed lines, lowercase keys, etc.
  • Assumes \n as the newline separator. Files with \r\n also work, because each line is trimmed before it is inspected. If you have \r only, just invoke the deserialise function with it as the second parameter.
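
If the first two points matter for your file, one possible direction is to read line by line with a generator and parse dates with DateTimeImmutable. This is only a sketch under a few assumptions: readRecords is a made-up name, records are plain arrays rather than Record objects, dates are month/day/year in UTC, and only \n and \r\n line endings are handled.

```php
<?php

// Hypothetical streaming reader: yields one record at a time as an array
// with 'nm', 'dt' (DateTimeImmutable or null) and 'ds' (list of strings).
function readRecords(string $filename): Generator
{
    $handle = fopen($filename, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $filename");
    }

    $record = null;
    try {
        while (($line = fgets($handle)) !== false) {
            $line = trim($line); // also strips the \r of \r\n files
            if ($line === '*NEW RECORD') {
                if ($record !== null) {
                    yield $record;
                }
                $record = ['nm' => null, 'dt' => null, 'ds' => []];
                continue;
            }
            // Skip blank lines, lines without "=", and anything before the first marker.
            if ($record === null || strpos($line, '=') === false) {
                continue;
            }
            // A limit of 2 keeps any "=" inside the value intact.
            [$key, $value] = array_map('trim', explode('=', $line, 2));
            if ($key === 'NM') {
                $record['nm'] = $value;
            } elseif ($key === 'DT') {
                // "!" resets the unparsed fields, so the time is midnight.
                $date = DateTimeImmutable::createFromFormat('!n/j/Y', $value, new DateTimeZone('UTC'));
                $record['dt'] = $date ?: null; // null when the date cannot be parsed
            } elseif ($key === 'DS') {
                $record['ds'][] = $value;
            }
        }
        if ($record !== null) {
            yield $record; // the last record has no marker after it
        }
    } finally {
        fclose($handle);
    }
}

// Usage with a throwaway file containing one record from the question.
$path = tempnam(sys_get_temp_dir(), 'flatdb');
file_put_contents($path, "*NEW RECORD\r\nNM = Google\r\nDT = 9/4/1998\r\nDS = G+\r\nDS = Google\r\n");

foreach (readRecords($path) as $record) {
    if (in_array('Google', $record['ds'], true)) {
        echo $record['dt']->format('Y-m-d'), "\n"; // prints 1998-09-04
    }
}
unlink($path);
```

Only the record currently being built is held in memory, so this should cope with files much larger than the sample.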

Upvotes: 4
