Reputation: 1849
I have a file downloaded from dbpedia, with contents that look like this:
<http://dbpedia.org/resource/Selective_Draft_Law_Cases> <http://dbpedia.org/ontology/wikiPageExternalLink> <http://supreme.justia.com/cases/federal/us/245/366/> .
<http://dbpedia.org/resource/List_of_songs_recorded_by_Shakira> <http://dbpedia.org/ontology/wikiPageExternalLink> <http://www.shakira.com/> .
<http://dbpedia.org/resource/Bucharest_Symphony_Orchestra> <http://dbpedia.org/ontology/wikiPageExternalLink> <http://www.symphorchestra.ro/> .
<http://dbpedia.org/resource/Bucharest_Symphony_Orchestra> <http://dbpedia.org/ontology/wikiPageExternalLink> <http://symphorchestra.ro> .
<http://dbpedia.org/resource/Bucharest_Symphony_Orchestra> <http://dbpedia.org/ontology/wikiPageExternalLink> <http://www.youtube.com/symphorchestra> .
I need to extract the title from the first part of each line (i.e. Selective_Draft_Law_Cases in the first line, List_of_songs_recorded_by_Shakira in the second, etc.) and save it in a MySQL table together with the URL, which is the third element of the same line.
I also need to skip the very first line in the file which has different, irrelevant information.
What's the fastest way to get this done in PHP?
Note: The file is quite a big one (over 1 GB in size, over 6 million lines).
Thanks in advance!
Upvotes: 0
Views: 347
Reputation: 1429
I am sure it can be optimized, but it's a start. Try:
function insertFileToDb() {
    $myFile = "myFile.txt"; // your text file containing the data
    $handle = fopen($myFile, 'r');
    if ($handle === false) {
        return; // the file could not be opened
    }

    // Read the first line, but do nothing with it
    fgets($handle);

    // Now read the rest of the file line by line;
    // fgets() returns false at end of file, which ends the loop
    while (($data = fgets($handle)) !== false) {
        // Remove the < and > characters
        $data = str_replace(array("<", ">"), "", $data);

        // Collapse runs of whitespace into a single space
        $data = trim(preg_replace('!\s+!', ' ', $data));

        // Split on spaces: the subject URI is $dataArr[0]
        // and the external link URL is $dataArr[2]
        $dataArr = explode(" ", $data);
        if (count($dataArr) < 3) {
            continue; // skip blank or malformed lines
        }

        // Get the last part of the URI from the 1st element in the array
        $title = getLastPartOfUrl($dataArr[0]);

        // Execute your SQL query here with $title and $dataArr[2] (the URL),
        // e.g. INSERT INTO `table` ...
    }
    fclose($handle);
}

function getLastPartOfUrl($url) {
    $keys = parse_url($url);             // parse the URL
    $path = explode("/", $keys['path']); // split the path
    $last = end($path);                  // get the value of the last element
    return $last;
}
Upvotes: 1
Reputation: 41605
You should use regular expressions via PHP's preg_match function. Since the file is very large (which seems to be your case), use fopen + fgets + fclose to work line by line and avoid loading the whole file into memory.
You could test the performance of file_get_contents for reading the file, but it seems unlikely to be the faster option in your case because of the large amount of memory it needs.
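A minimal sketch of that approach, assuming every data line has the `<subject> <predicate> <object> .` shape shown in the question. The names parseTripleLine and readTriples are hypothetical helpers made up for this illustration, and the regular expression is only one possible pattern; adjust it if your file contains other line shapes.

```php
<?php
// Hypothetical helper: turn one line into [title, url].
// Returns null when the line does not have the expected three-part shape.
function parseTripleLine(string $line): ?array
{
    // Capture 1: last path segment of the subject URI (the title).
    // Capture 2: the whole object URI (the external link).
    $pattern = '~^<[^>]*/([^/>]+)>\s+<[^>]+>\s+<([^>]+)>\s*\.\s*$~';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }
    return [$m[1], $m[2]];
}

// Hypothetical generator: stream the file line by line with
// fopen + fgets + fclose, so only one line is in memory at a time.
function readTriples(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        fgets($handle); // discard the first line, which holds unrelated information
        while (($line = fgets($handle)) !== false) {
            $pair = parseTripleLine($line);
            if ($pair !== null) {
                yield $pair; // lines that do not match are skipped
            }
        }
    } finally {
        fclose($handle);
    }
}
```

Iterating with `foreach (readTriples('myFile.txt') as [$title, $url])` yields one pair at a time, which you can then pass to whatever insert statement you use. Whether this is actually faster than splitting the line on spaces is something to measure on your own data.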
Upvotes: 1