Peter Kazazes

Reputation: 3628

From an array, look up a value and its corresponding key in a text file in PHP

I have a sizeable txt file (3.5 MB) structured like so:

sweep#1 expanse#1   0.375
loftiness#1 highness#2  0.375
lockstep#1  0.25
laziness#2  0.25
treponema#1 0.25
rhizopodan#1 rhizopod#1 0.25
plumy#3 feathery#3 feathered#1  -0.125
ruffled#2 frilly#1 frilled#1    -0.125
fringed#2   -0.125
inflamed#3  -0.125
inlaid#1    -0.125

Each word is followed by a #, an integer, and then, at the end of the line, its "score". A tab separates the word(s) from the score. Right now, the text file is loaded as a string using file_get_contents().

From an array of strings made up of individual, lower-case, character-stripped words, I need to look up each value, find its corresponding score and add it to a running total.

I imagine I would need some form of regex to first find the word, continue to the next \t, and then add the score that follows to a running total. What's the best way of going about this?

Upvotes: 1

Views: 202

Answers (2)

benesch

Reputation: 5269

Yes, there are probably better ways of doing this, but this one is oh-so-simple:

<?php

$wordlist = file_get_contents("wordlist.txt");

//strip string of invalid chars and make it lowercase
$string = "This is the best sentence ever! Winning!";
$string = strtolower($string);
$string = preg_replace('/[^\w\d_ -]/si', '', $string);
$words = explode(" ", $string);

$lines = explode("\n", $wordlist);
$scores = array();
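foreach ($lines as $line) {
    //split on spaces/tabs; the last field is the score, the rest are word#n entries
    $fields = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
    if (count($fields) < 2) continue; //skip blank or malformed lines
    $score = floatval(array_pop($fields));
    foreach ($fields as $entry) {
        list($word) = explode('#', $entry, 2); //drop the "#n" suffix
        $scores[$word] = $score; //a later line for the same word overwrites an earlier one
    }
}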

$total = 0;
foreach($words as $word) {
    if (isset($scores[$word])) $total += $scores[$word];
}

echo $total;
?>

Upvotes: 1

mario

Reputation: 145482

If you just need to find a word, then it's as simple as:

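if (preg_match("/(?:^| )$word#\d+[^\t\n]*\t+(-?\d+\.\d+)/m", $textfile, $match)) {
    $sum += floatval($match[1]);
}

(?:^| ) matches the start of a line (^ in /m mode) or a space, so the word is found even when it is not the first synonym on its line. # and \t are literal separators, \d+ matches the digits of the sense number, and [^\t\n]* skips any remaining synonyms before the tab. The result group [1] will be your float number; -? allows negative scores such as -0.125, and the if guard avoids reading $match[1] when the word is not in the file.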

$word should be escaped with preg_quote($word, '/') if it could contain regex metacharacters or a / forward slash itself. To search for multiple words in one go, implode them into an alternatives list (word1|word2|word3), add a capture group, and use preg_match_all instead.
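
The multi-word search described above might look roughly like this. It is only a sketch: the function name sumScores and the sample data are made up for illustration, and it assumes that synonyms on a line are separated by spaces with a tab before the score, as in the question's sample.

```php
<?php
// Hypothetical helper: sums the scores of every line in $textfile that
// contains one of $words. Assumes "word#n word#n<TAB>score" lines.
function sumScores(array $words, $textfile)
{
    if (count($words) === 0) {
        return 0.0;
    }

    // Escape each word, then join them into an alternation: word1|word2|...
    $quoted = array();
    foreach ($words as $word) {
        $quoted[] = preg_quote($word, '/');
    }
    $alternatives = implode('|', $quoted);

    // (?:^| )    the word starts a line (/m mode) or follows a space
    // #\d+       sense number
    // [^\t\n]*   any remaining synonyms on the same line
    // \t+(...)   tab(s), then the captured score (may be negative)
    $pattern = '/(?:^| )(?:' . $alternatives . ')#\d+[^\t\n]*\t+(-?\d+(?:\.\d+)?)/m';

    preg_match_all($pattern, $textfile, $matches);

    return (float) array_sum(array_map('floatval', $matches[1]));
}

// Made-up sample in the same format as the question's file.
$textfile = "sweep#1 expanse#1\t0.375\nlockstep#1\t0.25\nfringed#2\t-0.125\n";
echo sumScores(array('expanse', 'fringed', 'missing'), $textfile); // 0.375 + -0.125
```

Note that this counts each matching line once: a word repeated in $words is not added twice, and a word listed on several lines (under different sense numbers) contributes every one of those scores. If each occurrence in the input should count separately, a lookup array keyed by word is the better fit.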

Upvotes: 0
