Reputation: 3628
I have a sizeable txt file (3.5 MB) structured like so:
sweep#1 expanse#1 0.375
loftiness#1 highness#2 0.375
lockstep#1 0.25
laziness#2 0.25
treponema#1 0.25
rhizopodan#1 rhizopod#1 0.25
plumy#3 feathery#3 feathered#1 -0.125
ruffled#2 frilly#1 frilled#1 -0.125
fringed#2 -0.125
inflamed#3 -0.125
inlaid#1 -0.125
Each word is followed by a #, an integer and then its "score." There are tab breaks in between the word and score. As of right now, the text file is loaded as a string using file_get_contents().
From an array of strings made up of individual, lower-case, character-stripped words, I need to look up each value, find its corresponding score and add it to a running total.
I imagine I would need some form of regex to first find the word, continue to the next \t and then add the score that follows it to a running total. What's the best way of going about this?
Upvotes: 1
Views: 202
Reputation: 5269
There are probably better ways of doing this, but this one is oh-so-simple:
<?php
$wordlist = file_get_contents("wordlist.txt");
// strip invalid chars from the string and make it lowercase
$string = "This is the best sentence ever! Winning!";
$string = strtolower($string);
$string = preg_replace('/[^\w -]/', '', $string);
$words = explode(" ", $string);
$lines = explode("\n", $wordlist);
$scores = array();
foreach ($lines as $line) {
    $line = trim($line);
    if ($line === '') continue; // skip blank lines
    $split = preg_split("/[#\t]/", $line); // split on # or tab
    // $split[0] (first element) contains the first word on the line;
    // array_pop (last element) contains the score
    $scores[$split[0]] = doubleval(array_pop($split));
}
$total = 0;
foreach ($words as $word) {
    if (isset($scores[$word])) $total += $scores[$word];
}
echo $total;
?>
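The sample data also contains lines that list several words before a single score (for example rhizopodan#1 rhizopod#1 0.25), and the loop above only indexes the first word of each line. Here is a minimal sketch that indexes every word. It assumes the last whitespace-separated field on a line is the score and every earlier field has the form word#n. build_scores is a hypothetical helper name, and keeping the first score seen for a repeated word is an arbitrary choice, not something the question specifies:

```php
<?php
// Hypothetical helper: map each word in the word list to its score.
// Assumes the last whitespace-separated field on a line is the score
// and every earlier field looks like word#n.
function build_scores(string $wordlist): array
{
    $scores = array();
    foreach (preg_split('/\R/', $wordlist) as $line) {
        $fields = preg_split('/\s+/', trim($line));
        if (count($fields) < 2) {
            continue; // blank or malformed line
        }
        $score = (float) array_pop($fields);
        foreach ($fields as $field) {
            $word = explode('#', $field)[0]; // drop the #n suffix
            // Keep the first score seen for a word (arbitrary choice).
            if (!isset($scores[$word])) {
                $scores[$word] = $score;
            }
        }
    }
    return $scores;
}

$scores = build_scores("sweep#1 expanse#1\t0.375\nplumy#3 feathery#3 feathered#1\t-0.125\nfringed#2\t-0.125\n");
$total = 0.0;
foreach (array('expanse', 'feathery', 'missing') as $word) {
    if (isset($scores[$word])) {
        $total += $scores[$word];
    }
}
echo $total; // 0.25
```

Building the array once and then using isset() keeps each lookup cheap, which matters more than the parsing cost when many words are scored against the same 3.5 MB file.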
Upvotes: 1
Reputation: 145482
If you just need to find a word, then it's as simple as:
// -? also accepts negative scores such as -0.125
if (preg_match("/^$word#\d+\t+(-?\d+\.\d+)/m", $textfile, $match)) {
    $sum += floatval($match[1]);
}
^ looks for the start of a line in /m mode, and # and \t are literal separators, while \d+ matches a run of digits. The result group [1] will be your float number.
The $word needs escaping (preg_quote) if it could contain regex metacharacters or the / delimiter itself. To search for multiple words in one go, implode them into an alternatives list $word1|$word2|$word3, add a capture group, and use preg_match_all instead.
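The multi-word lookup described above could look roughly like this. It is a sketch under the same assumption as the single-word pattern: each matching line starts with word#n, followed by one or more tabs and the score. sum_scores is a hypothetical helper name:

```php
<?php
// Hypothetical helper: sum the scores of all given words in one regex pass.
// Assumes each matching line starts with word#n, then tab(s), then the score.
function sum_scores(array $words, string $textfile): float
{
    if (count($words) === 0) {
        return 0.0; // an empty alternation would match every line
    }
    // Escape each word, including the "/" delimiter, then join as alternatives.
    $quoted = array();
    foreach ($words as $word) {
        $quoted[] = preg_quote($word, '/');
    }
    $pattern = '/^(' . implode('|', $quoted) . ')#\d+\t+(-?\d+\.\d+)/m';
    preg_match_all($pattern, $textfile, $matches);
    // $matches[2] holds the captured scores as strings.
    return array_sum(array_map('floatval', $matches[2]));
}

$textfile = "lockstep#1\t0.25\nlaziness#2\t0.25\nfringed#2\t-0.125\n";
echo sum_scores(array('lockstep', 'fringed'), $textfile); // 0.125
```

Note that a word is counted once per matching line, so a word that starts several lines with different # numbers contributes several scores, and a word repeated in the input array is still only matched once per line.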
Upvotes: 0