user353877

Reputation: 1221

How to count occurrences in a VERY LARGE dataset with PHP

Let's say I want to keep track of the number of times a word occurs...

//Update the totals
foreach($arrayOfWords as $word) {
    $totals[$word] = $totals[$word] + 1;
}

Now imagine that this little block of code is called HUNDREDS of times, each time with HUNDREDS OF THOUSANDS of NEW words in $arrayOfWords, leading to millions of entries inside the associative array $totals. Despite the simplicity of the operation (adding 1 to the existing value), PHP slows down significantly as we approach millions of entries.

Can you think of a better way to count occurrences (preferably without using a database)?

Upvotes: 3

Views: 246

Answers (2)

Ilmari Karonen

Reputation: 50328

Combining the suggestions of Mark Baker and quickshiftin, the following code should be quite a bit faster if the input word list contains many repeated words:

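// Collapse duplicates within this batch first, then merge into the running totals.
$counts = array_count_values( $words );
foreach( $counts as $word => $count ) {
    // ?? 0 (PHP 7+) avoids an undefined-key notice the first time a word is seen.
    $totals[$word] = ( $totals[$word] ?? 0 ) + $count;
}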

That said, in any case, PHP is probably not the optimal tool for this kind of massive data processing. However, without knowing more about why you want to do this, it's hard to suggest specific alternatives.
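For the repeated-batch scenario in the question, the count-then-merge idea above could be wrapped in a small function that is called once per batch. This is only a sketch: the function name addBatchToTotals and the sample words are made up for illustration, and it assumes PHP 7 or later for the ?? operator.

```php
<?php
// Hypothetical helper: merge one batch of words into the running totals.
function addBatchToTotals(array &$totals, array $words): void
{
    // Count duplicates inside the batch with the built-in function first.
    foreach (array_count_values($words) as $word => $count) {
        // One update per distinct word; ?? 0 covers words not seen before.
        $totals[$word] = ($totals[$word] ?? 0) + $count;
    }
}

// Made-up sample batches standing in for the real input.
$totals = [];
addBatchToTotals($totals, ['apple', 'pear', 'apple']);
addBatchToTotals($totals, ['pear', 'plum', 'apple']);
// $totals now holds apple => 3, pear => 2, plum => 1
```

Passing $totals by reference avoids copying the large array on every call. How much the pre-counting helps depends on how many repeats each batch contains; with mostly unique words it saves little.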

Upvotes: 2

quickshiftin

Reputation: 69581

Here's one way to speed it up:

//Update the totals
foreach($arrayOfWords as $word) {
    $totals[$word]++;
}

No need to search for the same key within the hash twice just to increment its value.
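Whether the difference is noticeable probably depends on the PHP version and the data, so it may be worth measuring. Below is a rough timing sketch, not a rigorous benchmark: the sample data and variable names are invented for illustration, and every key is created up front so that both loops only ever touch existing keys.

```php
<?php
// Made-up sample data: 200,000 words drawn from 1,000 distinct values.
$words = [];
for ($i = 0; $i < 200000; $i++) {
    $words[] = 'w' . ($i % 1000);
}

// Pre-create every key at 0 so neither loop touches an undefined key.
$a = array_fill_keys($words, 0);
$b = array_fill_keys($words, 0);

$start = microtime(true);
foreach ($words as $word) {
    $a[$word] = $a[$word] + 1;   // read the key, then write it
}
$assignSeconds = microtime(true) - $start;

$start = microtime(true);
foreach ($words as $word) {
    $b[$word]++;                 // increment in place
}
$incrementSeconds = microtime(true) - $start;

// Timings vary between runs and machines; both loops produce the same counts.
printf("assignment: %.4fs, increment: %.4fs\n", $assignSeconds, $incrementSeconds);
```

Running it several times and comparing the averages gives a fairer picture than a single run.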

Also, (just a note) I don't see how the length of $totals could ever exceed the length of $arrayOfWords, unless you're adding words to $totals somewhere else in your code.

Upvotes: 2
