Reputation: 5082
I have a CSV file with a very large number of items (5000 lines) in this format:
storeId,bookId,nb
124,48361,0
124,48363,6
125,48362,8
125,48363,2
126,28933,4
142,55433,6
142,55434,10
171,55871,7
171,55872,6
I need to count the number of stores in the file, so for example with the lines above the result should be 5. But I need to do it with 5000 lines, so I can't just loop.
How can I achieve that?
I also need to return the max quantity, so 10.
I began by converting the file into an array:
if (file_exists($file)) {
    $csv = array_map('str_getcsv', file($file));

    # Stores
    $storeIds = array_column($csv, 0);
    $eachStoreNb = array_count_values($storeIds);
    $storeCount = count($eachStoreNb);
}
print_r($storeCount);
Is there a better way to do it? Faster? Maybe without using the array?
Upvotes: 0
Views: 352
Reputation: 1158
A solution in AWK, to compare the difference. This includes the count for each store as well. AWK should be able to process millions of lines in less than a second. I use the same approach to filter duplicates from a file.
BEGIN {                    # Set some variables initially
    FS = ","               # field separator for the input
    mymax = 0              # init variable mymax
}
NR > 1 {                   # skip the header line; this matches line 2 onwards
    mycount[$1]++          # increase the associative array at that store id
    if ($3 > mymax) {      # compare with the current max
        mymax = $3
    }
}
END {                      # finally print the results
    for (i in mycount) {
        if (length(i) > 0) {
            print "value " i " has " mycount[i]
        }
    }
    print "Maximum value is " mymax
}
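Assuming the script is saved in a file, say count_stores.awk (a name I'm making up here), it would be run over the CSV like this:
awk -f count_stores.awk test.csv
To get the store count directly, a line such as print "Number of stores " length(mycount) could be added to the END block; length() on an array works at least in gawk, and the other awk answer relies on it too.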
Upvotes: 0
Reputation: 14939
An answer with awk would be:
awk -F, 'BEGIN {getline}
{ a[$1]++; m=$3>m?$3:m }
END{ for (i in a){ print i, a[i] };
print "Number of stores",length(a), "max:",m}' testfile
getline is used to skip the first line; each $1 is counted in array a (increased by one per occurrence), and the max value is kept in m.
output:
124 52
125 52
126 26
142 52
171 52
Number of stores 5 max: 10
Upvotes: 0
Reputation: 4857
What 'nice_dev' says, but a little more compact.
$fp = fopen('<your_file>', 'r');
fgets($fp); // skip the first (header) line

$stores = [];
while ($row = fgetcsv($fp)) {
    $stores[$row[0]] = max($stores[$row[0]] ?? 0, $row[2]);
}
Working example.
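Since $stores ends up keyed by store id with the per-store maximum as its value, the two numbers from the question can be read straight off it; a minimal follow-up reusing the array built above:
echo "Number of stores: " . count($stores) . "\n";
echo "Max quantity: " . max($stores) . "\n";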
Upvotes: 0
Reputation: 17805
Faster here would come in the context of micro-optimization; however, you can see an improvement in memory usage by reading the file line by line instead of loading it all into an array first. A $max_nb variable keeps track of the maximum value using the max() function. The rest is self-explanatory.
Snippet:
<?php
$file = 'test.csv';
if (file_exists($file)) {
    $fp = fopen($file, 'r');
    $max_nb = 0;
    $store_set = [];
    fgetcsv($fp); // ignoring headers
    while (($row = fgetcsv($fp)) !== false) {
        $store_set[$row[0]] = true;
        $max_nb = max($max_nb, end($row));
    }
    fclose($fp);
    echo "Num Stores : ", count($store_set), "<br/>";
    echo "Max NB : ", $max_nb;
} else {
    echo "No such CSV file found.";
}
Note: for profiling, I suggest you try both scripts with Xdebug.
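For example, with Xdebug 3 installed, the profiler can be enabled for a single run from the command line (the output directory below is just an assumption about your setup):
php -d xdebug.mode=profile -d xdebug.output_dir=/tmp test.php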
Upvotes: 3
Reputation: 61
What if you looped through the file line by line? I mean ...
$datas = [];
$handle = fopen("filename.csv", "r");
$flagFirstLine = true;
while (!feof($handle)) {
    $csvLine = fgetcsv($handle);
    // don't count the first (header) line
    if ($flagFirstLine) {
        $flagFirstLine = false;
        continue;
    }
    if ($csvLine === false) {
        continue; // empty read at end of file
    }
    $storeID = $csvLine[0];
    $datas[] = $storeID;
}
echo "all rows: " . count($datas);
echo "\nnum stores: " . count(array_unique($datas));
Upvotes: 0