Reputation: 51
This question was asked before by user907629, and Maria Zverina answered it, but her answer did not read the data from an external CSV file.
My file contains more than 800,000 records, and I want to read it from an external CSV file. What changes should be made to that frequency-count code?
Upvotes: 2
Views: 3367
Reputation: 180542
You can do it without storing any intermediary lists:
import csv
from collections import Counter
from itertools import imap
from operator import itemgetter
with open('yourcsv') as f:
    next(f)  # skip the header
    cn = Counter(imap(itemgetter(2), csv.reader(f)))
    for t in cn.iteritems():
        print("{} appears {} times".format(*t))
There is no reason to store the data in lists unless you plan on using the lists. itemgetter(2) pulls just the third column value from each row, so only the counts are kept in memory.
Pass the index of whatever column you want to count, and set the delimiter argument of csv.reader to whatever delimits your data.
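The snippet above targets Python 2 (itertools.imap and Counter.iteritems do not exist in Python 3). As a rough sketch of the same streaming idea on Python 3, something like the following should work. The function name count_column and its parameters are made up for illustration, and it assumes the file has a single header row and that every data row has at least column + 1 fields:

```python
import csv
from collections import Counter
from operator import itemgetter


def count_column(path, column=2, delimiter=","):
    """Count how often each value appears in one column of a CSV file.

    Hypothetical helper, not part of the original answer. Assumes one
    header row and that every data row has at least column + 1 fields.
    """
    # The csv module documentation recommends newline="" for files
    # that are handed to csv.reader.
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        next(reader, None)  # skip the header row (no error on an empty file)
        # map() is lazy in Python 3, so rows are consumed one at a time
        # and no intermediate list of column values is built.
        return Counter(map(itemgetter(column), reader))
```

Calling count_column('yourcsv') returns a Counter; iterating over its most_common() gives (value, count) pairs from most to least frequent, which can be printed with the same "{} appears {} times" format string.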
Upvotes: 4
Reputation: 141
If you only need to do this once and you are on a UNIX machine, you can use the standard command-line tools as well. Counting how often each distinct line occurs is as simple as
cat "inputfile.txt" | sort | uniq -c
To store those values in an output file use
cat "inputfile.txt" | sort | uniq -c > outputfile.txt
See http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html for a discussion of when command-line tools can be (up to 235x) faster and easier than a Hadoop cluster.
Upvotes: -1
Reputation: 3502
Use open to read the file from disk instead of StringIO. Check the new code:
import csv
from collections import Counter
with open('external.csv') as input_stream:  # the file is closed when the block ends
    reader = csv.reader(input_stream, delimiter='\t')
    next(reader)  # skip header
    cities = [row[2] for row in reader]
for (k, v) in Counter(cities).iteritems():
    print "%s appears %d times" % (k, v)
Upvotes: 1