Reputation: 11
I am new to programming in Python. I have a large CSV file (~5k items). There are 2 columns that I need to get data counted. The best way to explain what I need is show you a few rows of the csv:
Name column OPTIONALDATA5 column
Collaborative Desk Broward
Collaborative Desk Broward
Academic Desk Broward
Academic Desk Broward
Academic Desk Broward
Academic Desk Broward
Collaborative Desk Broward
Collaborative Desk Broward
Collaborative Desk Broward
Collaborative Desk Broward
Broward Broward
Alachua Alachua
Collaborative Desk Alachua
Collaborative Desk Alachua
Collaborative Desk Alachua
Collaborative Desk Alachua
Collaborative Desk Alachua
In the above example I simply want the result as follows:
Broward:
collaborative Desk - 6
Academic Desk - 4
Broward - 1
Alachua:
collaborative Desk - 5
Alachua - 1
Maybe a total as well and then on to the next library in the spreadsheet.
I started writing code but I am wondering if there is a better way to do this.
Upvotes: 0
Views: 3006
Reputation: 18418
This works too (assuming your file is \t
delimited):
import itertools
import operator
import csv
import collections
results = collections.defaultdict(lambda: collections.defaultdict(int))
with open('sample.csv', 'r') as f_in:
f_in.seek(0)
rdr = csv.reader(f_in, delimiter='\t')
next(rdr)
for row in rdr:
results[row[1]][row[0]] += 1
for k, v in results.iteritems():
print "%s" % k
for k2, v2 in v.iteritems():
print " %s - %s" % (k2, v2)
Outputs:
Alachua
Alachua - 1
Collaborative Desk - 5
Broward
Collaborative Desk - 6
Academic Desk - 4
Broward - 1
Upvotes: 1
Reputation: 12092
Assuming the data is tab delimited, this is one way of getting what you want:
import csv
from collections import defaultdict, Counter
input_file = open('data')
csv_reader = csv.reader(input_file, delimiter='\t')
data = defaultdict(list)
for row in csv_reader:
data[row[1]].append(row[0])
The data will now contain:
{'Alachua': ['Alachua', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk'],
'Broward': ['Collaborative Desk', 'Collaborative Desk', 'Academic Desk', 'Academic Desk', 'Academic Desk', 'Academic Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Broward']}
You can iterate over the value list for each key and get the total count, or use the Counter
method in python as:
for k, v in data.items():
print k
print Counter(v)
This prints:
Alachua
Counter({'Collaborative Desk': 5, 'Alachua': 1})
Broward
Counter({'Collaborative Desk': 6, 'Academic Desk': 4, 'Broward': 1})
Upvotes: 3