Reputation: 2061
I am a beginner struggling with Python every day. I have a large data set that has the names of animals in the 2nd column. I have a program that adds up the count of each animal by its name (each row has 1 animal name and 1 "count" value). I am trying to get the sum of the counts I obtained using Python, but I have not been able to do that. The code I have so far is:
import csv, collections
reader=csv.reader(open('C:\Users\Owl\Data.txt','rb'), delimiter='\t')
counts=collections.Counter()
for line in reader:
    Name=line[1]
    counts[Name]+=1
for (Name, count) in sorted(counts.iteritems()):
    Output=list('%s' % count) #Make output string to a list
    Sum=sum(Output) # Sum function requires a list
print 'Total kinds of Animals: %s' % Sum
I get an error saying " File "sum_count.py", line 17, in <module> Sum=sum(Output) # Sum function requires a list TypeError: unsupported operand type(s) for +: 'int' and 'str'".
What I have figured out so far is that because sum apparently requires its input to be a list, I converted the count data (which was a string) to a list, but when I do Output=list('%s' % count), it seems that every count with more than one digit is split into separate characters. For example, when I print Output, it looks like this:
['1', '6', '3']
['3']
['1', '8', '5', '9']
['7', '9']
instead of
['163']
['3']
['1859']
['79']
What I want to do here is to get a single "sum" of these elements. Here, it will be 4. Four kinds of animals.
I am thinking that this may be the reason why I am getting the error above. I might be wrong but could somebody please help me how to solve this issue? Thank you for your help in advance!
Upvotes: 2
Views: 26538
Reputation: 10186
(Re-written following comment discussion; original answer just pointed out that OP was trying to add strings.)
The other answers have more opportunities for extension (and so I would recommend them), but if you only want to quickly count the number of types of animals, you could simply count the number of lines in the file and use your knowledge of how the file is structured. For example, if your csv file has a header like Name, Count, etc. followed on the next lines by only the data you're interested in, the number of animals would be the number of non-empty lines in the file, minus one for the header. You could then print the count using the following code:
print sum(1 for line in open('test.csv') if line.strip() != '') - 1
Here's what each part of that code does:
sum(): adds up all of the elements of the iterable passed to it. In this case that is not a list but a generator expression, which here can be thought of as a list that is never built in memory.
1 for line in open('test.csv'): this is the first part of the generator expression. By itself, it would produce a generator that yields a 1 for every line in test.csv (the analogous list would be [1,1,1,1,1] if there were five lines in the file).
if line.strip() != '': this is the second part of the generator expression. It makes sure that a 1 is only yielded if the line has anything on it.
- 1: one is subtracted from the value sum(...) returns, to ignore the header of the csv.
Well, I hope that helps in some way, and I should reiterate that this method is just a quick and dirty approach; you wouldn't use it if, for example, you were doing other stuff with the data.
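The line-counting one-liner can be tried without a file on disk. The sketch below is written in Python 3 syntax (the thread itself uses Python 2), and both the helper name count_data_rows and the sample data are made up for illustration:

```python
import io


def count_data_rows(lines):
    """Count the non-empty lines, minus one for the header row."""
    return sum(1 for line in lines if line.strip() != '') - 1


# Hypothetical sample standing in for test.csv: one header, one blank line,
# and four data rows.
sample = io.StringIO("Name,Count\nowl,163\n\nfox,3\nbear,1859\nwolf,79\n")
print(count_data_rows(sample))  # prints 4
```

Note that this counts rows, so it only equals the number of kinds of animals when each animal appears on exactly one row.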
Upvotes: 2
Reputation: 3882
First, you are using a Counter object, but only as a substitute for a defaultdict. If you wanted to use it to do your counting, you could have passed your input like this (assuming species appear more than once and you want to know how often each species appears; the name is in the 2nd column, hence index 1):
counts = collections.Counter(map(lambda item: item[1], reader))
But if you want the total number of animals (regardless of species), you have to add up that count in your first loop. And as others have said, since you are reading in strings, you first have to convert that count to an integer.
import csv
reader=csv.reader(open('in','rb'), delimiter='\t')
counts = dict()
for data in reader:
    animal = data[1]
    if animal not in counts:
        counts[animal]= 1
    else:
        counts[animal]+= 1
for animal in counts:
    print 'Animals of species %s: %s' % (animal, counts[animal])
print 'Species total: %s' % len(counts)
print 'All animals: %s' % sum(counts.values())
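If the file also carries a per-row count that should be added up, the string-to-integer conversion mentioned above could look like the sketch below. It is written in Python 3 syntax, and the column layout (count in the 3rd column, index 2) and the sample rows are assumptions for illustration, not something stated in the question:

```python
import collections
import csv
import io

# Hypothetical tab-separated data: id, animal name, count.
sample = io.StringIO("1\towl\t163\n2\tfox\t3\n3\towl\t7\n")

totals = collections.defaultdict(int)
for row in csv.reader(sample, delimiter='\t'):
    # csv yields every field as a string, so convert before adding.
    totals[row[1]] += int(row[2])

print(len(totals))           # number of species: 2
print(sum(totals.values()))  # all animals: 173
```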
Upvotes: 0
Reputation: 24788
I think the problem stems from the fact that you are differentiating the "count" from the "total". The "count" is the total number of occurrences of that one item. Additionally, you are misusing collections.Counter(), which has the capability of making your job a lot easier. Here is a coded example of what I think you are trying to achieve:
counts = collections.Counter(line[1] for line in reader if len(line) > 1)
#Now all the occurrences of each item are summed up (counts.most_common() returns them ordered by number of occurrences)
print "Total number of animals: %d" % len(counts)
#This is what I THINK you are trying to do.
Additionally:
for name, number in counts.items():
    print "# of %s: %d" % (name, number)
You have a list of strings, not a list of integers.
An example:
mylist = ['1', '2', '3']
All sum() does is perform cumulative addition on the iterable, similar to this:
total = 0
for item in mylist:
    total = total + item
In this case total is an int (value 0) and item is a str (value '1'). Python doesn't know what to do with 0 + 'string'.
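The failure and one possible fix can be reproduced in a few lines. This is a minimal sketch in Python 3 syntax; the variable error_message is only there to capture the exception text:

```python
mylist = ['1', '2', '3']

error_message = None
try:
    sum(mylist)  # starts from the int 0, then tries 0 + '1'
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Converting each string to an int first makes the addition well defined.
total = sum(int(item) for item in mylist)
print(total)  # prints 6
```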
Upvotes: 2
Reputation: 5993
I don't think you need to use sum.
Try this:
for (Name, count) in sorted(counts.iteritems()):
    print 'Species total: %s' % count
Or, possibly better:
for (Name, count) in sorted(counts.iteritems()):
    print 'Total for species %s: %s' % (Name, count)
sum is for when you have a list of numbers and want to find the sum of that list of numbers.
You have already collected the count of each animal using counts -- you just need to display it.
Edit
To sum up the total number of animals counted, you can do this:
total = sum(counts.values())
print 'Total number of animals: %d' % total
Edit 2
The number of kinds of animals counted is simply the length of the counts dictionary:
print 'Number of kinds of animals: %d' % len(counts)
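Putting the pieces together, a self-contained version might look like the sketch below. It uses Python 3 syntax (print as a function, items() instead of iteritems()) and made-up in-memory data in place of the asker's file:

```python
import collections
import csv
import io

# Hypothetical tab-separated data with the animal name in the 2nd column.
sample = io.StringIO("1\towl\n2\tfox\n3\towl\n4\tbear\n")
reader = csv.reader(sample, delimiter='\t')

# Tally how many rows mention each name, skipping rows that are too short.
counts = collections.Counter(row[1] for row in reader if len(row) > 1)

for name, count in sorted(counts.items()):
    print('Total for species %s: %d' % (name, count))
print('Total number of animals: %d' % sum(counts.values()))  # 4
print('Number of kinds of animals: %d' % len(counts))        # 3
```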
Upvotes: 3