richie
richie

Reputation: 18648

using python pandas, find the count of an element in a column grouped by a few columns

How do I find the count of an element in a column grouped by a few columns using python pandas?

I have the following csv file structure:

'Country'     'City'     'Year'  'Month'   'Value'    'Street_Code'
 USA          New York    1971     jan        0.0        1
 USA          New York    1971     feb       23.5        1
 USA          New York    1971     mar       10.2        1
 USA          Florida     1971     jan        0.0        1
 USA          Florida     1971     feb        0.0        1
 USA          Florida     1971     mar        0.0        1
 USA          New York    1971     jan        0.0        2
 USA          New York    1971     feb        15.0       2
 USA          New York    1971     mar        7.6        2
 USA          Florida     1971     jan        0.0        2
 USA          Florida     1971     feb        0.0        2
 USA          Florida     1971     mar        2.3        2

I want to count the number of zeros (0.0) in 'value' by grouping 'Country', 'City','Year' & 'Street Code'.

I've tried this so far;

import pandas as pd
data = pd.read_csv('country_details.csv')
count_data = data[data['Value'] == 0.0] # I'm filtering the data. I don't think this is the right way of doing it
grouped = count_data.groupby(['Country','Year','Month','Street_Code']) # I'm stuck here

Upvotes: 1

Views: 2694

Answers (1)

Rutger Kassies
Rutger Kassies

Reputation: 64443

Your filtering is almost right, but you need to reference the column name, in this case 'Value'.

Try:

import pandas as pd
import StringIO

csv = StringIO.StringIO("""Country,City,Year,Month,Value,Street_Code
USA,NewYork,1971,jan,0.0,1
USA,NewYork,1971,feb,23.5,1
USA,NewYork,1971,mar,10.2,1
USA,Florida,1971,jan,0.0,1
USA,Florida,1971,feb,0.0,1
USA,Florida,1971,mar,0.0,1
USA,NewYork,1971,jan,0.0,2
USA,NewYork,1971,feb,15.0,2
USA,NewYork,1971,mar,7.6,2
USA,Florida,1971,jan,0.0,2
USA,Florida,1971,feb,0.0,2
USA,Florida,1971,mar,2.3,2""")

data = pd.read_csv(csv)

datasub = data[data['Value'] == 0.0]

print datasub.groupby(['Country','Year','Month','Street_Code'])['Value'].count()

Country  Year  Month  Street_Code
USA      1971  feb    1              1
                      2              1
               jan    1              2
                      2              2
               mar    1              1

Upvotes: 2

Related Questions