Pandas Count Unique Values for Each Index and Turn Unique Rows into Summed Columns

Question

I've got some weather data from different weather agencies and I'm trying to count the readings in Pandas. I need to:

1. Remove duplicate readings on the same label/city.
2. Create a new column with the number of unique readings.
3. Create a new column for each agency containing the count of (unique) readings for that index.

Current Data Structure:

City          Reading Agency
NEW YORK      80      US
NEW YORK      142     GB
NEW YORK      200     GB
NEW YORK      80      US
TORONTO       96      US
TORONTO       79      US
TORONTO       83      US
TORONTO       73      GB
TORONTO       67      US
TORONTO       67      US

Desired Output:

City          Total    US    GB
NEW YORK          3    1     2
TORONTO           5    4     1

I've found a couple of ways of doing this, but they are incredibly messy. I'd very much appreciate an elegant solution if anyone has one.

Thank you.

jezrael · Accepted Answer

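You can use groupby with nunique to count the unique readings for each City and Agency pair, reshape the result with unstack so that each agency becomes a column, and finally insert a new Total column holding the row sums: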

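# Count unique readings per (City, Agency), then pivot agencies into columns
df = df.groupby(['City', 'Agency'])['Reading'].nunique().unstack(fill_value=0)
# Prepend a Total column holding the sum across the agency columns
df.insert(0, 'Total', df.sum(axis=1))
print(df)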
Agency    Total  GB  US
City                   
NEW YORK      3   2   1
TORONTO       5   1   4
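
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample table in the question; the names `counts`, `dedup` and `alt` are only illustrative and are not part of the original answer. It also reorders the columns to match the desired output, and shows one possible alternative (drop_duplicates followed by pd.crosstab), which should give the same per-agency counts on this data.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "City": ["NEW YORK"] * 4 + ["TORONTO"] * 6,
    "Reading": [80, 142, 200, 80, 96, 79, 83, 73, 67, 67],
    "Agency": ["US", "GB", "GB", "US", "US", "US", "US", "GB", "US", "US"],
})

# Approach from the answer: unique readings per (City, Agency), agencies as columns
counts = df.groupby(["City", "Agency"])["Reading"].nunique().unstack(fill_value=0)
counts.insert(0, "Total", counts.sum(axis=1))

# Optional: reorder columns to match the layout asked for in the question
counts = counts[["Total", "US", "GB"]]
print(counts)

# Alternative sketch: drop exact duplicate rows first, then cross-tabulate
dedup = df.drop_duplicates()
alt = pd.crosstab(dedup["City"], dedup["Agency"])
```

Note that Total here is the sum of the per-agency unique counts. If two agencies report the same reading for one city, that reading is counted once per agency, so Total can be larger than the number of distinct readings for the city. That case does not occur in the sample data.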
