Shanks
Shanks

Reputation: 177

Pandas Count Unique Values for Each Index and Turn Unique Rows into Summed Columns

I've got some weather data from different weather agencies and I'm trying to count the readings in Pandas. I need to:

  1. remove duplicate readings on the same label/city
  2. create a new column with the number of unique readings
  3. create a new column for each agency containing the count of (unique) readings for that index

Current Data Structure:

City          Reading Agency
NEW YORK      80      US
NEW YORK      142     GB
NEW YORK      200     GB
NEW YORK      80      US
TORONTO       96      US
TORONTO       79      US
TORONTO       83      US
TORONTO       73      GB
TORONTO       67      US
TORONTO       67      US

Desired Output:

City          Total    US    GB
NEW YORK          3    1     2
TORONTO           5    4     1

I've found a couple ways of doing this, but they are incredibly messy. I'd very much appreciate an elegant solution if anyone has one.

Thank you.

Upvotes: 2

Views: 2165

Answers (1)

jezrael
jezrael

Reputation: 863651

You can use groupby with aggregating nunique, then reshape by unstack and last insert new column Total by sum:

df = df.groupby(['City','Agency'])['Reading'].nunique().unstack(fill_value=0)
df.insert(0, 'Total', df.sum(axis=1))
print (df)
Agency    Total  GB  US
City                   
NEW YORK      3   2   1
TORONTO       5   1   4

Upvotes: 1

Related Questions