Starlink

Reputation: 55

Pandas: How to find the index of a cell from groupby values?

I have a dataframe:

>>> import pandas as pd
>>> dates = ['1/1/2020', '1/1/2020', '1/1/2020', '1/2/2020', '1/2/2020', '1/2/2020']
>>> humidity = [11, 22, 33, 44, 55, 66]
>>> hours = [0,16,24,0,16,24]
>>> df = pd.DataFrame(list(zip(dates, hours, humidity)),
...                columns =['dates', 'hours', 'humidity'])
>>> df
      dates  hours  humidity
0  1/1/2020      0        11
1  1/1/2020     16        22
2  1/1/2020     24        33
3  1/2/2020      0        44
4  1/2/2020     16        55
5  1/2/2020     24        66

I want to create two new columns. One column will store the index when hours == 16. Another column will store the index when humidity is maximum. Both these operations need to be done separately for each date. I can find the maximum humidity for each date using the groupby and transform functions as below:

>>> df["max_humidity"] = ""
>>> df["max_humidity"] = df["humidity"].groupby(df["dates"]).transform("max")
>>> df
      dates  hours  humidity  max_humidity
0  1/1/2020      0        11            33
1  1/1/2020     16        22            33
2  1/1/2020     24        33            33
3  1/2/2020      0        44            66
4  1/2/2020     16        55            66
5  1/2/2020     24        66            66

However, I couldn't figure out how to find the index position when the humidity is maximum for each date. The final result should look like this:

>>> df
      dates  hours  humidity  max_humidity  sixteen_hr_idx  max_humidity_idx
0  1/1/2020      0        11            33               1                 2
1  1/1/2020     16        22            33               1                 2
2  1/1/2020     24        33            33               1                 2
3  1/2/2020      0        44            66               4                 5
4  1/2/2020     16        55            66               4                 5
5  1/2/2020     24        66            66               4                 5

Upvotes: 3

Views: 235

Answers (3)

sophocles

Reputation: 13821

You can pass idxmax to transform, which gives you 'max_humidity_idx'.

For 'sixteen_hr_idx', select the rows where hours equals 16, build a dictionary with the dates as keys and the row indices as values, and map it back onto the dates column:

# max_humidity_idx
df["max_humidity_idx"] = df["humidity"].groupby(df["dates"]).transform("idxmax")

# sixteen_hr_idx
h = df.loc[df.hours.eq(16)]
df['sixteen_hr_idx'] = df.dates.map(dict(zip(h.dates,h.index)))

      dates  hours  humidity  max_humidity  sixteen_hr_idx  max_humidity_idx
0  1/1/2020      0        11            33               1                 2
1  1/1/2020     16        22            33               1                 2
2  1/1/2020     24        33            33               1                 2
3  1/2/2020      0        44            66               4                 5
4  1/2/2020     16        55            66               4                 5
5  1/2/2020     24        66            66               4                 5
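
Note that idxmax returns index labels, not positions. With the default RangeIndex the two coincide, but with any other index they differ. The sketch below illustrates this under the assumption of a non-default integer index (the index values and the names labels and positions are made up for the example):

```python
import pandas as pd

# Same humidity data as the question, but with a non-default integer index
# (an assumption made here purely for illustration)
df = pd.DataFrame(
    {
        "dates": ["1/1/2020"] * 3 + ["1/2/2020"] * 3,
        "humidity": [11, 22, 33, 44, 55, 66],
    },
    index=[10, 20, 30, 40, 50, 60],
)

# idxmax returns the index *label* of the maximum within each group
labels = df.groupby("dates")["humidity"].transform("idxmax")

# For 0-based row positions, compute idxmax on a copy with a fresh RangeIndex
positions = (
    df.reset_index(drop=True)
    .groupby("dates")["humidity"]
    .transform("idxmax")
    .to_numpy()
)

df["max_humidity_label"] = labels
df["max_humidity_pos"] = positions
print(df)
```

If the dataframe already has the default index, as in the question, the extra reset_index step is not needed.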

This assumes there is exactly one row per date where hours equals 16. If a date has several such rows, the dictionary keeps only the last one, so map returns the last matching index for that date (see the example below):

>>> df

      dates  hours  humidity
0  1/1/2020      0        11
1  1/1/2020     16        22
2  1/1/2020     24        33
3  1/2/2020      0        44
4  1/2/2020     16        55
5  1/2/2020     16        16

>>> h = df.loc[df.hours.eq(16)]
>>> dict(zip(h.dates,h.index))

{'1/1/2020': 1, '1/2/2020': 5}
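
If the first matching row per date is wanted instead of the last, one possible approach is to drop the duplicate dates before building the dictionary. This is only a sketch using the example data above; the names first and first_idx are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "dates": ["1/1/2020", "1/1/2020", "1/1/2020", "1/2/2020", "1/2/2020", "1/2/2020"],
    "hours": [0, 16, 24, 0, 16, 16],
    "humidity": [11, 22, 33, 44, 55, 16],
})

# Rows where hours == 16; 1/2/2020 has two of them (index 4 and 5)
h = df.loc[df["hours"].eq(16)]

# Keep only the first matching row per date, then build date -> index
first = h.drop_duplicates(subset="dates", keep="first")
first_idx = dict(zip(first["dates"], first.index))

df["sixteen_hr_idx"] = df["dates"].map(first_idx)
print(df)
```

Passing keep="last" to drop_duplicates reproduces the behaviour of the plain dictionary.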

Upvotes: 4

Marco_CH

Reputation: 3294

Update: sophocles' solution for sixteen_hr_idx is much shorter and better, so I partially adapted it:

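# Index of the hours == 16 row for each date
sixteen = df.loc[df.hours.eq(16)].reset_index()[["dates", "index"]]
sixteen = sixteen.rename(columns={"index": "sixteen_hr_idx"})

# Index of the maximum-humidity row for each date (humidity, not hours)
max_idx = df.groupby("dates")["humidity"].idxmax().rename("max_humidity_idx").reset_index()

# Broadcast both per-date values back onto every row
df = df.merge(sixteen, on="dates").merge(max_idx, on="dates")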

Output:

      dates  hours  humidity  max_humidity  sixteen_hr_idx  max_humidity_idx
0  1/1/2020      0        11            33               1                 2
1  1/1/2020     16        22            33               1                 2
2  1/1/2020     24        33            33               1                 2
3  1/2/2020      0        44            66               4                 5
4  1/2/2020     16        55            66               4                 5
5  1/2/2020     24        66            66               4                 5

Upvotes: 1

Correy Koshnick

Reputation: 211

I was able to accomplish this by creating two dataframes that hold the index values you need and then merging them onto the parent dataframe.

import pandas as pd
dates = ['1/1/2020', '1/1/2020', '1/1/2020', '1/2/2020', '1/2/2020', '1/2/2020']
humidity = [11, 22, 33, 44, 55, 66]
hours = [0,16,24,0,16,24]
df = pd.DataFrame(list(zip(dates, hours, humidity)),
              columns =['dates', 'hours', 'humidity'])


hum_df = df["humidity"].groupby(df["dates"]).transform("max")
df['max_humidity'] = hum_df

sixteen_df = df[df['hours'] == 16]
sixteen_df.index.name = 'sixteen_hour_index'
sixteen_df = sixteen_df.reset_index()

sixteen_df
Out[2]: 
   sixteen_hour_index     dates  hours  humidity  max_humidity
0                   1  1/1/2020     16        22            33
1                   4  1/2/2020     16        55            66

hum_index_df = df[df['humidity'] == df['max_humidity']]
hum_index_df.index.name = 'humidity_max_index'
hum_index_df = hum_index_df.reset_index()

hum_index_df
Out[3]: 
   humidity_max_index     dates  hours  humidity  max_humidity
0                   2  1/1/2020     24        33            33
1                   5  1/2/2020     24        66            66

Merging on dates broadcasts each index value (2 and 5 for the humidity maximum) onto every row with a matching date.

df = df.merge(hum_index_df[['humidity_max_index', 'dates']], on=('dates'))
df = df.merge(sixteen_df[['sixteen_hour_index', 'dates']], on=('dates'))

df
Out[4]: 
      dates  hours  humidity  max_humidity  humidity_max_index  sixteen_hour_index
0  1/1/2020      0        11            33                   2                   1
1  1/1/2020     16        22            33                   2                   1
2  1/1/2020     24        33            33                   2                   1
3  1/2/2020      0        44            66                   5                   4
4  1/2/2020     16        55            66                   5                   4
5  1/2/2020     24        66            66                   5                   4
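
One caveat with the merge approach: if a date has several rows tied for the maximum humidity, the merge repeats every row of that date once per tie. A possible workaround, sketched here with modified example data (the tie at 33 and the names naive, deduped and result are assumptions for illustration), is to keep one row per date before merging:

```python
import pandas as pd

df = pd.DataFrame({
    "dates": ["1/1/2020"] * 3 + ["1/2/2020"] * 3,
    "hours": [0, 16, 24, 0, 16, 24],
    "humidity": [11, 33, 33, 44, 55, 66],  # tie for the max on 1/1/2020
})
df["max_humidity"] = df.groupby("dates")["humidity"].transform("max")

# Rows at the per-date maximum; 1/1/2020 contributes two rows (index 1 and 2)
hum_index_df = (
    df[df["humidity"] == df["max_humidity"]]
    .rename_axis("humidity_max_index")
    .reset_index()
)

# Merging as-is duplicates the 1/1/2020 rows, once per tied maximum
naive = df.merge(hum_index_df[["humidity_max_index", "dates"]], on="dates")

# Keeping the first maximum per date restores one output row per input row
deduped = hum_index_df.drop_duplicates(subset="dates", keep="first")
result = df.merge(deduped[["humidity_max_index", "dates"]], on="dates")
print(len(naive), len(result))
```

The same consideration applies to sixteen_df when a date has more than one row with hours equal to 16.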

Upvotes: 1
