Reputation: 55
I have a dataframe:
>>> import pandas as pd
>>> dates = ['1/1/2020', '1/1/2020', '1/1/2020', '1/2/2020', '1/2/2020', '1/2/2020']
>>> humidity = [11, 22, 33, 44, 55, 66]
>>> hours = [0,16,24,0,16,24]
>>> df = pd.DataFrame(list(zip(dates, hours, humidity)),
... columns =['dates', 'hours', 'humidity'])
>>> df
dates hours humidity
0 1/1/2020 0 11
1 1/1/2020 16 22
2 1/1/2020 24 33
3 1/2/2020 0 44
4 1/2/2020 16 55
5 1/2/2020 24 66
I want to create two new columns. One column will store the index when hours == 16. Another column will store the index when humidity is maximum. Both these operations need to be done separately for each date. I can find the maximum humidity for each date using the groupby and transform functions as below:
>>> df["max_humidity"] = ""
>>> df["max_humidity"] = df["humidity"].groupby(df["dates"]).transform("max")
>>> df
dates hours humidity max_humidity
0 1/1/2020 0 11 33
1 1/1/2020 16 22 33
2 1/1/2020 24 33 33
3 1/2/2020 0 44 66
4 1/2/2020 16 55 66
5 1/2/2020 24 66 66
However, I couldn't figure out how to find the index position when the humidity is maximum for each date. The final result should look like this:
>>> df
dates hours humidity max_humidity sixteen_hr_idx max_humidity_idx
0 1/1/2020 0 11 33 1 2
1 1/1/2020 16 22 33 1 2
2 1/1/2020 24 33 33 1 2
3 1/2/2020 0 44 66 4 5
4 1/2/2020 16 55 66 4 5
5 1/2/2020 24 66 66 4 5
Upvotes: 3
Views: 235
Reputation: 13821
You can pass idxmax to transform, which gives you the 'max_humidity_idx'.
For the 'sixteen_hr_idx', select the rows where hours equals 16 and build a dictionary with the dates as keys and the index values as values, which you can then map back onto your dates column:
# max_humidity_idx
df["max_humidity_idx"] = df["humidity"].groupby(df["dates"]).transform("idxmax")
# sixteen_hr_idx
h = df.loc[df.hours.eq(16)]
df['sixteen_hr_idx'] = df.dates.map(dict(zip(h.dates,h.index)))
dates hours humidity max_humidity sixteen_hr_idx max_humidity_idx
0 1/1/2020 0 11 33 1 2
1 1/1/2020 16 22 33 1 2
2 1/1/2020 24 33 33 1 2
3 1/2/2020 0 44 66 4 5
4 1/2/2020 16 55 66 4 5
5 1/2/2020 24 66 66 4 5
This assumes that exactly one row per date has hours equal to 16. If a date has several such rows, the result of map will be the last matching index for that date (see the example below):
>>> df
dates hours humidity
0 1/1/2020 0 11
1 1/1/2020 16 22
2 1/1/2020 24 33
3 1/2/2020 0 44
4 1/2/2020 16 55
5 1/2/2020 16 16
>>> h = df.loc[df.hours.eq(16)]
>>> dict(zip(h.dates,h.index))
{'1/1/2020': 1, '1/2/2020': 5}
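If the first matching row per date is wanted instead of the last, one possible sketch is to drop the repeated dates before building the dictionary. This assumes "first" means first in row order, and the names last_idx, first and first_idx are only illustrative.

```python
import pandas as pd

# Same data as the example above: two rows with hours == 16 on 1/2/2020.
df = pd.DataFrame({
    "dates": ["1/1/2020"] * 3 + ["1/2/2020"] * 3,
    "hours": [0, 16, 24, 0, 16, 16],
    "humidity": [11, 22, 33, 44, 55, 16],
})

h = df.loc[df["hours"].eq(16)]

# dict() keeps the last value seen for a repeated key.
last_idx = dict(zip(h["dates"], h.index))

# Assumption: the first match per date is the one to keep.
first = h.drop_duplicates("dates")  # keeps the first row for each date
first_idx = dict(zip(first["dates"], first.index))

df["sixteen_hr_idx"] = df["dates"].map(first_idx)
```

With this data, last_idx maps '1/2/2020' to 5 while first_idx maps it to 4.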
Upvotes: 4
Reputation: 3294
Update: sophocles' solution for sixteen_hr_idx is much shorter and better, so I partially adapted it:
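# sixteen_hr_idx: index of the row where hours == 16, broadcast to every row of that date
sixteen = df.loc[df.hours.eq(16)].reset_index()[["dates", "index"]]
df = df.merge(sixteen, on="dates").rename(columns={"index": "sixteen_hr_idx"})
# max_humidity_idx: take idxmax of humidity (not hours) per date, then merge it back
max_idx = df.groupby("dates")["humidity"].idxmax().rename("max_humidity_idx").reset_index()
df = df.merge(max_idx, on="dates")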
Output:
dates hours humidity max_humidity sixteen_hr_idx max_humidity_idx
0 1/1/2020 0 11 33 1 2
1 1/1/2020 16 22 33 1 2
2 1/1/2020 24 33 33 1 2
3 1/2/2020 0 44 66 4 5
4 1/2/2020 16 55 66 4 5
5 1/2/2020 24 66 66 4 5
Upvotes: 1
Reputation: 211
I was able to accomplish this by creating two dataframes that hold the index values you need and then merging them onto the parent dataframe.
import pandas as pd
dates = ['1/1/2020', '1/1/2020', '1/1/2020', '1/2/2020', '1/2/2020', '1/2/2020']
humidity = [11, 22, 33, 44, 55, 66]
hours = [0,16,24,0,16,24]
df = pd.DataFrame(list(zip(dates, hours, humidity)),
columns =['dates', 'hours', 'humidity'])
hum_df = df["humidity"].groupby(df["dates"]).transform("max")
df['max_humidity'] = hum_df
sixteen_df = df[df['hours'] == 16]
sixteen_df.index.name = 'sixteen_hour_index'
sixteen_df = sixteen_df.reset_index()
sixteen_df
Out[2]:
sixteen_hour_index dates hours humidity max_humidity
0 1 1/1/2020 16 22 33
1 4 1/2/2020 16 55 66
hum_index_df = df[df['humidity'] == df['max_humidity']]
hum_index_df.index.name = 'humidity_max_index'
hum_index_df = hum_index_df.reset_index()
hum_index_df
Out[3]:
humidity_max_index dates hours humidity max_humidity
0 2 1/1/2020 24 33 33
1 5 1/2/2020 24 66 66
Merge lets us broadcast each index value (2 and 5) to every row whose date matches.
df = df.merge(hum_index_df[['humidity_max_index', 'dates']], on=('dates'))
df = df.merge(sixteen_df[['sixteen_hour_index', 'dates']], on=('dates'))
df
Out[4]:
dates hours humidity max_humidity humidity_max_index sixteen_hour_index
0 1/1/2020 0 11 33 2 1
1 1/1/2020 16 22 33 2 1
2 1/1/2020 24 33 33 2 1
3 1/2/2020 0 44 66 5 4
4 1/2/2020 16 55 66 5 4
5 1/2/2020 24 66 66 5 4
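One thing to watch with the merge approach: if two rows on the same date share the maximum humidity, the equality filter matches both, and the merge then repeats every row of that date. The sketch below uses made-up tied data to show the effect and one possible way around it, which is to keep only the first maximum per date. The names dup, first_max and out are illustrative.

```python
import pandas as pd

# Hypothetical data with a tie: humidity 33 appears twice on the same date.
df = pd.DataFrame({
    "dates": ["1/1/2020"] * 3,
    "hours": [0, 16, 24],
    "humidity": [33, 22, 33],
})
df["max_humidity"] = df.groupby("dates")["humidity"].transform("max")

hum_index_df = df[df["humidity"] == df["max_humidity"]].copy()
hum_index_df.index.name = "humidity_max_index"
hum_index_df = hum_index_df.reset_index()

# Merging as-is yields one output row per (row, matching maximum) pair.
dup = df.merge(hum_index_df[["humidity_max_index", "dates"]], on="dates")

# Keeping only the first maximum per date restores one row per input row.
first_max = hum_index_df.drop_duplicates("dates")
out = df.merge(first_max[["humidity_max_index", "dates"]], on="dates")

print(len(dup), len(out))
```

Here dup has six rows for three input rows, while out keeps three.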
Upvotes: 1