Reputation: 946
I have a pandas DataFrame that looks like the one below. The index is a DatetimeIndex, ordered by day and divided into 5-minute bins. It has a column called 'col1', so if I do
df['col1']
I get:
DateTime
2008-04-28 09:40:00 300.0
2008-04-28 09:45:00 -800.0
2008-04-28 09:50:00 0.0
2008-04-28 09:55:00 -100.0
2008-04-28 10:00:00 0.0
2008-04-29 09:40:00 500.0
2008-04-29 09:45:00 800.0
2008-04-29 09:50:00 100.0
2008-04-29 09:55:00 -100.0
2008-04-29 10:00:00 0.0
I have another DataFrame, obtained with a groupby on the original one:
df2 = df.groupby([df.index.time])[['col2']].mean()
that outputs:
col2
09:40:00 4603.585657
09:45:00 5547.011952
09:50:00 8532.007952
09:55:00 6175.298805
10:00:00 4236.055777
What I would like to do is divide col1 by col2 for each of the 5-minute bins without using a for loop. In other words, for every day, divide each bin's col1 value by the col2 value for the same time of day. For example, divide all the 09:40:00 values in col1 by the 09:40:00 value in col2.
I have no idea how to begin doing this without a for loop, but I have the impression that it should be doable with pandas.
The expected output is:
DateTime
2008-04-28 09:40:00 300.0/4603.585657
2008-04-28 09:45:00 -800.0/5547.011952
2008-04-28 09:50:00 0.0/8532.007952
2008-04-28 09:55:00 -100.0/6175.298805
2008-04-28 10:00:00 0.0/4236.055777
2008-04-29 09:40:00 500.0/4603.585657
2008-04-29 09:45:00 800.0/5547.011952
2008-04-29 09:50:00 100.0/8532.007952
2008-04-29 09:55:00 -100.0/6175.298805
2008-04-29 10:00:00 0.0/4236.055777
Upvotes: 4
Views: 802
Reputation: 862581
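If you need to divide by the mean for each time of day (the sample data only contains col1, so col1 stands in for col2 here):
df['new'] = df['col1'].div(df.groupby(df.index.time)['col1'].transform('mean'))
print(df)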
col1 new
DateTime
2008-04-28 09:40:00 300.0 0.75
2008-04-28 09:45:00 -800.0 -inf
2008-04-28 09:50:00 0.0 0.00
2008-04-28 09:55:00 -100.0 1.00
2008-04-28 10:00:00 0.0 NaN
2008-04-29 09:40:00 500.0 1.25
2008-04-29 09:45:00 800.0 inf
2008-04-29 09:50:00 100.0 2.00
2008-04-29 09:55:00 -100.0 1.00
2008-04-29 10:00:00 0.0 NaN
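To get the col1 / col2 division from the question, the same pattern should work with col2 in the groupby. The following is a minimal sketch, not a tested drop-in: the col1 values are taken from the question, while the col2 values are made up for illustration because the question does not show the raw col2 data.

```python
import pandas as pd

days = ["2008-04-28", "2008-04-29"]
times = ["09:40:00", "09:45:00", "09:50:00", "09:55:00", "10:00:00"]
idx = pd.DatetimeIndex([f"{d} {t}" for d in days for t in times], name="DateTime")

df = pd.DataFrame(
    {
        # col1 values copied from the question
        "col1": [300.0, -800.0, 0.0, -100.0, 0.0, 500.0, 800.0, 100.0, -100.0, 0.0],
        # col2 values are hypothetical, chosen so the per-time means are round numbers
        "col2": [100.0, 200.0, 400.0, 500.0, 1000.0, 300.0, 600.0, 400.0, 1500.0, 1000.0],
    },
    index=idx,
)

# Mean of col2 for each time of day, broadcast back to every row of df
col2_mean_by_time = df.groupby(df.index.time)["col2"].transform("mean")

# Row-wise division; both Series share df's index, so they align directly
df["ratio"] = df["col1"].div(col2_mean_by_time)
print(df)
```

With these made-up numbers the 09:40:00 mean of col2 is 200, so the two 09:40:00 rows come out as 300/200 = 1.5 and 500/200 = 2.5. As in the output above, a group mean of zero would produce inf or NaN.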
Or, if you need to divide by the mean for each day:
df['new'] = df['col1'].div(df.groupby(df.index.date)['col1'].transform('mean'))
print(df)
col1 new
DateTime
2008-04-28 09:40:00 300.0 -2.500000
2008-04-28 09:45:00 -800.0 6.666667
2008-04-28 09:50:00 0.0 -0.000000
2008-04-28 09:55:00 -100.0 0.833333
2008-04-28 10:00:00 0.0 -0.000000
2008-04-29 09:40:00 500.0 1.923077
2008-04-29 09:45:00 800.0 3.076923
2008-04-29 09:50:00 100.0 0.384615
2008-04-29 09:55:00 -100.0 -0.384615
2008-04-29 10:00:00 0.0 0.000000
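If the per-time means already exist as a separate frame like the df2 in the question, another option is to look them up by time of day with Series.map instead of recomputing them with transform. This is only a sketch under assumptions: the col2 values below are invented for illustration, and df2 is rebuilt here the way the question describes it.

```python
import pandas as pd

days = ["2008-04-28", "2008-04-29"]
times = ["09:40:00", "09:45:00", "09:50:00", "09:55:00", "10:00:00"]
idx = pd.DatetimeIndex([f"{d} {t}" for d in days for t in times], name="DateTime")

df = pd.DataFrame(
    {
        # col1 values copied from the question
        "col1": [300.0, -800.0, 0.0, -100.0, 0.0, 500.0, 800.0, 100.0, -100.0, 0.0],
        # col2 values are hypothetical
        "col2": [100.0, 200.0, 400.0, 500.0, 1000.0, 300.0, 600.0, 400.0, 1500.0, 1000.0],
    },
    index=idx,
)

# Per-time-of-day means, indexed by datetime.time objects (as in the question)
df2 = df.groupby([df.index.time])[["col2"]].mean()

# Turn each row's time of day into a Series on df's index, then look it up in df2
time_of_day = pd.Series(df.index.time, index=df.index)
divisor = time_of_day.map(df2["col2"])

df["ratio"] = df["col1"] / divisor
print(df["ratio"])
```

This gives the same values as the transform approach; transform is shorter when the means are not needed on their own, while map is convenient when df2 has already been computed.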
Upvotes: 1