python_enthusiast
python_enthusiast

Reputation: 946

Divide by bins with pandas

I have a dataframe in pandas that looks like below. Index is date time object, ordered by day, divided in 5 minute bins. I have a column called 'col1'. So if I do

df['col1']

I get:

DateTime
2008-04-28 09:40:00     300.0
2008-04-28 09:45:00    -800.0
2008-04-28 09:50:00       0.0
2008-04-28 09:55:00    -100.0
2008-04-28 10:00:00       0.0    
2008-04-29 09:40:00     500.0
2008-04-29 09:45:00     800.0
2008-04-29 09:50:00     100.0
2008-04-29 09:55:00    -100.0
2008-04-29 10:00:00       0.0

I have another dataframe in pandas obtained using groupby in the original dataframe using

df2 = df([df.index.time])[['col2']].mean()    

that outputs:

           col2
09:40:00   4603.585657
09:45:00   5547.011952
09:50:00   8532.007952
09:55:00   6175.298805
10:00:00   4236.055777

What I would like to do is to divide col1 by col2 for each of the 5 minute bins without using a for loop. To explain better, for all the days, for each bin divide col1 by col2. For example, divide all the 9:40:00 values in col1 by 9:40:00 value in col2.

I have no idea how to begin doing this without a for loop, but I have the impression that it should be doable with pandas.

The expected output is:

DateTime
2008-04-28 09:40:00     300.0/4603.585657
2008-04-28 09:45:00    -800.0/5547.011952
2008-04-28 09:50:00       0.0/8532.007952
2008-04-28 09:55:00    -100.0/6175.298805
2008-04-28 10:00:00       0.0/4236.055777  
2008-04-29 09:40:00     500.0/4603.585657
2008-04-29 09:45:00     800.0/5547.011952
2008-04-29 09:50:00     100.0/8532.007952
2008-04-29 09:55:00    -100.0/6175.298805
2008-04-29 10:00:00       0.0/4236.055777

Upvotes: 4

Views: 802

Answers (1)

jezrael
jezrael

Reputation: 862581

If need divide by times:

df['new'] = df['col1'].div(df.groupby(df.index.time)['col1'].transform('mean'))
print (df)
                      col1   new
DateTime                        
2008-04-28 09:40:00  300.0  0.75
2008-04-28 09:45:00 -800.0  -inf
2008-04-28 09:50:00    0.0  0.00
2008-04-28 09:55:00 -100.0  1.00
2008-04-28 10:00:00    0.0   NaN
2008-04-29 09:40:00  500.0  1.25
2008-04-29 09:45:00  800.0   inf
2008-04-29 09:50:00  100.0  2.00
2008-04-29 09:55:00 -100.0  1.00
2008-04-29 10:00:00    0.0   NaN

Or if need divide by days:

df['new'] = df['col1'].div(df.groupby(df.index.date)['col1'].transform('mean'))
print (df)
                      col1       new
DateTime                            
2008-04-28 09:40:00  300.0 -2.500000
2008-04-28 09:45:00 -800.0  6.666667
2008-04-28 09:50:00    0.0 -0.000000
2008-04-28 09:55:00 -100.0  0.833333
2008-04-28 10:00:00    0.0 -0.000000
2008-04-29 09:40:00  500.0  1.923077
2008-04-29 09:45:00  800.0  3.076923
2008-04-29 09:50:00  100.0  0.384615
2008-04-29 09:55:00 -100.0 -0.384615
2008-04-29 10:00:00    0.0  0.000000

Upvotes: 1

Related Questions