user3357979
user3357979

Reputation: 627

Pandas dataframe divide rows based on dates

I have one column of values on a daily level, and another column of values on a weekly level. I want to divide the daily values by the weekly value for the week in which the day falls, for the corresponding company and group. I'm trying to figure out a way to do without a loop, and I feel like there is a way to do this in pandas or numpy, but I can't figure it out. Thanks for the help!

For example, I have two data frames:

df1 =

Company Group Date       People
A       X     01/07/2015 5
A       X     01/14/2015 10
A       XX    01/07/2015 6
A       XX    01/14/2015 12
B       Y     01/07/2015 4
B       Y     01/14/2015 8
B       YY    01/07/2015 5
B       YY    01/14/2015 4

df2 =

Company Group Date       Value
A       X     01/04/2015 5
A       X     01/06/2015 10
A       X     01/13/2015 15
A       XX    01/05/2015 6
A       XX    01/06/2015 9
A       XX    01/11/2015 9
A       XX    01/14/2015 12
B       Y     01/05/2015 4
B       Y     01/07/2015 6
B       Y     01/13/2015 16
B       Y     01/14/2015 24
B       YY    01/03/2015 10
B       YY    01/11/2015 10
B       YY    01/14/2015 12

And I would like the result to be:

Company Group Date       Value/People
A       X     01/04/2015 1
A       X     01/06/2015 2
A       X     01/13/2015 1.5
A       XX    01/05/2015 1
A       XX    01/06/2015 1.5
A       XX    01/11/2015 0.75
A       XX    01/14/2015 1
B       Y     01/05/2015 1
B       Y     01/07/2015 1.5
B       Y     01/13/2015 2
B       Y     01/14/2015 3
B       YY    01/03/2015 2
B       YY    01/11/2015 2.5
B       YY    01/14/2015 3

Upvotes: 0

Views: 1671

Answers (1)

unutbu
unutbu

Reputation: 879561

The underlying dtype for date-like values stored in DataFrames and Series is NumPy datetime64[ns]. Arrays of datetime64[ns] values can be "cropped" to a canonical second, minute, hour, day, week, month, or year by calling astype('datetime64[?]'), where ? is replaced by the appropriate unit (e.g. s, m, h, D, W, M, Y).

Here, our problem would largely be solved if we could classify each date to a canonical week. We can do this using .astype('datetime64[W]'):

In [152]: df1['Week'] = df1['Date'].values.astype('datetime64[W]'); df1
Out[152]: 
  Company       Date Group  People       Week
0       A 2015-01-07     X       5 2015-01-01
1       A 2015-01-14     X      10 2015-01-08
2       A 2015-01-07    XX       6 2015-01-01
3       A 2015-01-14    XX      12 2015-01-08
4       B 2015-01-07     Y       4 2015-01-01
5       B 2015-01-14     Y       8 2015-01-08
6       B 2015-01-07    YY       5 2015-01-01
7       B 2015-01-14    YY       4 2015-01-08

Having done this for both df1 and df2, we can now merge the DataFrames on ['Week', 'Group', 'Company']. This will match the appropriate rows from df1 and df2. Finding the ratio of Value to People is then easy.


import pandas as pd
df1 = pd.DataFrame({'Company': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'], 'Date': ['01/07/2015', '01/14/2015', '01/07/2015', '01/14/2015', '01/07/2015', '01/14/2015', '01/07/2015', '01/14/2015'], 'Group': ['X', 'X', 'XX', 'XX', 'Y', 'Y', 'YY', 'YY'], 'People': [5, 10, 6, 12, 4, 8, 5, 4]})

df2 = pd.DataFrame({'Company': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B'], 'Date': ['01/04/2015', '01/06/2015', '01/13/2015', '01/05/2015', '01/06/2015', '01/11/2015', '01/14/2015', '01/05/2015', '01/07/2015', '01/13/2015', '01/14/2015', '01/03/2015', '01/11/2015', '01/14/2015'], 'Group': ['X', 'X', 'X', 'XX', 'XX', 'XX', 'XX', 'Y', 'Y', 'Y', 'Y', 'YY', 'YY', 'YY'], 'Value': [5, 10, 15, 6, 9, 9, 12, 4, 6, 16, 24, 10, 10, 12]})

for df in [df1, df2]:
    df['Date'] = pd.to_datetime(df['Date'])
    df['Week'] = df['Date'].values.astype('datetime64[W]')

result = pd.merge(df2, df1, how='left', on=['Week', 'Group', 'Company'], suffixes=['', '_1'])
result['Value/People'] = result['Value']/result['People']
result = result[['Company', 'Group', 'Date', 'Value/People']]
print(result)

yields

   Company Group       Date  Value/People
0        A     X 2015-01-04          1.00
1        A     X 2015-01-06          2.00
2        A     X 2015-01-13          1.50
3        A    XX 2015-01-05          1.00
4        A    XX 2015-01-06          1.50
5        A    XX 2015-01-11          0.75
6        A    XX 2015-01-14          1.00
7        B     Y 2015-01-05          1.00
8        B     Y 2015-01-07          1.50
9        B     Y 2015-01-13          2.00
10       B     Y 2015-01-14          3.00
11       B    YY 2015-01-03          2.00
12       B    YY 2015-01-11          2.50
13       B    YY 2015-01-14          3.00

Note that there is "base offset" issue that needs to be considered when cropping dates to a canonical week. In other words, you need to decide when does a week begin. If you do not like the canonical choice being made by df['Date'].values.astype('datetime64[W]'), you may need to add an offset to the dates. For example, to add a day to the dates in df['Date'] you could use:

(df['Date'].values + np.timedelta64(1, 'D')).astype('datetime64[W]')

Upvotes: 1

Related Questions