Reputation: 627
I have one column of values on a daily level, and another column of values on a weekly level. I want to divide the daily values by the weekly value for the week in which the day falls, for the corresponding company and group. I'm trying to figure out a way to do without a loop, and I feel like there is a way to do this in pandas or numpy, but I can't figure it out. Thanks for the help!
For example, I have two data frames:
df1 =
Company Group Date People
A X 01/07/2015 5
A X 01/14/2015 10
A XX 01/07/2015 6
A XX 01/14/2015 12
B Y 01/07/2015 4
B Y 01/14/2015 8
B YY 01/07/2015 5
B YY 01/14/2015 4
df2 =
Company Group Date Value
A X 01/04/2015 5
A X 01/06/2015 10
A X 01/13/2015 15
A XX 01/05/2015 6
A XX 01/06/2015 9
A XX 01/11/2015 9
A XX 01/14/2015 12
B Y 01/05/2015 4
B Y 01/07/2015 6
B Y 01/13/2015 16
B Y 01/14/2015 24
B YY 01/03/2015 10
B YY 01/11/2015 10
B YY 01/14/2015 12
And I would like the result to be:
Company Group Date Value/People
A X 01/04/2015 1
A X 01/06/2015 2
A X 01/13/2015 1.5
A XX 01/05/2015 1
A XX 01/06/2015 1.5
A XX 01/11/2015 0.75
A XX 01/14/2015 1
B Y 01/05/2015 1
B Y 01/07/2015 1.5
B Y 01/13/2015 2
B Y 01/14/2015 3
B YY 01/03/2015 2
B YY 01/11/2015 2.5
B YY 01/14/2015 3
Upvotes: 0
Views: 1671
Reputation: 879561
The underlying dtype for date-like values stored in DataFrames and Series is NumPy datetime64[ns]
. Arrays of datetime64[ns]
values can be "cropped" to a canonical second, minute, hour, day, week, month, or year by calling astype('datetime64[?]')
, where ?
is replaced by the appropriate unit (e.g. s
, m
, h
, D
, W
, M
, Y
).
Here, our problem would largely be solved if we could classify each date to a canonical week. We can do this using .astype('datetime64[W]')
:
In [152]: df1['Week'] = df1['Date'].values.astype('datetime64[W]'); df1
Out[152]:
Company Date Group People Week
0 A 2015-01-07 X 5 2015-01-01
1 A 2015-01-14 X 10 2015-01-08
2 A 2015-01-07 XX 6 2015-01-01
3 A 2015-01-14 XX 12 2015-01-08
4 B 2015-01-07 Y 4 2015-01-01
5 B 2015-01-14 Y 8 2015-01-08
6 B 2015-01-07 YY 5 2015-01-01
7 B 2015-01-14 YY 4 2015-01-08
Having done this for both df1
and df2
, we can now merge the DataFrames on ['Week', 'Group', 'Company']
. This will match the appropriate rows from df1 and df2. Finding the ratio of Value to People is then easy.
import pandas as pd
df1 = pd.DataFrame({'Company': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'], 'Date': ['01/07/2015', '01/14/2015', '01/07/2015', '01/14/2015', '01/07/2015', '01/14/2015', '01/07/2015', '01/14/2015'], 'Group': ['X', 'X', 'XX', 'XX', 'Y', 'Y', 'YY', 'YY'], 'People': [5, 10, 6, 12, 4, 8, 5, 4]})
df2 = pd.DataFrame({'Company': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B'], 'Date': ['01/04/2015', '01/06/2015', '01/13/2015', '01/05/2015', '01/06/2015', '01/11/2015', '01/14/2015', '01/05/2015', '01/07/2015', '01/13/2015', '01/14/2015', '01/03/2015', '01/11/2015', '01/14/2015'], 'Group': ['X', 'X', 'X', 'XX', 'XX', 'XX', 'XX', 'Y', 'Y', 'Y', 'Y', 'YY', 'YY', 'YY'], 'Value': [5, 10, 15, 6, 9, 9, 12, 4, 6, 16, 24, 10, 10, 12]})
for df in [df1, df2]:
df['Date'] = pd.to_datetime(df['Date'])
df['Week'] = df['Date'].values.astype('datetime64[W]')
result = pd.merge(df2, df1, how='left', on=['Week', 'Group', 'Company'], suffixes=['', '_1'])
result['Value/People'] = result['Value']/result['People']
result = result[['Company', 'Group', 'Date', 'Value/People']]
print(result)
yields
Company Group Date Value/People
0 A X 2015-01-04 1.00
1 A X 2015-01-06 2.00
2 A X 2015-01-13 1.50
3 A XX 2015-01-05 1.00
4 A XX 2015-01-06 1.50
5 A XX 2015-01-11 0.75
6 A XX 2015-01-14 1.00
7 B Y 2015-01-05 1.00
8 B Y 2015-01-07 1.50
9 B Y 2015-01-13 2.00
10 B Y 2015-01-14 3.00
11 B YY 2015-01-03 2.00
12 B YY 2015-01-11 2.50
13 B YY 2015-01-14 3.00
Note that there is "base offset" issue that needs to be considered when cropping dates to a canonical week. In other words, you need to decide when does a week begin. If you do not like the canonical choice being made by df['Date'].values.astype('datetime64[W]')
, you may need to add an offset to the dates. For example, to add a day to the dates in df['Date']
you could use:
(df['Date'].values + np.timedelta64(1, 'D')).astype('datetime64[W]')
Upvotes: 1