Reputation: 515
I have 2 dataframes
for Promotion Data
and Sales Data
as below.
+-----------+----------+
| Promotion | Date |
+-----------+----------+
| A | 5-Jan-21 |
+-----------+----------+
| B | 8-Jan-21 |
+-----------+----------+
| C | 8-Jan-21 |
+-----------+----------+
df_prom = pd.DataFrame({
'Promotion':['A','B','C'],
'Date':['5-Jan-21','8-Jan-21','8-Jan-21'] })
+-----------+-------+
| Date | Sales |
+-----------+-------+
| 1-Jan-21 | 1,140 |
+-----------+-------+
| 2-Jan-21 | 3,046 |
+-----------+-------+
| 3-Jan-21 | 2,981 |
+-----------+-------+
| 4-Jan-21 | 2,262 |
+-----------+-------+
| 5-Jan-21 | 3,266 |
+-----------+-------+
| 6-Jan-21 | 3,231 |
+-----------+-------+
| 7-Jan-21 | 2,979 |
+-----------+-------+
| 8-Jan-21 | 1,687 |
+-----------+-------+
| 9-Jan-21 | 2,728 |
+-----------+-------+
| 10-Jan-21 | 1,136 |
+-----------+-------+
| 11-Jan-21 | 3,159 |
+-----------+-------+
| 12-Jan-21 | 1,799 |
+-----------+-------+
df_sales = pd.DataFrame({
'Date':['1-Jan-21', '2-Jan-21', '3-Jan-21', '4-Jan-21', '5-Jan-21', '6-Jan-21', '7-Jan-21',
'8-Jan-21', '9-Jan-21', '10-Jan-21', '11-Jan-21', '12-Jan-21'],
'Sales':[1140, 3046, 2981, 2262, 3266, 3231, 2979, 1687, 2728, 1136, 3159, 1799]
})
My task is to calculate a weighted average
Prior 3 days
& Post 3 days sales
considering all 3 promotions.
Meaning, All 3 promotions has different dates. I need to bring it to a common Prior 3 days
& common Post 3 days
.
Step 1:
Eg Promotion A
is on 5-Jan-21
, so Prior 3 days
would be from 2-Jan-21 to 4-Jan-21
.
so the average would be 2763
(average of 3046, 2981, 2262)
that of Post 3 Days
would be 6-Jan-21 to 8-Jan-21
.so the average would be 2632
(average of 3231, 2979, 1687)
in the case of Promotion B
is on 8-Jan-21
, Prior 3 days
would be 5-Jan-21 to 7-Jan-21
averages is equal to 3159
(average of 3266, 3231, 2979).
The Post 3 days
would be 9-Jan-21 to 11-Jan-21
average is 2341
(average of 2728, 1136, 3159).
For C
it is same as B
. since, the dates are same.
Step 2:
Once after calculating the Prior 3
of A, B and C
individually. I must averages this together. i.e average would be 3027
(Average of 2763 of A
, 3159 of B
and 3159 of C
).
same applied to Post 3 days
average. which is equal to 2438
(Average of 2632 of A
, 2341 of B
and 2341 of C
).
so my final answer should look like
+--------------+---------+
| Type | Average |
+--------------+---------+
| Prior 3 days | 3,027 |
+--------------+---------+
| Post 3 days | 2,438 |
+--------------+---------+
please guide me on how should I approach to solve this.
Upvotes: 1
Views: 226
Reputation: 862591
Here is solution working with overlapping values, because each datetime is processing separately.
For correct working is necessary all datetimes before and after 3 values exist in df_sales['Date']
and are sorted.
First convert values to datetimes:
df_prom['Date'] = pd.to_datetime(df_prom['Date'], format='%d-%b-%y')
df_sales['Date'] = pd.to_datetime(df_sales['Date'], format='%d-%b-%y')
Then repeat Date
column to DataFrame with number of columns like values in df_prom
:
arr = np.broadcast_to(df_sales['Date'].to_numpy()[:, None],
(df_sales.shape[0], df_prom.shape[0]))
df = pd.DataFrame(arr)
Compare datetimes and forward 3 and back filling 3 values for 3 previous and 3 next datetimes, this mask is used for filter Sales
:
m = df.eq(df_prom['Date'])
prev_mask = df.where(m).bfill(limit=3).mask(m).notna()
next_mask = df.where(m).ffill(limit=3).mask(m).notna()
prev = np.where(prev_mask, df_sales['Sales'].to_numpy()[:, None], np.nan)
next1 = np.where(next_mask, df_sales['Sales'].to_numpy()[:, None], np.nan)
print (prev)
[[ nan nan nan]
[3046. nan nan]
[2981. nan nan]
[2262. nan nan]
[ nan 3266. 3266.]
[ nan 3231. 3231.]
[ nan 2979. 2979.]
[ nan nan nan]
[ nan nan nan]
[ nan nan nan]
[ nan nan nan]
[ nan nan nan]]
print (next1)
[[ nan nan nan]
[ nan nan nan]
[ nan nan nan]
[ nan nan nan]
[ nan nan nan]
[3231. nan nan]
[2979. nan nan]
[1687. nan nan]
[ nan 2728. 2728.]
[ nan 1136. 1136.]
[ nan 3159. 3159.]
[ nan nan nan]]
And last get mean with omit missing values:
fin = pd.DataFrame({'Type':['Prior 3 days','Post 3 days'],
'Average':[np.nanmean(prev), np.nanmean(next1)]
})
print (fin)
Type Average
0 Prior 3 days 3026.777778
1 Post 3 days 2438.111111
EDIT:
For dynamic limits use:
limits = (pd.to_datetime('12-Jan-2021') - df_prom['Date']).dt.days
d = dict(enumerate(limits))
print (d)
prev_mask = df.where(m).apply(lambda x: x.bfill(limit=d[x.name])).mask(m).notna()
next_mask = df.where(m).apply(lambda x: x.ffill(limit=d[x.name])).mask(m).notna()
print (prev)
[[1140. nan nan]
[3046. nan nan]
[2981. nan nan]
[2262. 2262. 2262.]
[ nan 3266. 3266.]
[ nan 3231. 3231.]
[ nan 2979. 2979.]
[ nan nan nan]
[ nan nan nan]
[ nan nan nan]
[ nan nan nan]
[ nan nan nan]]
print (next1)
[[ nan nan nan]
[ nan nan nan]
[ nan nan nan]
[ nan nan nan]
[ nan nan nan]
[3231. nan nan]
[2979. nan nan]
[1687. nan nan]
[2728. 2728. 2728.]
[1136. 1136. 1136.]
[3159. 3159. 3159.]
[1799. 1799. 1799.]]
Upvotes: 1
Reputation: 438
Sample Data:
df_sales = pd.DataFrame({
'Date':['1-Jan-21', '2-Jan-21', '3-Jan-21', '4-Jan-21', '5-Jan-21', '6-Jan-21', '7-Jan-21',
'8-Jan-21', '9-Jan-21', '10-Jan-21', '11-Jan-21', '12-Jan-21'],
'Sales':[1140, 3046, 2981, 2262, 3266, 3231, 2979, 1687, 2728, 1136, 3159, 1799]
})
df_prom = pd.DataFrame({
'Promotion':['A','B', 'C'],
'Date':['5-Jan-21','8-Jan-21', '8-Jan-21'] })
Steps:
df_proms = df_prom.groupby('Date').count().reset_index()
df = df_sales.merge(df_proms, on='Date', how='left')
df['rolling'] = df['Sales'].rolling(3).mean()
df['post 3 days'] = df['rolling'].shift(-3) * df['Promotion']
df['prior 3 days'] = df['rolling'].shift(1) * df['Promotion']
df = df[~df.Promotion.isnull()]
weighted_df = pd.DataFrame(data=df[['post 3 days', 'prior 3 days']].sum()/df['Promotion'].sum()).reset_index().rename({"index": "Type", 0: "Average"}, axis=1)
weighted_df
Type Average
0 post 3 days 2438.111111
1 prior 3 days 3026.777778
Upvotes: 2