Reputation: 321
I have a table presented below.
+------+-------+--------------+------------------------------+------+
| good | store | date_id      | map_dates                    | sale |
+------+-------+--------------+------------------------------+------+
| 1    | 2     | '2019-01-01' | ['2018-07-08']               | 10   |
+------+-------+--------------+------------------------------+------+
| 1    | 2     | '2019-05-06' | ['2019-01-01', '2018-07-08'] | 5    |
+------+-------+--------------+------------------------------+------+
| 5    | 4     | '2019-10-12' | ['2018-12-01']               | 24   |
+------+-------+--------------+------------------------------+------+
| 1    | 2     | '2018-07-08' | []                           | 3    |
+------+-------+--------------+------------------------------+------+
| 5    | 4     | '2018-12-01' | []                           | 15   |
+------+-------+--------------+------------------------------+------+
I want to group by the columns good and store, and for each date_id sum the sales of only the dates listed in that row's map_dates column. For example:
+------+-------+--------------+----------+
| good | store | date_id      | sum_sale |
+------+-------+--------------+----------+
| 1    | 2     | '2019-01-01' | 3        |
+------+-------+--------------+----------+
| 1    | 2     | '2019-05-06' | 13       |
+------+-------+--------------+----------+
| 5    | 4     | '2019-10-12' | 15       |
+------+-------+--------------+----------+
How can I do this without using a loop?
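Here is the sample data as a reproducible DataFrame (assuming map_dates holds plain Python lists of date strings):

import pandas as pd

df = pd.DataFrame({
    'good':      [1, 1, 5, 1, 5],
    'store':     [2, 2, 4, 2, 4],
    'date_id':   ['2019-01-01', '2019-05-06', '2019-10-12', '2018-07-08', '2018-12-01'],
    'map_dates': [['2018-07-08'], ['2019-01-01', '2018-07-08'], ['2018-12-01'], [], []],
    'sale':      [10, 5, 24, 3, 15],
})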
Upvotes: 0
Views: 56
Reputation: 42916
First we explode, then we match our values with an inner merge on good, store, map_dates and date_id. Finally we GroupBy.sum:
# Explode map_dates so each mapped date gets its own row
dfn = df.explode('map_dates')
# Self-merge: match each row's map_dates to another row's date_id within the
# same good/store; the matched (right-hand) columns get the '_sum' suffix
dfn = dfn.merge(dfn,
                left_on=['good', 'store', 'map_dates'],
                right_on=['good', 'store', 'date_id'],
                suffixes=['', '_sum'])
# Sum the matched sales per good/store/date_id
dfn = dfn.groupby(['good', 'store', 'date_id'])['sale_sum'].sum().reset_index()
   good  store     date_id  sale_sum
0     1      2  2019-01-01         3
1     1      2  2019-05-06        13
2     5      4  2019-10-12        15
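Because suffixes=['', '_sum'] leaves the left-hand columns untouched and suffixes the overlapping right-hand columns, the sale figure coming from each matched map_dates row ends up in sale_sum, which is what gets summed per good, store, date_id. Rows with an empty map_dates list explode to NaN and simply drop out of the inner merge.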
Upvotes: 1