Fissium
Fissium

Reputation: 321

How do I make a grouping that includes only values from the list?

I have a table presented below.

+------+-------+--------------+------------------------------+------+
| good | store | date_id      | map_dates                    | sale |
+------+-------+--------------+------------------------------+------+
| 1    | 2     | '2019-01-01' | ['2018-07-08']               | 10   |
+------+-------+--------------+------------------------------+------+
| 1    | 2     | '2019-05-06' | ['2019-01-01', '2018-07-08'] | 5    |
+------+-------+--------------+------------------------------+------+
| 5    | 4     | '2019-10-12' | ['2018-12-01']               | 24   |
+------+-------+--------------+------------------------------+------+
| 1    | 2     | '2018-07-08' | []                           | 3    |
+------+-------+--------------+------------------------------+------+
| 5    | 4     | '2018-12-01' | []                           | 15   |
+------+-------+--------------+------------------------------+------+

I want to group by columns good, store, and include only the dates specified in the map_dates column in the result. For example:

+------+-------+--------------+----------+
| good | store | date_id      | sum_sale |
+------+-------+--------------+----------+
| 1    | 2     | '2019-01-01' | 3        |
+------+-------+--------------+----------+
| 1    | 2     | '2019-05-06' | 13       |
+------+-------+--------------+----------+
| 5    | 4     | '2019-10-12' | 15       |
+------+-------+--------------+----------+

How can I do this without using a loop?

Upvotes: 0

Views: 56

Answers (1)

Erfan
Erfan

Reputation: 42916

First we explode, then we match our values by an inner merge on good, store, map_dates and date_id. Finally we GroupBy.sum:

dfn = df.explode('map_dates')
dfn = dfn.merge(dfn, 
                left_on=['good', 'store', 'map_dates'], 
                right_on=['good', 'store', 'date_id'],
                suffixes=['', '_sum'])

dfn = dfn.groupby(['good', 'store', 'date_id'])['sale_sum'].sum().reset_index(

   good  store     date_id  sale_sum
0     1      2  2019-01-01         3
1     1      2  2019-05-06        13
2     5      4  2019-10-12        15

Upvotes: 1

Related Questions