AK91
AK91

Reputation: 731

Pandas - How to separate and group by comma separated strings

For the following example dataframe, how would you:

  1. separate out the searched_products and bought_products fields
  2. group them by date, page and just products
  3. with columns showing a count of each product

From this:

+------------+------+---------------------+-----------------+
| date       | page | searched_products   | bought_products |
+------------+------+---------------------+-----------------+
| 2019-01-01 | abc  | apple, orange       | orange          |
+------------+------+---------------------+-----------------+
| 2019-01-01 | def  | apple, pear, orange | orange, pear    |
+------------+------+---------------------+-----------------+
| 2019-01-01 | abc  | grapes, orange      | apple, grapes   |
+------------+------+---------------------+-----------------+
| 2019-01-02 | def  | apple               | apple, oranges  |
+------------+------+---------------------+-----------------+
| 2019-01-02 | ghi  | apple, grapes       | orange          |
+------------+------+---------------------+-----------------+
| 2019-01-02 | jkl  | pear, apple         | pear            |
+------------+------+---------------------+-----------------+
| etc        | etc  | etc                 | etc             |
+------------+------+---------------------+-----------------+

to this:

+------------+------+---------+----------+-----------+
| date       | page | product | searches | purchases |
+------------+------+---------+----------+-----------+
| 2019-01-01 | abc  | apple   | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | abc  | orange  | 2        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | abc  | grapes  | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | def  | apple   | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-01 | def  | pear    | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-01 | def  | orange  | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | def  | apple   | 1        | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | def  | orange  | NaN      | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | ghi  | apple   | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-02 | ghi  | grapes  | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-02 | ghi  | orange  | NaN      | 1         |
+------------+------+---------+----------+-----------+
| 2019-01-02 | jkl  | apple   | 1        | NaN       |
+------------+------+---------+----------+-----------+
| 2019-01-02 | jkl  | pear    | 1        | 1         |
+------------+------+---------+----------+-----------+
| etc        | etc  | etc     | etc      | etc       |
+------------+------+---------+----------+-----------+

Upvotes: 0

Views: 71

Answers (1)

jezrael
jezrael

Reputation: 863166

Solution for pandas 0.25+ with DataFrame.explode for repeat values by splitted values, then aggregate counts by GroupBy.size and last concat together:

s = (df.assign(searches=df['searched_products'].str.split(', '))
      .explode('searches')
      .groupby(['date','page','searches'])
      .size()
      .rename('searches'))
b = (df.assign(purchases=df['bought_products'].str.split(', '))
       .explode('purchases')
       .groupby(['date','page','purchases'])
       .size()
       .rename('purchases'))

df = pd.concat([s, b], axis=1).rename_axis(('date','page','product')).reset_index()
print (df)
        date page  product  searches  purchases
0   20190101  abc    apple       1.0        1.0
1   20190101  abc   grapes       1.0        1.0
2   20190101  abc   orange       2.0        1.0
3   20190101  def    apple       1.0        NaN
4   20190101  def      ear       1.0        NaN
5   20190101  def   orange       1.0        1.0
6   20190101  def     pear       NaN        1.0
7   20190102  def    apple       1.0        1.0
8   20190102  def  oranges       NaN        1.0
9   20190102  ghi    apple       1.0        NaN
10  20190102  ghi   grapes       1.0        NaN
11  20190102  ghi   orange       NaN        1.0
12  20190102  jkl    apple       1.0        NaN
13  20190102  jkl     pear       1.0        1.0

Upvotes: 3

Related Questions