woshitom

Reputation: 5131

Python, Pandas error with groupby

I have the following Pandas DataFrame 'df1':

id_client                product
client1                  product1
client1                  product4
client1                  product5
client2                  product1
client2                  product6
client3                  product1

First I want to group by id_client and collect the matching products into a list:

id_client             product
client1               [product1,product4,product5]
client2               [product1,product6]
client3               [product1]

Then, for each element of each list, I want to add a new row to a new DataFrame 'df2' like this (nb_product is the length of each list):

product            nb_product
product1           3
product4           3
product5           3
product1           2
product6           2
product1           1

So first I created a new dictionary:

nb_of_combination = {}
nb_of_combination['product'] = []
nb_of_combination['nb_product'] = []

Then I declared the following function:

def nb_of_combination(my_list):
  nb_comb = len(my_list)
  for row in my_list:
    nb_of_combination['product'].append(row)
    nb_of_combination['nb_product'].append(nb_comb)

Then I grouped 'df1' by the field 'id_client' and applied the function 'nb_of_combination':

df1 = df1.groupby('id_client',as_index=False).apply(lambda x: nb_of_combination(list(x.product)))

But I'm getting the following error:

df1 = df1.groupby('id_client',as_index=False).apply(lambda x: nb_of_combination(list(x.product)))
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 660, in apply
    return self._python_apply_general(f)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 667, in _python_apply_general
    not_indexed_same=mutated)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 2821, in _wrap_applied_output
    v = next(v for v in values if v is not None)

I really don't understand this, since:

df2 = pd.DataFrame(nb_of_combination)

seems to work well.

Upvotes: 1

Views: 961

Answers (1)

EdChum

Reputation: 394409

Your method is overly complicated. You can achieve what you want by calling transform on the groupby object, passing the count function, and assigning the result back to the original df as a new column. transform returns a result with the same index as the original df (see the docs):

In [89]:

df['nb_product'] = df.groupby('id_client')['product'].transform('count')
df

Out[89]:
  id_client   product nb_product
0   client1  product1          3
1   client1  product4          3
2   client1  product5          3
3   client2  product1          2
4   client2  product6          2
5   client3  product1          1
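For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question, and the variable name products_per_client is just an illustrative choice, not something from the original post. It assumes a reasonably recent pandas, not the Python 2.7 install shown in the traceback.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "id_client": ["client1", "client1", "client1",
                  "client2", "client2", "client3"],
    "product": ["product1", "product4", "product5",
                "product1", "product6", "product1"],
})

# Step 1 from the question: one list of products per client.
# (products_per_client is a hypothetical name used for illustration.)
products_per_client = df1.groupby("id_client")["product"].apply(list)

# Step 2: count the rows in each client's group and broadcast that
# count back onto every row, so no side-effecting function is needed.
df2 = pd.DataFrame({
    "product": df1["product"],
    "nb_product": df1.groupby("id_client")["product"].transform("count"),
})

print(products_per_client)
print(df2)
```

As for the original error, my reading of the traceback (not verified against that pandas version) is that the function passed to apply returns None for every group, so the line that looks for the first non-None result finds nothing. Note also that, in the code as posted, the def rebinds the name nb_of_combination, so the dictionary of the same name is no longer reachable afterwards.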

Upvotes: 2
