Reputation: 7072
I had the following function in pandas 0.17:
df['numberrows'] = df.groupby(['column1','column2','column3'], as_index=False)[['column1']].transform('count').astype('int')
But I upgraded pandas today and now I get the error:
File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py",
line 3810, in insert raise ValueError('cannot insert {}, already exists'.format(item))
ValueError: cannot insert column1, already exists
What has changed in the update which causes this function to not work anymore?
I want to groupby the columns and add a column which has the amount or rows of the groupby.
If what I did before was not a good function, another way of grouping while getting the amount of rows that were grouped is also welcome.
EDIT:
small dataset:
column1 column2 column3
0 test car1 1
1 test2 car5 2
2 test car1 1
3 test4 car2 1
4 test2 car1 1
outcome would be:
column1 column2 column3 numberrows
0 test car1 1 2
1 test2 car5 2 1
3 test4 car2 1 1
4 test2 car1 1 1
Upvotes: 3
Views: 2556
Reputation: 210842
Consider the following approach:
In [18]: df['new'] = df.groupby(['column1','column2','column3'])['column1'] \
.transform('count')
In [19]: df
Out[19]:
column1 column2 column3 new
0 test car1 1 2
1 test2 car5 2 1
2 test car1 1 2
3 test4 car2 1 1
4 test2 car1 1 1
UPDATE:
In [26]: df.groupby(['column1','column2','column3'])['column1'] \
.count().reset_index(name='numberrows')
Out[26]:
column1 column2 column3 numberrows
0 test car1 1 2
1 test2 car1 1 1
2 test2 car5 2 1
3 test4 car2 1 1
Upvotes: 5
Reputation: 294258
Your syntax is sloppy, you are using as_index=False
with transform
.
as_index=False
will end up pushing those columns back into the dataframe proper when it finds that column1
already exists... uh-oh. However, that is completely unnecessary as transform
handles the index for you.
df.groupby(
['column1','column2','column3']
)['column1'].transform('count').astype('int')
0 2
1 1
2 2
3 1
4 1
Name: column1, dtype: int64
Or make a new column
df.assign(
new=df.groupby(
['column1','column2','column3']
)['column1'].transform('count').astype('int')
)
column1 column2 column3 new
0 test car1 1 2
1 test2 car5 2 1
2 test car1 1 2
3 test4 car2 1 1
4 test2 car1 1 1
Upvotes: 1