Reputation: 157
value count is : df['ID'].value_counts().values -----> array([4,3,3,1], dtype=int64)
input:
ID emp
a 1
a 1
b 1
a 1
b 1
c 1
c 1
a 1
b 1
c 1
d 1
when I jumble the ID column
df.loc[~df.duplicated(keep='first', subset=['ID']), 'emp']= df['ID'].value_counts().values
output:
ID emp
a 4
c 3
d 3
c 1
b 1
a 1
c 1
a 1
b 1
b 1
a 1
expected result:
ID emp
a 4
c 3
d 1
c 1
b 3
a 1
c 1
a 1
b 1
b 1
a 1
problem :the count is not checking the ID before assigning it the emp.
Upvotes: 0
Views: 652
Reputation: 24314
This alone is giving df.loc[~df.duplicated(keep='first', subset=['ID']), 'emp']= df['ID'].value_counts().values
desired output for your given sample dataframe
but you can try:
cond=~df.duplicated(keep='first', subset=['ID'])
df.loc[cond,'emp']=df.loc[cond,'ID'].map(df['ID'].value_counts())
Upvotes: 1
Reputation: 862511
Here is problem ouput of df['ID'].value_counts()
is Series
with counted values in different number of values like original data, for new column filled by couter value use Series.map
:
df.loc[~df.duplicated(subset=['ID']), 'emp'] = df['ID'].map(df['ID'].value_counts())
Or GroupBy.transform
with size
:
df.loc[~df.duplicated(subset=['ID']), 'emp'] = df.groupby('ID')['ID'].transform('size')
Output Series with 4 values cannot assign back, because different index in df1.index
and df['ID'].value_counts().index
print (df['ID'].value_counts())
a 4
b 3
c 3
d 1
Name: ID, dtype: int64
If convert to numpy array only first 4 values are assigned, because in this DataFrame are 4 groups a,b,c,d
, so df.duplicated(subset=['ID'])
returned 4 times True
s, but in order 4,3,3,1
what reason of wrong output:
print (df['ID'].value_counts().values)
[4 3 3 1]
What need - new column (Series
) with same df.index
:
print (df['ID'].map(df['ID'].value_counts()))
0 4
1 4
2 3
3 4
4 3
5 3
6 3
7 4
8 3
9 3
10 1
Name: ID, dtype: int64
print (df.groupby('ID')['ID'].transform('size'))
0 4
1 4
2 3
3 4
4 3
5 3
6 3
7 4
8 3
9 3
10 1
Name: ID, dtype: int64
Upvotes: 3