Create new variable for grouped data using python

Question

I have a data frame like this:

import pandas

d = {'name': ['john', 'john', 'john', 'Tim', 'Tim', 'Tim', 'Bob', 'Bob'], 'Prod': ['101', '102', '101', '501', '505', '301', '302', '302'], 'Qty': ['5', '4', '1', '3', '5', '4', '1', '3']}
df = pandas.DataFrame(data=d)

What I want to do is create a new id variable. Whenever a name (say john) appears for the first time, id should be 1; for every other occurrence of the same name (john), id should be 0. The same applies to every other name in the data. How do I go about doing that?

Final output should be like this:

NOTE: For anyone who knows SAS: there you can sort the data by name and then use first.name.

       if first.variable = 1 then id = 1

For the first occurrence of a name, first.name = 1; for every repeat occurrence of the same name, first.name = 0. I am trying to replicate this in Python.
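A minimal sketch of the SAS idea in pandas, rebuilding the question's data: sort by name, then flag each row whose name differs from the name on the previous row, which is what first.name reports inside a BY group. The column name `id` and the choice of a stable `mergesort` are assumptions for illustration, not something from the question.

```python
import pandas as pd

d = {'name': ['john', 'john', 'john', 'Tim', 'Tim', 'Tim', 'Bob', 'Bob'],
     'Prod': ['101', '102', '101', '501', '505', '301', '302', '302'],
     'Qty': ['5', '4', '1', '3', '5', '4', '1', '3']}
df = pd.DataFrame(data=d)

# Like "proc sort; by name;" in SAS. mergesort is stable, so rows keep their
# original order within each name and the first row of a group is the first occurrence.
sorted_df = df.sort_values('name', kind='mergesort')

# Like first.name: a row starts a new group when its name differs from the
# previous row's name (shift() gives NaN for the very first row, so it compares unequal).
first_name = sorted_df['name'].ne(sorted_df['name'].shift()).astype(int)

# Assigning a Series aligns on the index, so the flags land back on the original rows.
df['id'] = first_name
print(df)
```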

So far I have tried pandas groupby with first(), and also numpy.where(), but couldn't make either of them work. Any fresh perspective would be appreciated.
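Since numpy.where() was one of the attempts, here is a sketch of how it can work on the question's data: the condition it needs is "has this name already appeared?", which `Series.duplicated()` answers directly. This is an alternative, not taken from the accepted answer; the column name `id` is illustrative.

```python
import numpy as np
import pandas as pd

d = {'name': ['john', 'john', 'john', 'Tim', 'Tim', 'Tim', 'Bob', 'Bob'],
     'Prod': ['101', '102', '101', '501', '505', '301', '302', '302'],
     'Qty': ['5', '4', '1', '3', '5', '4', '1', '3']}
df = pd.DataFrame(data=d)

# duplicated() is True for every occurrence of a name after the first
# (keep='first' is the default), so first occurrences are the False rows.
is_repeat = df['name'].duplicated()

# numpy.where picks 0 for repeats and 1 for first occurrences; no sorting needed.
df['id'] = np.where(is_repeat, 0, 1)
print(df)
```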

BENY · Accepted Answer

You can use cumcount:

s=df.groupby(['Prod','name']).cumcount().add(1)
df['counter']=s.mask(s.gt(1),0)
df
Out[1417]: 
  Prod Qty  name  counter
0  101   5  john        1
1  102   4  john        1
2  101   1  john        0
3  501   3   Tim        1
4  505   5   Tim        1
5  301   4   Tim        1
6  302   1   Bob        1
7  302   3   Bob        0
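A sketch of why this works, rebuilding the question's data: `cumcount()` numbers the rows of each `(Prod, name)` group 0, 1, 2, ... in order of appearance, so after `add(1)` and the mask only the first row of each pair stays at 1. Note that this flags the first occurrence of each product per name, not of each name. `DataFrame.duplicated(subset=...)` on the same two columns is included only as a cross-check.

```python
import pandas as pd

d = {'name': ['john', 'john', 'john', 'Tim', 'Tim', 'Tim', 'Bob', 'Bob'],
     'Prod': ['101', '102', '101', '501', '505', '301', '302', '302'],
     'Qty': ['5', '4', '1', '3', '5', '4', '1', '3']}
df = pd.DataFrame(data=d)

# Position of each row within its (Prod, name) group, shifted to start at 1;
# anything above 1 is a repeat of that pair and is set to 0.
s = df.groupby(['Prod', 'name']).cumcount().add(1)
counter = s.mask(s.gt(1), 0)

# Same flags from duplicated() on the two key columns (first occurrence -> 1).
alt = (~df.duplicated(subset=['Prod', 'name'])).astype(int)

print(counter.tolist())  # [1, 1, 0, 1, 1, 1, 1, 0]
print(alt.tolist())      # [1, 1, 0, 1, 1, 1, 1, 0]
```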

Update, grouping by name only:

s=df.groupby(['name']).cumcount().add(1).le(1).astype(int)
s
Out[1421]: 
0    1
1    0
2    0
3    1
4    0
5    0
6    1
7    0
dtype: int32
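A slightly shorter equivalent, as a sketch on the question's data: because `cumcount()` starts at 0, `.add(1).le(1)` is the same test as `.eq(0)`. The exact integer dtype printed (`int32` above) depends on platform and NumPy version.

```python
import pandas as pd

d = {'name': ['john', 'john', 'john', 'Tim', 'Tim', 'Tim', 'Bob', 'Bob'],
     'Prod': ['101', '102', '101', '501', '505', '301', '302', '302'],
     'Qty': ['5', '4', '1', '3', '5', '4', '1', '3']}
df = pd.DataFrame(data=d)

# cumcount() is 0 exactly on the first row of each name, in original row order.
s = df.groupby('name').cumcount().eq(0).astype(int)
print(s.tolist())  # [1, 0, 0, 1, 0, 0, 1, 0]
```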

Faster:

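df.loc[df.name.drop_duplicates().index,'counter']=1  # run on the original df, before any 'counter' column exists
df.fillna(0)  # returns a copy; use df = df.fillna(0) to keep the result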
Out[1430]: 
  Prod Qty  name  counter
0  101   5  john      1.0
1  102   4  john      0.0
2  101   1  john      0.0
3  501   3   Tim      1.0
4  505   5   Tim      0.0
5  301   4   Tim      0.0
6  302   1   Bob      1.0
7  302   3   Bob      0.0
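`df.fillna(0)` returns a new DataFrame instead of changing `df`, and the NaN fill is also why the counter is shown as floats. A sketch that avoids both, assuming a freshly built `df` with a unique index: initialise the column to 0 first, then set the first occurrences to 1.

```python
import pandas as pd

d = {'name': ['john', 'john', 'john', 'Tim', 'Tim', 'Tim', 'Bob', 'Bob'],
     'Prod': ['101', '102', '101', '501', '505', '301', '302', '302'],
     'Qty': ['5', '4', '1', '3', '5', '4', '1', '3']}
df = pd.DataFrame(data=d)

# Start every row at 0 so the column stays integer and there is no NaN to fill.
df['counter'] = 0

# drop_duplicates() keeps the first row of each name; its index labels point at
# those rows in df (this relies on df having a unique index).
df.loc[df['name'].drop_duplicates().index, 'counter'] = 1
print(df)
```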
