Reputation: 1071
I have a data frame like this:
import pandas
d = {'name': ['john', 'john', 'john', 'Tim', 'Tim', 'Tim','Bob', 'Bob'], 'Prod': ['101', '102', '101', '501', '505', '301', '302', '302'],'Qty': ['5', '4', '1', '3', '5', '4', '1', '3']}
df = pandas.DataFrame(data=d)
I want to create a new id variable. The first time a name (say john) appears, id should be 1; for every later occurrence of the same name, id should be 0. The same rule applies to every other name in the data. How do I go about doing that?
Final output should be like this:
NOTE: For anyone who knows SAS: there you can sort your data by name and then use first.name.
"if first.variable = 1 then id = 1"
For the first occurrence of a name, first.name = 1. For any repeat occurrence of the same name, first.name = 0. I am trying to replicate the same in Python.
So far I have tried pandas groupby with first(), and also numpy.where(), but couldn't make either of them work. Any fresh perspective would be appreciated.
Upvotes: 0
Views: 139
Reputation: 323326
You can use cumcount. This version flags the first occurrence of each (Prod, name) pair:
s=df.groupby(['Prod','name']).cumcount().add(1)
df['counter']=s.mask(s.gt(1),0)
df
Out[1417]:
  Prod Qty  name  counter
0  101   5  john        1
1  102   4  john        1
2  101   1  john        0
3  501   3   Tim        1
4  505   5   Tim        1
5  301   4   Tim        1
6  302   1   Bob        1
7  302   3   Bob        0
Update: to flag only the first occurrence of each name, group by name alone:
s=df.groupby(['name']).cumcount().add(1).le(1).astype(int)
s
Out[1421]:
0    1
1    0
2    0
3    1
4    0
5    0
6    1
7    0
dtype: int32
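To make the chain above easier to follow, here is a minimal sketch that splits it into steps. The names `position`, `first_flag`, and `steps` are only illustrative, and `eq(0)` is used here as an equivalent of `add(1).le(1)`.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "john", "john", "Tim", "Tim", "Tim", "Bob", "Bob"],
    "Prod": ["101", "102", "101", "501", "505", "301", "302", "302"],
    "Qty": ["5", "4", "1", "3", "5", "4", "1", "3"],
})

# cumcount numbers the rows inside each name group, starting from 0
position = df.groupby("name").cumcount()

# position 0 is the first row of each name, so flag exactly those rows
first_flag = position.eq(0).astype(int)

# side-by-side view of the intermediate and final values
steps = pd.DataFrame({"name": df["name"], "position": position, "first_flag": first_flag})
print(steps)
```

Because cumcount keeps the original row order, the flags line up with the rows of df even when the names are not sorted.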
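A faster option:
df.loc[df['name'].drop_duplicates().index, 'counter'] = 1  # mark the first row of each name
df.fillna(0)  # shows a filled copy; assign it back to df to keep the zeros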
Out[1430]:
  Prod Qty  name  counter
0  101   5  john      1.0
1  102   4  john      0.0
2  101   1  john      0.0
3  501   3   Tim      1.0
4  505   5   Tim      0.0
5  301   4   Tim      0.0
6  302   1   Bob      1.0
7  302   3   Bob      0.0
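If the float values and the extra fillna step are unwanted, one possible alternative is `Series.duplicated`, which gives an integer column in a single assignment. This is a sketch of a different technique, not the method shown above; the column name `counter` is reused only for consistency.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "john", "john", "Tim", "Tim", "Tim", "Bob", "Bob"],
    "Prod": ["101", "102", "101", "501", "505", "301", "302", "302"],
    "Qty": ["5", "4", "1", "3", "5", "4", "1", "3"],
})

# duplicated() is False for the first occurrence of each name and True for repeats,
# so negating it and casting to int yields 1 for first occurrences and 0 otherwise
df["counter"] = (~df["name"].duplicated()).astype(int)
print(df)
```

Like the cumcount version, this does not require the data to be sorted by name first.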
Upvotes: 3
Reputation: 13498
We can just work directly with your dictionary d and loop through to create a new entry.
import pandas
d = {'name': ['john', 'john', 'john', 'Tim', 'Tim', 'Tim','Bob', 'Bob'], 'Prod': ['101', '102', '101', '501', '505', '301', '302', '302'],'Qty': ['5', '4', '1', '3', '5', '4', '1', '3']}
names = set()  # names that have already appeared
ids = []  # named ids to avoid shadowing the built-in id()
for i in d['name']:
    if i in names:  # seen before: add 0
        ids.append(0)
    else:
        ids.append(1)  # first appearance: add 1 and record the name
        names.add(i)
d['id'] = ids  # add the new entry to the dictionary
df = pandas.DataFrame(data=d)
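The loop above could also be wrapped in a small helper so it can be reused on any list of values. This is only a sketch; the function name `first_occurrence_flags` is hypothetical and not part of pandas.

```python
def first_occurrence_flags(values):
    """Return 1 the first time each value appears and 0 for every repeat."""
    seen = set()  # values encountered so far
    flags = []
    for value in values:
        flags.append(0 if value in seen else 1)
        seen.add(value)
    return flags


names = ["john", "john", "john", "Tim", "Tim", "Tim", "Bob", "Bob"]
flags = first_occurrence_flags(names)
print(flags)
```

The resulting list can be stored with d['id'] = flags before building the data frame, exactly as in the loop version.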
Upvotes: 1