Reputation: 521

Pandas create a unique id for each row based on a condition

I have a dataset in which one of the columns is shown below. I'd like to create a new column based on the following condition.

For values in column_name, each 1 should get a new id, and each 0 should also get a new id. However, if 1 is repeated in two or more consecutive rows, the id should be the same for all of those rows. The expected output is shown below.

column_name
1
0
0
1
1
1
1
0
0
1

column_name -- ID
1 -- 1
0 -- 2
0 -- 3
1 -- 4
1 -- 4
1 -- 4
1 -- 4
0 -- 5
0 -- 6
1 -- 7

Upvotes: 3

Answers (3)

Vishnu Kunchur

Reputation: 1726

This leverages the fact that a 1 preceded by another 1 should be treated as part of the same group, while every 0 calls for an increment. One of four things will happen:

1) 0 with a preceding 0: increment by 1

2) 0 with a preceding 1: increment by 1

3) 1 with a preceding 1: increment by 0

4) 1 with a preceding 0: increment by 1

(
    (df['column_name'] + df['column_name'].shift(1))  # Series with values 0, 1, or 2 (first row is NaN)
    .fillna(0)      # fill the first row with 0
    .isin([0, 1])   # True for cases 1, 2, and 4 described above, else False (case 3)
    .astype('int')  # convert the booleans to integers
    .cumsum()       # running total of the increments gives the ID
)

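Output for the sample data: the IDs 1, 2, 3, 4, 4, 4, 4, 5, 6, 7, matching the expected result in the question.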
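To make the chain easier to follow, here is a minimal sketch that splits it into named steps on the sample data from the question. The intermediate names (`pair_sum`, `new_group`) and the construction of `df` are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data from the question (df is built here only for illustration)
df = pd.DataFrame({"column_name": [1, 0, 0, 1, 1, 1, 1, 0, 0, 1]})
col = df["column_name"]

# Current value plus previous value: 0, 1, or 2. The first row has no
# predecessor, so its NaN is filled with 0.
pair_sum = (col + col.shift(1)).fillna(0)

# A sum of 2 means "1 preceded by 1" (case 3); everything else starts a new ID.
new_group = pair_sum.isin([0, 1])

# Cumulative sum of the increments gives the ID column.
df["ID"] = new_group.astype(int).cumsum()

print(df["ID"].tolist())  # [1, 2, 3, 4, 4, 4, 4, 5, 6, 7]
```

Only rows where `pair_sum` equals 2 keep the previous ID, which is why the run of four 1s shares ID 4.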

Upvotes: 2

Ami Tavory

Reputation: 76406

Say your Series is

s = pd.Series([1, 0, 0, 1, 1, 1, 1, 0, 0, 1])

Then you can use:

>>> ((s != 1) | (s.shift(1) != 1)).cumsum()
0    1
1    2
2    3
3    4
4    4
5    4
6    4
7    5
8    6
9    7
dtype: int64

This checks that either the current entry is not 1, or that the previous entry is not 1, and then performs a cumulative sum on the result.
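Since the question asks for a new column, the same expression can be wrapped in a small helper and assigned to the DataFrame. This is a sketch under assumptions: the helper name `run_ids` and the column name `ID` are hypothetical choices, not part of the original answer.

```python
import pandas as pd

def run_ids(s: pd.Series) -> pd.Series:
    """Hypothetical helper: one ID per row, shared within runs of consecutive 1s."""
    # A row starts a new ID unless both it and the previous row are 1.
    # The first row compares against NaN, and NaN != 1 is True, so it starts ID 1.
    starts_new_id = (s != 1) | (s.shift(1) != 1)
    return starts_new_id.cumsum()

df = pd.DataFrame({"column_name": [1, 0, 0, 1, 1, 1, 1, 0, 0, 1]})
df["ID"] = run_ids(df["column_name"])

print(df["ID"].tolist())  # [1, 2, 3, 4, 4, 4, 4, 5, 6, 7]
```

No `fillna` is needed here because the comparison against the missing first value already evaluates to True.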

Upvotes: 5

Primusa

Reputation: 13498

At this stage I would just use a regular Python for loop:

column_name = pd.Series([1, 0, 0, 1, 1, 1, 1, 0, 0, 1])

ID = [1]

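for i in range(1, len(column_name)):
    # Increment unless both the current and previous values are 1;
    # int() keeps the list elements as plain Python ints rather than NumPy scalars
    ID.append(ID[-1] + int(column_name[i] + column_name[i-1] < 2))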

print(ID)

>>> [1, 2, 3, 4, 4, 4, 4, 5, 6, 7]

You can then assign ID as a column in your DataFrame.
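As a rough sketch of that last step, assuming the data lives in a non-empty DataFrame `df` with a `column_name` column (the frame is constructed here only for illustration), the loop result can be assigned directly because the list has one entry per row:

```python
import pandas as pd

# Illustrative frame; in practice df would be the existing dataset
df = pd.DataFrame({"column_name": [1, 0, 0, 1, 1, 1, 1, 0, 0, 1]})

values = df["column_name"].tolist()  # plain Python ints, independent of the index
ID = [1]  # the first row always gets ID 1 (assumes at least one row)

# Walk over (previous, current) pairs; increment unless both are 1
for prev, cur in zip(values, values[1:]):
    ID.append(ID[-1] + int(prev + cur < 2))

# The list length equals the number of rows, so it can be assigned as a column
df["ID"] = ID

print(df["ID"].tolist())  # [1, 2, 3, 4, 4, 4, 4, 5, 6, 7]
```

Converting the column to a list first means the loop does not depend on the DataFrame having a default 0..n-1 index.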

Upvotes: 1
