Sana Ali

Reputation: 185

Getting column value based on other multiple conditions

I have a dataframe:

cid si
A   1
A   0
A   1
A   0
A   1
A   0
A   0
A   0
A   0
A   0
A   0
A   0
A   0
A   0
B   1
B   0
B   0
B   0
B   0
B   0
B   0

I need another column named ide that repeats the same value until the next 1 in si is encountered, as long as the value in cid stays the same. If the value in cid changes, the identifier starts over. Sample output:

cid si  ide
A   1   aa
A   0   aa
A   1   bb
A   0   bb
A   1   cc
A   0   cc
A   0   cc
A   0   cc
A   0   cc
A   0   cc
A   0   cc
A   0   cc
A   0   cc
A   0   cc
B   1   aa
B   0   aa
B   0   aa
B   0   aa
B   0   aa
B   0   aa
B   0   aa

Upvotes: 1

Views: 61

Answers (1)

user3483203

Reputation: 51185

First, define a dictionary that maps 1..n to your desired filler values. Here is a small example:

dct = {1: 'aa', 2: 'bb', 3: 'cc'}
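If there can be more than three blocks per cid, writing the dictionary by hand gets tedious. A minimal sketch of building it programmatically, assuming the labels simply continue as doubled lowercase letters ('dd', 'ee', ...), which is a guess at the pattern and not something the question states:

```python
import string

# Assumed labelling scheme: block number k (1-based) -> k-th lowercase letter, doubled.
# This covers at most 26 blocks per cid; a different scheme is needed beyond that.
dct = {k: letter * 2 for k, letter in enumerate(string.ascii_lowercase, start=1)}

print(dct[1], dct[2], dct[3])
```

The first three entries match the hand-written dictionary above, so the rest of the answer works unchanged with either version.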

Then use groupby, cumsum and map:

df.groupby('cid').si.cumsum().map(dct)

0     aa
1     aa
2     bb
3     bb
4     cc
5     cc
6     cc
7     cc
8     cc
9     cc
10    cc
11    cc
12    cc
13    cc
14    aa
15    aa
16    aa
17    aa
18    aa
19    aa
20    aa
Name: si, dtype: object

The important part here is cumsum: every 1 in si increments the running total within its cid group, so the total changes exactly where a new block starts and resets when cid changes:

df.groupby('cid').si.cumsum()

0     1
1     1
2     2
3     2
4     3
5     3
6     3
7     3
8     3
9     3
10    3
11    3
12    3
13    3
14    1
15    1
16    1
17    1
18    1
19    1
20    1
Name: si, dtype: int64
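Putting the pieces together, here is a self-contained sketch that rebuilds the sample frame from the question and stores the result in the ide column. The intermediate name block is only illustrative:

```python
import pandas as pd

# Rebuild the sample data from the question: 14 rows for A, 7 rows for B.
df = pd.DataFrame({
    "cid": ["A"] * 14 + ["B"] * 7,
    "si": [1, 0, 1, 0, 1] + [0] * 9 + [1] + [0] * 6,
})

dct = {1: "aa", 2: "bb", 3: "cc"}

# Running count of 1s within each cid; it restarts whenever cid changes.
block = df.groupby("cid")["si"].cumsum()

# Translate each block number into its label and attach it as a new column.
df["ide"] = block.map(dct)

print(df)
```

One caveat: if a cid group does not start with a 1, the leading rows have a cumulative sum of 0, which is not a key in dct, so map leaves NaN there. The same happens for any block number missing from the dictionary.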

Upvotes: 1
