saeeds6

Reputation: 31

Create a new ID column based on conditions in another column using pandas

I am trying to create a new column 'ID' that assigns a unique ID to each run of non-NaN values in the 'Data' column. Consecutive non-null values share the same ID, and rows where 'Data' is NaN get 0. The 'Id' column below shows the result I am after, for reference. Could anyone guide me on this?

Id  Data
0   NaN
0   NaN
0   NaN
1   54
1   55
0   NaN
0   NaN
2   67
0   NaN
0   NaN
3   33
3   44
3   22
0   NaN

Upvotes: 1

Views: 444

Answers (2)

BENY

Reputation: 323226

Using factorize on the running NaN count, restricted to the non-null rows:

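import pandas as pd

# The cumulative NaN count is constant within each run of non-null values
v = pd.factorize(df.Data.isnull().cumsum()[df.Data.notnull()])[0] + 1
df.loc[df.Data.notnull(), 'Newid'] = v
# Rows where Data is NaN get 0 (assign back instead of a chained inplace fillna)
df['Newid'] = df['Newid'].fillna(0)
df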
    Id  Data  Newid
0    0   NaN    0.0
1    0   NaN    0.0
2    0   NaN    0.0
3    1  54.0    1.0
4    1  55.0    1.0
5    0   NaN    0.0
6    0   NaN    0.0
7    2  67.0    2.0
8    0   NaN    0.0
9    0   NaN    0.0
10   3  33.0    3.0
11   3  44.0    3.0
12   3  22.0    3.0
13   0   NaN    0.0
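
To make the intermediate values visible, here is a self-contained sketch of the same idea. The frame is rebuilt from the question's 'Data' column, and the names not_null, nan_count, run_keys and codes are only illustrative. It also starts 'Newid' as an integer column, so the result is int rather than float; that is a small deviation from the snippet above, not part of it.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the 'Data' column in the question.
df = pd.DataFrame({'Data': [np.nan, np.nan, np.nan, 54, 55, np.nan, np.nan,
                            67, np.nan, np.nan, 33, 44, 22, np.nan]})

not_null = df['Data'].notna()

# Running count of NaNs seen so far; it does not change inside a run
# of consecutive non-null rows, so it works as a key for each run.
nan_count = df['Data'].isna().cumsum()

# Keep the key only on non-null rows.
run_keys = nan_count[not_null]

# factorize numbers the distinct keys from 0 in order of first appearance.
codes, uniques = pd.factorize(run_keys)

# Start from 0 everywhere (integer dtype), then write 1-based run numbers.
df['Newid'] = 0
df.loc[not_null, 'Newid'] = codes + 1
```

Here run_keys holds 3, 3, 5, 7, 7, 7, so factorize maps the three distinct keys to 0, 1, 2.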

Upvotes: 1

ALollz

Reputation: 59519

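Take the cumsum of the null mask to label consecutive groups, and use where to mask the rows where Data is NaN. Grouping on that Series and calling .ngroup numbers the groups consecutively; the masked rows come back as -1, so adding 1 turns them into 0. The same result is also possible with rank.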

# Group key: running NaN count on non-null rows, NaN elsewhere
s = df.Data.isnull().cumsum().where(df.Data.notnull())
df['ID'] = df.groupby(s).ngroup()+1
# Alternative using rank:
# df['ID'] = s.rank(method='dense').fillna(0).astype(int)

Output:

    Data  ID
0    NaN   0
1    NaN   0
2    NaN   0
3   54.0   1
4   55.0   1
5    NaN   0
6    NaN   0
7   67.0   2
8    NaN   0
9    NaN   0
10  33.0   3
11  44.0   3
12  22.0   3
13   NaN   0

Upvotes: 1
