Reputation: 23
I have the following dataframe:
claim diagnosis sequence
100 1 1.0
100 2 1.0
100 3 NaN
100 4 NaN
105 1 1.0
105 2 2.0
105 3 2.0
105 4 NaN
111 1 1.0
111 2 2.0
111 3 3.0
111 4 NaN
What I need is to have all the NaNs be replaced by the oneup value in the previous row, by claim:
claim diagnosis sequence
100 1 1.0
100 2 1.0
100 3 2.0
100 4 3.0
105 1 1.0
105 2 2.0
105 3 2.0
105 4 3.0
111 1 1.0
111 2 2.0
111 3 3.0
111 4 4.0
I've tried cumcount, but can't seem to get it to use the previous value. I've also tried loc, but also don't quite have the hang of it yet.
things = [{'claim':100, 'diagnosis':1, 'sequence':1},
{'claim':100, 'diagnosis':2, 'sequence':1 },
{'claim':100, 'diagnosis':3, },
{'claim':100, 'diagnosis':4, },
{'claim':105, 'diagnosis':1, 'sequence':1},
{'claim':105, 'diagnosis':2, 'sequence':2},
{'claim':105, 'diagnosis':3,'sequence':2 },
{'claim':105, 'diagnosis':4, },
{'claim':111, 'diagnosis':1, 'sequence':1},
{'claim':111, 'diagnosis':2, 'sequence':2},
{'claim':111, 'diagnosis':3, 'sequence':3},
{'claim':111, 'diagnosis':4, }]
df = pd.DataFrame(things)
df
I've been wracking my brain on this for days, and any help would be fantastic.
Upvotes: 2
Views: 885
Reputation: 323226
Using cumsum
count number of how many NaN
before that row , then adding up with ffill
s1=df['sequence'].isnull().groupby(df['claim']).cumsum()
df['sequence']=s1+df.groupby('claim')['sequence'].ffill()
df
Out[145]:
claim diagnosis sequence
0 100 1 1.0
1 100 2 1.0
2 100 3 2.0
3 100 4 3.0
4 105 1 1.0
5 105 2 2.0
6 105 3 2.0
7 105 4 3.0
8 111 1 1.0
9 111 2 2.0
10 111 3 3.0
11 111 4 4.0
Upvotes: 1