Reputation: 4149
I have a pandas DataFrame with two columns, such as:
df = ID state
255 NJ
255 NaN
266 CT
266 CT
277 NaN
277 NY
277 NaN
I want to fill the missing values in state.
The desired output is the following:
df = ID state
255 NJ
255 NJ
266 CT
266 CT
277 NY
277 NY
277 NY
How can I do this? I have tried numpy.where with masks, but I get this error, among many others: operands could not be broadcast together with shapes (26229,) (2053,) ()
Any help is appreciated.
Upvotes: 3
Views: 437
Reputation: 862581
Use DataFrame.sort_values with GroupBy.ffill:
df['state'] = df.sort_values('state').groupby('ID')['state'].ffill()
print(df)
ID state
0 255 NJ
1 255 NJ
2 266 CT
3 266 CT
4 277 NY
5 277 NY
6 277 NY
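For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and shows why the sort matters. The intermediate name ordered is only for illustration; the approach relies on sort_values placing NaN last by default.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [255, 255, 266, 266, 277, 277, 277],
    "state": ["NJ", np.nan, "CT", "CT", np.nan, "NY", np.nan],
})

# sort_values puts NaN last by default (na_position="last"), so within
# each ID the known state comes before the rows that are missing it.
ordered = df.sort_values("state")

# Forward-fill inside each ID group. The result keeps the original index
# labels, so the assignment aligns it back to df's row order.
df["state"] = ordered.groupby("ID")["state"].ffill()

print(df)
```

Without the sort, a missing value that appears before the first known state of its ID (like the first 277 row) would stay NaN after a plain forward fill.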
If you need to fill multiple columns, use:
cols = ['state', ...]
df.loc[:, cols] = df.sort_values('state').groupby('ID')[cols].ffill()
Upvotes: 2
Reputation: 150735
IIUC, each ID has a unique state, so:
df['state'] = df.groupby('ID')['state'].transform('first')
output:
ID state
0 255 NJ
1 255 NJ
2 266 CT
3 266 CT
4 277 NY
5 277 NY
6 277 NY
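This relies on the assumption that every ID maps to a single state. A rough way to check that before filling, sketched on the sample data from the question (the names states_per_id and conflicting_ids are just for illustration):

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [255, 255, 266, 266, 277, 277, 277],
    "state": ["NJ", np.nan, "CT", "CT", np.nan, "NY", np.nan],
})

# Number of distinct non-null states per ID (nunique ignores NaN by default).
states_per_id = df.groupby("ID")["state"].nunique()

# IDs that break the "one state per ID" assumption; empty for the sample data.
conflicting_ids = states_per_id[states_per_id > 1].index.tolist()

if not conflicting_ids:
    # "first" picks the first non-null value in each group and
    # transform broadcasts it to every row of that group.
    df["state"] = df.groupby("ID")["state"].transform("first")

print(conflicting_ids)
print(df)
```

If conflicting_ids is not empty, transform('first') would overwrite the other states for those IDs, so a fill-only approach such as ffill/bfill is probably the safer choice there.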
Upvotes: 2
Reputation: 323226
Using groupby with ffill + bfill:
# transform keeps the original index, so the result aligns with df
df['state'] = df.groupby('ID')['state'].transform(lambda x: x.ffill().bfill())
df
Out[907]:
ID state
0 255 NJ
1 255 NJ
2 266 CT
3 266 CT
4 277 NY
5 277 NY
6 277 NY
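The same idea can also be written with the built-in grouped ffill and bfill instead of a lambda. This is only a sketch on the question's sample data; it needs no sorting and only touches rows that are missing.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [255, 255, 266, 266, 277, 277, 277],
    "state": ["NJ", np.nan, "CT", "CT", np.nan, "NY", np.nan],
})

# Step 1: forward-fill within each ID. Rows that precede the first
# known state of their ID (index 4 here) are still NaN after this.
forward = df.groupby("ID")["state"].ffill()

# Step 2: back-fill the remainder, again grouped by ID so that values
# never leak from one ID into another.
df["state"] = forward.groupby(df["ID"]).bfill()

print(df)
```

An ID whose rows are all missing stays NaN with this approach, since there is nothing in the group to fill from.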
Upvotes: 1
Reputation: 3739
First sort with sort_values, then forward-fill within each ID group. Note that this reorders the rows of df in place:
df.sort_values(by=['ID', 'state'], inplace=True)
df['state'] = df.groupby('ID')['state'].ffill()
Upvotes: 1