Reputation: 99
I have a pandas DataFrame looking as follows:
| ID | x | y | z |
| -- | - | --- | --- |
| 1 | 0 | nan | 36 |
| 1 | 1 | 12 | nan |
| 1 | 2 | nan | 38 |
| 1 | 3 | 11 | 37 |
| 2 | 0 | nan | 37 |
| 2 | 1 | nan | 37 |
| 2 | 2 | nan | nan |
| 2 | 3 | nan | nan |
I now want to fill the nan values for each ID in the following way:
The result should be a pandas DataFrame in which all nan values are filled by the scheme as explained above. However, I am a beginner with pandas and don't know how to go about this problem to get the full DataFrame.
Upvotes: 1
Views: 678
Reputation: 862581
Use Series.interpolate
per groups for columns in list first, and then replace missing values by median in DataFrame.fillna
:
cols = ['y','z']
median = df[cols].median()
df[cols] = (df.groupby('ID')[cols].transform(lambda x: x.interpolate())
.fillna(median))
print (df)
ID x y z
0 1 0 11.5 36.0
1 1 1 12.0 37.0
2 1 2 11.5 38.0
3 1 3 11.0 37.0
4 2 0 11.5 37.0
5 2 1 11.5 37.0
6 2 2 11.5 37.0
7 2 3 11.5 37.0
Also if need specify limit_direction
in interpolation
use:
cols = ['y','z']
median = df[cols].median()
df[cols]= df.groupby('ID')[cols].transform(lambda x: x.interpolate(limit_direction='both'))
.fillna(median))
print (df)
ID x y z
0 1 0 12.0 36.0
1 1 1 12.0 37.0
2 1 2 11.5 38.0
3 1 3 11.0 37.0
4 2 0 11.5 37.0
5 2 1 11.5 37.0
6 2 2 11.5 37.0
7 2 3 11.5 37.0
Upvotes: 1