Reputation: 1811
I have a dataframe that looks like:
id key value
1 foo 1
2 bar NaN
3 foo NaN
4 bar 3
5 bar NaN
6 foo 1
There is only one non-null value
corresponding to each key
. For example, foo is either 1 or NaN. Bar is either 3 or NaN.
I need to fill value
for each key
based on those non-null values. The desired result should look like:
id key value
1 foo 1
2 bar 3
3 foo 1
4 bar 3
5 bar 3
6 foo 1
I know how to do this using a for loop, but this dataframe has millions of rows and looping through it will take a really long time. fillna()
I've only used for straightforward uses, and I'm not sure how it would be applied here, or if it should be.
Any help is appreciated.
Upvotes: 2
Views: 317
Reputation: 1934
This script will help you
import pandas as pd
from numpy import nan
from sklearn.impute import SimpleImputer
data = {
"id": [1, 2, 3, 4, 5, 6],
"key": ["foo", "bar", "foo", "bar", "foo", "bar"],
"value": [1, nan, nan, 3, nan, 1]
}
data = pd.DataFrame(data)
imputer = SimpleImputer(missing_values=nan, strategy='mean')
data['value'] = imputer.fit_transform(data[['value']])
print(data)
output
id key value
0 1 foo 1.000000
1 2 bar 1.666667
2 3 foo 1.666667
3 4 bar 3.000000
4 5 foo 1.666667
5 6 bar 1.000000
I use SimpleImputer
with mean
sterategy and fill the NaN values
Upvotes: -1
Reputation: 323226
Let us try ffill
with bfill
df['value'] = df.groupby('key').value.apply(lambda x : x.ffill().bfill())
df
Out[85]:
id key value
0 1 foo 1.0
1 2 bar 3.0
2 3 foo 1.0
3 4 bar 3.0
4 5 bar 3.0
5 6 foo 1.0
Upvotes: 1
Reputation: 150735
You can use groupby().transform
with first
:
df['value'] = df.groupby('key')['value'].transform('first')
Output:
id key value
0 1 foo 1.0
1 2 bar 3.0
2 3 foo 1.0
3 4 bar 3.0
4 5 bar 3.0
5 6 foo 1.0
Upvotes: 3