snapcrack
snapcrack

Reputation: 1811

Fill null values for groups of rows

I have a dataframe that looks like:

id   key   value
1    foo   1
2    bar   NaN
3    foo   NaN
4    bar   3
5    bar   NaN
6    foo   1

There is only one non-null value corresponding to each key. For example, foo is either 1 or NaN. Bar is either 3 or NaN.

I need to fill value for each key based on those non-null values. The desired result should look like:

id   key   value
1    foo   1
2    bar   3
3    foo   1
4    bar   3
5    bar   3
6    foo   1

I know how to do this using a for loop, but this dataframe has millions of rows and looping through it will take a really long time. fillna() I've only used for straightforward uses, and I'm not sure how it would be applied here, or if it should be.

Any help is appreciated.

Upvotes: 2

Views: 317

Answers (3)

azibom
azibom

Reputation: 1934

This script will help you

import pandas as pd
from numpy import nan
from sklearn.impute import SimpleImputer

data = {
  "id":    [1, 2, 3, 4, 5, 6],
  "key":   ["foo", "bar", "foo", "bar", "foo", "bar"],
  "value": [1, nan, nan, 3, nan, 1]
}

data          = pd.DataFrame(data)
imputer       = SimpleImputer(missing_values=nan, strategy='mean')
data['value'] = imputer.fit_transform(data[['value']])

print(data) 

output

   id  key     value
0   1  foo  1.000000
1   2  bar  1.666667
2   3  foo  1.666667
3   4  bar  3.000000
4   5  foo  1.666667
5   6  bar  1.000000

I use SimpleImputer with mean sterategy and fill the NaN values

Upvotes: -1

BENY
BENY

Reputation: 323226

Let us try ffill with bfill

df['value'] = df.groupby('key').value.apply(lambda x : x.ffill().bfill())
df
Out[85]: 
   id  key  value
0   1  foo    1.0
1   2  bar    3.0
2   3  foo    1.0
3   4  bar    3.0
4   5  bar    3.0
5   6  foo    1.0

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150735

You can use groupby().transform with first:

df['value'] = df.groupby('key')['value'].transform('first')

Output:

   id  key  value
0   1  foo    1.0
1   2  bar    3.0
2   3  foo    1.0
3   4  bar    3.0
4   5  bar    3.0
5   6  foo    1.0

Upvotes: 3

Related Questions