user
user

Reputation: 771

How to fill a np.nan column in a group based on a value in another column?

I have a subset of a dataframe that I need to backfill using grouping and conditions based on another column.

data = [
    ["1A", "aa", "2020-05-09"],
    ["1A", np.nan, "2020-05-09"],
    ["1A", "ab", "2020-05-10"],
    ["2A", "bb", "2020-05-09"],
    ["2A", np.nan, "2020-05-09"],
    ["2A", "bc", "2020-05-10"],
]

df = pd.DataFrame(data, columns=["product", "value", "dates"])
df

Essentially, for a subset of products I need to backfill the values based on the dates present. For example: For product 1A I need to fill in the value with aa because they both have the date 2020-05-09.

I also need to perform this on a subset of a dataset and have it join back in with the rest of the data once this operation is done.

I tried using a groupby with a lambda function (ffill or bfill) but that doesn't work since I really need to base the value fill off a condition.

Any help would be greatly appreciated?!

Upvotes: 1

Views: 198

Answers (3)

Anurag Dabas
Anurag Dabas

Reputation: 24322

Try via groupby() and ffill():

df['value']=df.groupby(['dates','product'])['value'].ffill()

Output of df:

    product     value   dates
0   1A          aa      2020-05-09
1   1A          aa      2020-05-09
2   1A          ab      2020-05-10
3   2A          bb      2020-05-09
4   2A          bb      2020-05-09
5   2A          bc      2020-05-10

Upvotes: 2

David Kaftan
David Kaftan

Reputation: 2174

You could just drop the na's, then join them with the raw df

df = df.set_index(['product','dates']) 
df.join(df.dropna(), how='left', lsuffix="_drop").loc[:,['value']].reset_index() 

  product       dates value                                                                                             
0      1A  2020-05-09    aa                                                                                             
1      1A  2020-05-09    aa                                                                                             
2      1A  2020-05-10    ab                                                                                             
3      2A  2020-05-09    bb                                                                                             
4      2A  2020-05-09    bb                                                                                             
5      2A  2020-05-10    bc   

Upvotes: 0

BENY
BENY

Reputation: 323376

Let us try transform

df.value.fillna(df.groupby(['dates','product'])['value'].transform('first'),inplace=True)
df
  product value       dates
0      1A    aa  2020-05-09
1      1A    aa  2020-05-09
2      1A    ab  2020-05-10
3      2A    bb  2020-05-09
4      2A    bb  2020-05-09
5      2A    bc  2020-05-10

Upvotes: 2

Related Questions