Reputation: 771
I have a subset of a dataframe that I need to backfill using grouping and conditions based on another column.
data = [
["1A", "aa", "2020-05-09"],
["1A", np.nan, "2020-05-09"],
["1A", "ab", "2020-05-10"],
["2A", "bb", "2020-05-09"],
["2A", np.nan, "2020-05-09"],
["2A", "bc", "2020-05-10"],
]
df = pd.DataFrame(data, columns=["product", "value", "dates"])
df
Essentially, for a subset of products I need to backfill the values based on the dates present. For example: For product
1A I need to fill in the value
with aa
because they both have the date 2020-05-09.
I also need to perform this on a subset of a dataset and have it join back in with the rest of the data once this operation is done.
I tried using a groupby with a lambda function (ffill or bfill) but that doesn't work since I really need to base the value fill off a condition.
Any help would be greatly appreciated?!
Upvotes: 1
Views: 198
Reputation: 24322
Try via groupby()
and ffill()
:
df['value']=df.groupby(['dates','product'])['value'].ffill()
Output of df
:
product value dates
0 1A aa 2020-05-09
1 1A aa 2020-05-09
2 1A ab 2020-05-10
3 2A bb 2020-05-09
4 2A bb 2020-05-09
5 2A bc 2020-05-10
Upvotes: 2
Reputation: 2174
You could just drop the na's, then join them with the raw df
df = df.set_index(['product','dates'])
df.join(df.dropna(), how='left', lsuffix="_drop").loc[:,['value']].reset_index()
product dates value
0 1A 2020-05-09 aa
1 1A 2020-05-09 aa
2 1A 2020-05-10 ab
3 2A 2020-05-09 bb
4 2A 2020-05-09 bb
5 2A 2020-05-10 bc
Upvotes: 0
Reputation: 323376
Let us try transform
df.value.fillna(df.groupby(['dates','product'])['value'].transform('first'),inplace=True)
df
product value dates
0 1A aa 2020-05-09
1 1A aa 2020-05-09
2 1A ab 2020-05-10
3 2A bb 2020-05-09
4 2A bb 2020-05-09
5 2A bc 2020-05-10
Upvotes: 2