Reputation: 722
I have a dataset that looks like the following:
week. country. product_number. product_size. selling_price
0 20-2020. BE. 101.0 2.0 25.0
1 20-2020. BE. 101.0 4.0 50.0
2 20-2020. BE. 102.0 2.0 NaN
3 20-2020. BE. 102.0 4.0 NaN
4 20-2020. BE. 103.0 2.0 NaN
5 20-2020. BE. 103.0 4.0 NaN
6 20-2020. BE. 110.0 4.0 10.0
7 20-2020. BE. 111.0 4.0 12.0
What I'm missing are product numbers 102 and 103. I want to fill their price using the price from product 101.
Meaning for product_number (102) on a given week(20-2020), country(BE), and product size(2) I want to fill the selling_price using the equivalent data from product_number(101). Which means that the selling price should now be 25. The same rule should apply to product number 103 as well.
The above table after transformation should like this:
week. country. product_number. product_size. selling_price
0 20-2020. BE. 101.0 2.0 25
1 20-2020. BE. 101.0 4.0 50
2 20-2020. BE. 102.0 2.0 25
3 20-2020. BE. 102.0 4.0 50
4 20-2020. BE. 103.0 2.0 25
5 20-2020. BE. 103.0 4.0 50
6 20-2020. BE. 110.0 4.0 10
7 20-2020. BE. 111.0 4.0 12
My approach was to loop around the table, and continuously index what i was looking for and replacing the values but I get some errors. Then I tried, subsetting the dataframe, changing the values and then reassigning it.
I've been also thinking about np.where
but I can't figure out how to pass said condition.
Upvotes: 1
Views: 79
Reputation: 1555
For me, usually the simplest solution is the most effective. I won't do any ninja pandas stuff here, just basic loops and dictionaries.
my_dict = dict()
for index in df[(df['product_number'] == 101)].index:
key = (df.loc[index, "product_size"], df.loc[index, "country"], df.loc[index, "week"])
my_dict[key] = df.loc[index, "selling_price"]
for index in df[(df['product_number'] == 102) | (df['product_number'] == 103)].index:
unlock = (df.loc[index, "product_size"], df.loc[index, "country"], df.loc[index, "week_hf"])
df.loc[index, "selling_price"] = my_dict[unlock]
Upvotes: 1
Reputation: 71689
Slice the dataframe where product number is 101
and select the column selling_price
after setting the index of this dataframe to c
to create a mapping series m
, finally use this mapping series along with Series.map
to fill the nan values in selling_price
column:
c = ['week.', 'country.', 'product_size.']
m = df[df['product_number.'].eq(101)].set_index(c)['selling_price']
df['selling_price'] = df['selling_price'].fillna(
pd.Series(df.set_index(c).index.map(m), index=df.index))
week. country. product_number. product_size. selling_price
0 20-2020. BE. 101.0 2.0 25.0
1 20-2020. BE. 101.0 4.0 50.0
2 20-2020. BE. 102.0 2.0 25.0
3 20-2020. BE. 102.0 4.0 50.0
4 20-2020. BE. 103.0 2.0 25.0
5 20-2020. BE. 103.0 4.0 50.0
6 20-2020. BE. 110.0 4.0 10.0
7 20-2020. BE. 111.0 4.0 12.0
Upvotes: 1
Reputation: 151
You could use the fillna method on the DataFrame and specify the method as ffill (forward fill): test this line of command on your jupyter notebook if you are working with :
for v in df[(df['product_number'] == 102) & (df['product_number'] == 103)].values:
df.fillna(method='ffill')
Upvotes: 0