Reputation: 722

Conditional data ingestion of nan values pandas

I have a dataset that looks like the following:

      week. country.  product_number.  product_size.  selling_price
0  20-2020.      BE.            101.0            2.0           25.0
1  20-2020.      BE.            101.0            4.0           50.0
2  20-2020.      BE.            102.0            2.0            NaN
3  20-2020.      BE.            102.0            4.0            NaN
4  20-2020.      BE.            103.0            2.0            NaN
5  20-2020.      BE.            103.0            4.0            NaN
6  20-2020.      BE.            110.0            4.0           10.0
7  20-2020.      BE.            111.0            4.0           12.0

What I'm missing are product numbers 102 and 103. I want to fill their price using the price from product 101.

Meaning for product_number (102) on a given week(20-2020), country(BE), and product size(2) I want to fill the selling_price using the equivalent data from product_number(101). Which means that the selling price should now be 25. The same rule should apply to product number 103 as well.

The above table after transformation should like this:

      week. country.  product_number.  product_size.  selling_price
0  20-2020.      BE.            101.0            2.0             25
1  20-2020.      BE.            101.0            4.0             50
2  20-2020.      BE.            102.0            2.0             25
3  20-2020.      BE.            102.0            4.0             50
4  20-2020.      BE.            103.0            2.0             25
5  20-2020.      BE.            103.0            4.0             50
6  20-2020.      BE.            110.0            4.0             10
7  20-2020.      BE.            111.0            4.0             12

My approach was to loop around the table, and continuously index what i was looking for and replacing the values but I get some errors. Then I tried, subsetting the dataframe, changing the values and then reassigning it.

I've been also thinking about np.where but I can't figure out how to pass said condition.

Upvotes: 1

Answers (3)

Pavlos Panteliadis

Reputation: 1565

For me, usually the simplest solution is the most effective. I won't do any ninja pandas stuff here, just basic loops and dictionaries.

my_dict = dict()
for index in df[(df['product_number'] == 101)].index:
    key = (df.loc[index, "product_size"], df.loc[index, "country"], df.loc[index, "week"])
        my_dict[key] = df.loc[index, "selling_price"]

for index in df[(df['product_number'] == 102) | (df['product_number'] == 103)].index:
    unlock = (df.loc[index, "product_size"], df.loc[index, "country"], df.loc[index, "week_hf"])
        df.loc[index, "selling_price"] = my_dict[unlock]

Upvotes: 1

Shubham Sharma

Reputation: 71707

Slice the dataframe where product number is 101 and select the column selling_price after setting the index of this dataframe to c to create a mapping series m, finally use this mapping series along with Series.map to fill the nan values in selling_price column:

c = ['week.', 'country.', 'product_size.']
m = df[df['product_number.'].eq(101)].set_index(c)['selling_price']
df['selling_price'] = df['selling_price'].fillna(
                      pd.Series(df.set_index(c).index.map(m), index=df.index))

      week. country.  product_number.  product_size.  selling_price
0  20-2020.      BE.            101.0            2.0           25.0
1  20-2020.      BE.            101.0            4.0           50.0
2  20-2020.      BE.            102.0            2.0           25.0
3  20-2020.      BE.            102.0            4.0           50.0
4  20-2020.      BE.            103.0            2.0           25.0
5  20-2020.      BE.            103.0            4.0           50.0
6  20-2020.      BE.            110.0            4.0           10.0
7  20-2020.      BE.            111.0            4.0           12.0

Upvotes: 1

BADS

Reputation: 151

You could use the fillna method on the DataFrame and specify the method as ffill (forward fill): test this line of command on your jupyter notebook if you are working with :

for v in df[(df['product_number'] == 102) & (df['product_number'] == 103)].values:
    df.fillna(method='ffill')

Upvotes: 0

Conditional data ingestion of nan values pandas

Answers (3)

Related Questions