Vidya
Vidya

Reputation: 17

Pandas - Update column with numbers / string extracted by using regex only on rows retrieved based on specific conditions

I am trying to clean data from a file. I have done a partial clean and the data looks like this.

enter image description here

The Price column still needs to be cleaned and updated into other columns. This is what I want to do

Str '80 per piece' =>

80 -> 'Price' column
'piece' -> 'Unit' column

Str '110 per pack' =>
110 -> 'Price' column
'pack' -> 'Unit' column

I created a mask to retrieve the rows I need and then used regex to extract non-digits. I find that it affects all the rows. When I try to use only the rows retrieved by the mask - I get an error.

How to ensure only the column in the conditionally retrieved rows is affected ?

This is my code - Incorrect output without using mask on both sides.

enter image description here

But if I try this using the mask - I get this error

enter image description here

Upvotes: 0

Views: 426

Answers (1)

Henry Yik
Henry Yik

Reputation: 22503

IIUC you can extract with named groups and then update:

df = pd.DataFrame({"Unit":["gm", np.NaN, np.NaN],
                   "Price":["40","80 per piece", "110 per pack"]})

  Unit         Price
0   gm            40
1  NaN  80 per piece
2  NaN  110 per pack

s = df.loc[df["Unit"].isnull(),"Price"].str.extract("(?P<Price>\d+)\sper\s(?P<Unit>[A-Za-z]+)").dropna()

df.update(s)

print (df)

    Unit Price
0     gm    40
1  piece    80
2   pack   110

Upvotes: 1

Related Questions