Reputation: 17
I am trying to clean data from a file. I have done a partial clean and the data looks like this.
The Price
column still needs to be cleaned and updated into other columns. This is what I want to do
Str '80 per piece' =>
80 -> 'Price' column
'piece' -> 'Unit' column
Str '110 per pack' =>
110 -> 'Price' column
'pack' -> 'Unit' column
I created a mask to retrieve the rows I need and then used regex to extract non-digits. I find that it affects all the rows. When I try to use only the rows retrieved by the mask - I get an error.
How to ensure only the column in the conditionally retrieved rows is affected ?
This is my code - Incorrect output without using mask on both sides.
But if I try this using the mask - I get this error
Upvotes: 0
Views: 426
Reputation: 22503
IIUC you can extract
with named groups and then update
:
df = pd.DataFrame({"Unit":["gm", np.NaN, np.NaN],
"Price":["40","80 per piece", "110 per pack"]})
Unit Price
0 gm 40
1 NaN 80 per piece
2 NaN 110 per pack
s = df.loc[df["Unit"].isnull(),"Price"].str.extract("(?P<Price>\d+)\sper\s(?P<Unit>[A-Za-z]+)").dropna()
df.update(s)
print (df)
Unit Price
0 gm 40
1 piece 80
2 pack 110
Upvotes: 1