Reputation: 359
I'm trying to replace a string in one of the columns of my dataframe (df). Here's what df looks like:
                           0                1
0  2012 Black Toyota Corolla  White/Black/Red
1      2013 Red Toyota Camry              Red
2      2015 Blue Honda Civic             Blue
3         2012 Black Mazda 6  Black/Red/White
4   2011 White Nissan Maxima  White/Red/Black
Sometimes, column 1 has multiple color values, sometimes only a single value. I would like to take however many values there are in column 1, check if any of those exist in column 0 and remove that value from column 0.
I've tried approaching it this way.
def removeColor(main,sub):
    for i in sub.split('/'):
        main = main.str.replace(i, '')
    return(main)
>>> df['0'] = df['0'].map(lambda x: removeColor(x['0'],x['2']))
This results in a TypeError.
TypeError: string indices must be integers
My expected output looks like below:
                     0                1
0  2012 Toyota Corolla  White/Black/Red
1    2013 Toyota Camry              Red
2     2015 Honda Civic             Blue
3         2012 Mazda 6  Black/Red/White
4   2011 Nissan Maxima  White/Red/Black
Upvotes: 1
Views: 2245
Reputation: 41
import pandas as pd

iLoc = pd.DataFrame({
    '0': ['2012 Black Toyota Corolla', '2013 Red Toyota Camry', '2015 Blue Honda Civic', '2012 Black Mazda 6', '2011 White Nissan Maxima'],
    '1': ['White/Black/Red', 'Red', 'Blue', 'Black/Red/White', 'White/Red/Black'],
})
display(iLoc)  # display() exists in Jupyter/IPython; use print(iLoc) elsewhere

def removeColor(main, sub):
    # Walk the index labels 0..n-1 (assumes the default RangeIndex)
    for i in range(len(main)):
        for j in str(sub[i]).split('/'):
            # Remove the color, collapse the double space it leaves, trim the ends
            main[i] = main[i].replace(j, '').replace('  ', ' ').strip()
    return main

iLoc["0"] = removeColor(iLoc["0"], iLoc["1"])
display(iLoc)
Your method was partially correct.
You need to pull each value out of the Series and, for every row of main, remove the colors listed in the row of sub at the same index.
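The same row-by-row idea can also be sketched without writing into the Series inside a loop. This is only an illustration: strip_colors is a hypothetical helper name, and it assumes every color is a separate whitespace-delimited word in column "0".

```python
import pandas as pd

df = pd.DataFrame({
    "0": ["2012 Black Toyota Corolla", "2013 Red Toyota Camry", "2015 Blue Honda Civic",
          "2012 Black Mazda 6", "2011 White Nissan Maxima"],
    "1": ["White/Black/Red", "Red", "Blue", "Black/Red/White", "White/Red/Black"],
})

def strip_colors(text, colors):
    # Hypothetical helper: drop every word of `text` that appears in the
    # "/"-separated `colors` string, then rejoin with single spaces.
    unwanted = set(colors.split("/"))
    return " ".join(word for word in text.split() if word not in unwanted)

# zip pairs each value of column "0" with the value of column "1" from the same row.
df["0"] = [strip_colors(text, colors) for text, colors in zip(df["0"], df["1"])]
print(df)
```

Because the text is split on whitespace and rejoined, no extra spaces are left behind, so no second replace is needed.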
Upvotes: 1
Reputation: 2137
map only works on a Series. In your lambda function, x would be a string (the value from column "0"), so when you do x["0"] and x["1"] it tries to index into a string with a string key, hence your error.
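A quick way to see the difference, as a minimal sketch on a one-row frame (the variable names seen_by_map and seen_by_apply are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"0": ["2013 Red Toyota Camry"], "1": ["Red"]})

# Series.map hands the function one cell value at a time (here a str).
seen_by_map = df["0"].map(lambda x: type(x).__name__).tolist()

# DataFrame.apply with axis=1 hands the function a whole row as a Series.
seen_by_apply = df.apply(lambda row: type(row).__name__, axis=1).tolist()

print(seen_by_map)    # ['str']
print(seen_by_apply)  # ['Series']
```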
The apply function lets you act on an entire row (or column) and would be better suited. Here's one way to accomplish what you're after:
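import re

def remove_color(row):
    # Turn "White/Black/Red" into the regex alternation "White|Black|Red",
    # delete any match from column 0, then collapse the double space left behind
    return re.sub(row.iloc[1].replace("/", "|"), "", row.iloc[0]).replace("  ", " ")

df.iloc[:, 0] = df.apply(remove_color, axis=1)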
You could replace the iloc calls with specific column names to make it more readable (you mentioned col names could be anything so I'm giving a generic approach here).
The second replace call is to remove extra spaces that were left by the re.sub. You could modify your re.sub to do that on a single call, but it could get messy.
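For reference, one possible shape of that single-call version is sketched here. It is an assumption about what "a single call" could look like, not a tested drop-in: the pattern also swallows the whitespace after each color, and re.escape is added in case a color name ever contains regex metacharacters.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "0": ["2012 Black Toyota Corolla", "2013 Red Toyota Camry", "2015 Blue Honda Civic",
          "2012 Black Mazda 6", "2011 White Nissan Maxima"],
    "1": ["White/Black/Red", "Red", "Blue", "Black/Red/White", "White/Red/Black"],
})

def remove_color(row):
    # Build an alternation of the escaped colors, e.g. "White|Black|Red".
    colors = "|".join(re.escape(c) for c in row.iloc[1].split("/"))
    # \b stops "Red" from matching inside a longer word; \s* also removes the
    # spaces that follow the color, so no second replace is needed.
    return re.sub(rf"\b(?:{colors})\b\s*", "", row.iloc[0]).strip()

# Assign by column label looked up positionally, so the column names can be anything.
df[df.columns[0]] = df.apply(remove_color, axis=1)
print(df)
```

The trailing strip() covers the case where the color is the last word of the string.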
Upvotes: 1