Jason Bourne

Reputation: 359

Pandas replace string using values from list

I'm trying to replace a string in one of the columns of my dataframe (df). Here's what df looks like:

                           0                  1
0  2012 Black Toyota Corolla    White/Black/Red
1      2013 Red Toyota Camry    Red
2      2015 Blue Honda Civic    Blue
3         2012 Black Mazda 6    Black/Red/White
4   2011 White Nissan Maxima    White/Red/Black

Sometimes column 1 has multiple color values, sometimes only a single value. I would like to take however many values there are in column 1, check whether any of them appear in column 0, and remove those that do from column 0.

I've tried approaching it this way.

    def removeColor(main,sub):
        for i in sub.split('/'):
            main = main.str.replace(i, '')
        return(main)

    df['0'] = df['0'].map(lambda x: removeColor(x['0'],x['2']))

This results in a TypeError.

TypeError: string indices must be integers

My expected output looks like below:

                     0                  1
0  2012 Toyota Corolla    White/Black/Red
1    2013 Toyota Camry    Red
2     2015 Honda Civic    Blue
3         2012 Mazda 6    Black/Red/White
4   2011 Nissan Maxima    White/Red/Black

Upvotes: 1

Views: 2245

Answers (2)

elvisytoob

Reputation: 41

import pandas as pd

iLoc = pd.DataFrame({
    '0': ['2012 Black Toyota Corolla', '2013 Red Toyota Camry', '2015 Blue Honda Civic',
          '2012 Black Mazda 6', '2011 White Nissan Maxima'],
    '1': ['White/Black/Red', 'Red', 'Blue', 'Black/Red/White', 'White/Red/Black'],
})

print(iLoc)

def removeColor(main, sub):
    # Work on a copy so the caller's Series is not modified in place.
    main = main.copy()
    for i in main.index:
        # Remove every color listed in the row of `sub` with the same index.
        for j in str(sub[i]).split('/'):
            main[i] = main[i].replace(j, '').replace('  ', ' ').strip()
    return main

iLoc["0"] = removeColor(iLoc["0"], iLoc["1"])

print(iLoc)

Your method was partially correct.
You need to pull each value out of the Series and, for every row of main, remove the colors listed in the row of sub at the same index.
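The same row-by-row pairing can also be written without indexing into the Series, by zipping the two columns together. This is only a sketch of an alternative, not the code from the answer: the helper name strip_colors is made up for illustration, and it removes whole whitespace-separated words rather than substrings, which is a slightly different technique from the str.replace calls above.

```python
import pandas as pd

df = pd.DataFrame({
    "0": ["2012 Black Toyota Corolla", "2013 Red Toyota Camry"],
    "1": ["White/Black/Red", "Red"],
})

def strip_colors(text, colors):
    # Hypothetical helper: drop every word of `text` that is listed
    # in the slash-separated `colors` string.
    unwanted = set(colors.split("/"))
    return " ".join(word for word in text.split() if word not in unwanted)

# zip pairs each description with the colors from the same row.
df["0"] = [strip_colors(text, colors) for text, colors in zip(df["0"], df["1"])]
print(df)
```

Because it compares whole words, this version leaves no doubled spaces to clean up, but it would miss a color that is glued to other characters (for example "Red,").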

Upvotes: 1

aiguofer

Reputation: 2137

map works on a Series and calls your function once per element. In your lambda function, x is therefore a string (a single value from column "0"), not a row, so x["0"] and x["1"] try to index that string with a string key, hence your error.
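To see the difference concretely, here is a small sketch that records what each method passes to the function. The two-row frame and the variable names are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "0": ["2012 Black Toyota Corolla", "2013 Red Toyota Camry"],
    "1": ["White/Black/Red", "Red"],
})

# Series.map hands the function one element at a time: plain strings here.
seen_by_map = df["0"].map(lambda x: type(x).__name__).tolist()

# DataFrame.apply with axis=1 hands the function a whole row as a Series.
seen_by_apply = df.apply(lambda row: type(row).__name__, axis=1).tolist()

# Indexing one of those strings with a string key reproduces the error.
value = df["0"].iloc[0]
error_message = None
try:
    value["0"]
except TypeError as exc:
    error_message = str(exc)

print(seen_by_map, seen_by_apply, error_message)
```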

The apply function lets you act on an entire row (or column) and would be better suited. Here's one way to accomplish what you're after:

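import re

def remove_color(row):
    # Turn "White/Black/Red" into the pattern "White|Black|Red",
    # escaping each color in case it contains regex metacharacters.
    pattern = "|".join(re.escape(color) for color in row.iloc[1].split("/"))
    # Remove the matched colors, then collapse the doubled space left behind.
    return re.sub(pattern, "", row.iloc[0]).replace("  ", " ")


df.iloc[:, 0] = df.apply(remove_color, axis=1)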

You could replace the iloc calls with specific column names to make it more readable (you mentioned col names could be anything so I'm giving a generic approach here).

The second replace call is to remove extra spaces that were left by the re.sub. You could modify your re.sub to do that on a single call, but it could get messy.
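For what it's worth, here is one way the single-call version might look. Treat it as a sketch under a few assumptions: the columns are named "0" and "1" as in the question, colors should only match as whole words, and a trailing strip() is acceptable for the case where a color is the last word.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "0": ["2012 Black Toyota Corolla", "2013 Red Toyota Camry",
          "2015 Blue Honda Civic", "2012 Black Mazda 6",
          "2011 White Nissan Maxima"],
    "1": ["White/Black/Red", "Red", "Blue",
          "Black/Red/White", "White/Red/Black"],
})

def remove_color(row):
    colors = "|".join(re.escape(c) for c in row["1"].split("/"))
    # Match a color as a whole word plus any whitespace right after it,
    # so no doubled space is left in the result.
    pattern = rf"\b(?:{colors})\b\s*"
    return re.sub(pattern, "", row["0"]).strip()

df["0"] = df.apply(remove_color, axis=1)
print(df)
```

The word boundaries also keep a color from being removed out of the middle of a longer word, which the plain alternation would do.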

Upvotes: 1
