Reputation: 167
I have an example dataframe:
   col1                                col2
0  Hello, is it me you're looking for  Hello
1  Hello, is it me you're looking for  me
2  Hello, is it me you're looking for  looking
3  Hello, is it me you're looking for  for
4  Hello, is it me you're looking for  Lionel
5  Hello, is it me you're looking for  Richie
I would like to change col1 so that the string in col2 is removed from it, and return the amended dataframe. I would also like to remove one character before and one character after the string. For example, the desired output for index 1 would be:
   col1                            col2
1  Hello, is ityou're looking for  me
I have tried using pd.apply() and pd.map() with a .replace() function, but I can't get .replace() to use df['col2'] as an argument. I also feel that this isn't the best way to go about it.
Any help? I'm mostly new to pandas and am looking to learn, so please ELI5.
Thanks!
Upvotes: 5
Views: 3531
Reputation: 1
Perhaps there is a more Pythonic or elegant way, but here is how I quickly did the above. This approach works best when you need flexibility to manipulate the strings and when getting a quick fix matters more than performance.
I took the two columns out of the dataframe as individual series:
col1Series = df['col1']
col2Series = df['col2']
Next, create an empty list to store the final string values:
rowxList = []
Iterate as follows to populate the list:
for x, y in zip(col1Series, col2Series):
    rowx = x.replace(y, '')
    rowxList.append(rowx)
Last, put rowxList back into the original dataframe. You could overwrite the old column directly, but it's safer to write to a new column, check the output against the original two columns, and then drop the old column once you no longer need it:
df['newCol'] = rowxList
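The loop above can also be written as a list comprehension. This is only a sketch, assuming both columns hold plain strings; the sample dataframe is rebuilt here from the question so that the snippet runs on its own, and the names sentence and words are just for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to be plain strings).
sentence = "Hello, is it me you're looking for"
words = ["Hello", "me", "looking", "for", "Lionel", "Richie"]
df = pd.DataFrame({"col1": [sentence] * len(words), "col2": words})

# Same pairing as the explicit loop: zip walks both columns in step,
# and str.replace removes every occurrence of y from x.
df["newCol"] = [x.replace(y, "") for x, y in zip(df["col1"], df["col2"])]

print(df["newCol"].tolist())
```

Note that, like the loop, this removes only the matched word itself, not the characters on either side of it.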
Upvotes: 0
Reputation: 833
To apply a function to each row of a dataframe, you can use:
df.apply(func, axis=1)
func receives each row as a Series argument.
df['col1'] = df.apply(lambda row: row['col1'].replace(row['col2'], ''), axis=1)
However, removing one character before and one after needs more work.
So define func:
def func(row):
    c1 = row['col1']  # string from col1
    c2 = row['col2']  # string from col2
    find_index = c1.find(c2)  # index of the first occurrence of c2, searching from the left
    if find_index == -1:  # c2 not found
        return c1  # leave unchanged
    else:
        start_index = max(find_index - 1, 0)  # one character before, but never negative
        end_index = find_index + len(c2) + 1  # one character after; slicing handles index overflow
        return c1.replace(c1[start_index:end_index], '')  # remove the widened substring
then:
df['col1'] = df.apply(func, axis=1)
* To avoid the copy warning (SettingWithCopyWarning), use:
df = df.assign(col1=df.apply(func, axis=1))
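For reference, here is a self-contained sketch of the same idea run against the sample data from the question. It is a variant, not the exact function above: it cuts the span out by slicing instead of calling replace, so only the first occurrence is touched. The helper name strip_with_neighbours is made up for this example.

```python
import pandas as pd

def strip_with_neighbours(text, word):
    """Remove the first occurrence of word from text, plus one character on each side."""
    idx = text.find(word)
    if idx == -1:
        return text  # word not present: leave the text unchanged
    start = max(idx - 1, 0)    # one character before, clamped at 0
    end = idx + len(word) + 1  # one character after; slicing tolerates overshoot
    return text[:start] + text[end:]

# Sample data rebuilt from the question.
sentence = "Hello, is it me you're looking for"
words = ["Hello", "me", "looking", "for", "Lionel", "Richie"]
df = pd.DataFrame({"col1": [sentence] * len(words), "col2": words})

# axis=1 passes each row to the lambda; assign returns a new dataframe.
df = df.assign(
    col1=df.apply(lambda row: strip_with_neighbours(row["col1"], row["col2"]), axis=1)
)

print(df.loc[1, "col1"])  # Hello, is ityou're looking for
```

Rows whose col2 value does not appear in col1 (Lionel and Richie here) come back unchanged.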
Upvotes: 4
Reputation: 2573
My guess is that you were missing axis=1, which makes apply work on each row rather than on each column:
import pandas as pd
A = """Hello, is it me you're looking for;Hello
Hello, is it me you're looking for;me
Hello, is it me you're looking for;looking
Hello, is it me you're looking for;for
Hello, is it me you're looking for;Lionel
Hello, is it me you're looking for;Richie
"""
df = pd.DataFrame([a.split(";") for a in A.split("\n")][:-1],
                  columns=["col1", "col2"])
df.col1 = df.apply(lambda x: x.col1.replace(x.col2, ""), axis=1)
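To see what axis=1 changes, here is a small sketch on a made-up two-row dataframe (the data and variable names are assumptions for illustration only). It uses the name of the Series that apply hands to the function to show whether a column or a row was passed in.

```python
import pandas as pd

# Tiny made-up dataframe, just to show what apply passes to the function.
df = pd.DataFrame({"col1": ["a b", "c d"], "col2": ["a", "d"]})

# Default (axis=0): the function is called once per column,
# so the Series it receives is named after the column.
by_column = df.apply(lambda s: s.name)

# axis=1: the function is called once per row,
# so the Series is named after the row label and indexed by column names.
by_row = df.apply(lambda s: s.name, axis=1)

# Only with axis=1 can the function look up both columns of the same row.
fixed = df.apply(lambda row: row["col1"].replace(row["col2"], ""), axis=1)

print(by_column.tolist())  # ['col1', 'col2']
print(by_row.tolist())     # [0, 1]
print(fixed.tolist())      # [' b', 'c ']
```

Without axis=1, the lambda receives a whole column, so a lookup such as row["col1"] has no matching label to find, which is most likely why the original attempt failed.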
Upvotes: 3