Reputation: 5117
I have this in pandas and Python:
       text1                       text2
0      sunny         This is a sunny day
1  rainy day  No this day is a rainy day
and I want to transform it to this:
       text1             text2
0      sunny     This is a day
1  rainy day  No this day is a
Therefore, I want to remove some text from text2 based on text1 of the same row.
I did this:
df = df.apply(lambda x: x['text2'].str.replace(x['text1'], ''))
but I was getting an error:
AttributeError: ("'str' object has no attribute 'str'", 'occurred at index 0')
which may be related to this: https://stackoverflow.com/a/53986135/9024698.
What is the most efficient way to do what I want to do?
Upvotes: 1
Views: 661
Reputation: 863611
A fast but slightly ugly solution is to call replace per row, removing the text1 value from text2 of the same row. Note that it can leave multiple consecutive whitespaces behind:
df['text2'] = df.apply(lambda x: x['text2'].replace(x['text1'], ''), axis=1)
print (df)
       text1             text2
0      sunny     This is a day
1  rainy day  No this day is a
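The repeated whitespace mentioned above can be tidied in the same pass. This is a minimal sketch, assuming the two-row frame from the question; `remove_and_tidy` is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({
    "text1": ["sunny", "rainy day"],
    "text2": ["This is a sunny day", "No this day is a rainy day"],
})

def remove_and_tidy(row):
    # Hypothetical helper: drop the text1 phrase from text2, then
    # collapse the runs of whitespace left behind by the removal.
    stripped = row["text2"].replace(row["text1"], "")
    return " ".join(stripped.split())

df["text2"] = df.apply(remove_and_tidy, axis=1)
print(df["text2"].tolist())
```

Splitting and re-joining on whitespace also trims leading and trailing spaces, which may or may not be wanted.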
A solution that splits both columns into words and drops the matching words:
df['text2'] = df.apply(lambda x: ' '.join(y for y in x['text2'].split()
if y not in set(x['text1'].split())), axis=1)
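The word-splitting approach behaves differently from phrase removal: it drops every occurrence of each word in text1, not just the exact phrase. A small sketch of that difference, assuming the frame from the question; `drop_words` is a hypothetical helper name:

```python
import pandas as pd

df = pd.DataFrame({
    "text1": ["sunny", "rainy day"],
    "text2": ["This is a sunny day", "No this day is a rainy day"],
})

def drop_words(text, words):
    # Hypothetical helper: build the set of banned words once per row,
    # then keep only the words of `text` that are not in it.
    banned = set(words.split())
    return " ".join(w for w in text.split() if w not in banned)

result = df.apply(lambda x: drop_words(x["text2"], x["text1"]), axis=1)
print(result.tolist())
```

In the second row both occurrences of "day" are removed, so the result is "No this is a" and not the "No this day is a" asked for in the question.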
If you need to remove all values of the other column from every row, it is better to use the solution by @Erfan:
df['text2'].str.replace('|'.join(df['text1']), '', regex=True)
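If the values in text1 can contain regex metacharacters such as "." or "(", joining them unescaped would change the meaning of the pattern. A possible refinement, sketched under the assumption that the values should be matched literally, is to escape each one with `re.escape` first:

```python
import re

import pandas as pd

df = pd.DataFrame({
    "text1": ["sunny", "rainy day"],
    "text2": ["This is a sunny day", "No this day is a rainy day"],
})

# Escape each value so it is matched literally, then join into one alternation.
pattern = "|".join(re.escape(s) for s in df["text1"])

# regex=True is passed explicitly because the default differs between pandas versions.
cleaned = df["text2"].str.replace(pattern, "", regex=True)
print(cleaned.tolist())
```

The surrounding spaces are left untouched here, so the first row still contains a double space.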
Upvotes: 4
Reputation: 125
Simply use the replace method:
df["text2"]=df["text2"].replace(to_replace=df["text1"],value="",regex=True)
EDIT:
As mentioned by @jezrael, this method does not take surrounding spaces into account (they are not matched by the regex). However, you can tune the regex to remove some of them, for example by adding optional spaces to the pattern:
df["text2"]=df["text2"].replace(to_replace=df["text1"]+" *",value="",regex=True)
Upvotes: 0
Reputation: 5500
This is because you are applying your function over columns instead of rows. Also, x['text2'] is already a string, so there is no need to call .str. With these modifications, you will have:
print(df.apply(lambda x: x['text2'].replace(x['text1'], ''), axis=1))
# 0 This is a day
# 1 No this day is a
As you can see, you only return the text2 column.
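The axis point made at the start of this answer can be checked directly. A short sketch, assuming the frame from the question, that inspects what the lambda receives in each case:

```python
import pandas as pd

df = pd.DataFrame({
    "text1": ["sunny", "rainy day"],
    "text2": ["This is a sunny day", "No this day is a rainy day"],
})

# axis=1: x is one row, so x["text2"] is a plain Python str (no .str accessor).
cell_types = df.apply(lambda x: type(x["text2"]).__name__, axis=1).tolist()

# Default axis=0: x is a whole column indexed by 0 and 1,
# so looking up the label "text2" on it fails.
try:
    df.apply(lambda x: x["text2"])
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

print(cell_types, column_lookup_failed)
```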
Here is one example returning the whole dataframe processed:
# Import module
import pandas as pd
df = pd.DataFrame({"text1": ["sunny", "rainy day"],
                   "text2": ["This is a sunny day", "No this day is a rainy day"]})
print(df)
# text1 text2
# 0 sunny This is a sunny day
# 1 rainy day No this day is a rainy day
# Function to apply
def remove_word(row):
    row['text2'] = row.text2.replace(row['text1'], '')
    return row
# Apply the function on each row (axis = 1)
df = df.apply(remove_word, axis=1)
print(df)
# text1 text2
# 0 sunny This is a day
# 1 rainy day No this day is a
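Since only text2 changes, the same result can also be obtained without returning whole rows. The following is a sketch of an alternative, not the approach used in this answer: it pairs the two columns with `zip` and assigns the result back. It may be faster on large frames because no Series is built per row, though that is worth measuring on real data.

```python
import pandas as pd

df = pd.DataFrame({"text1": ["sunny", "rainy day"],
                   "text2": ["This is a sunny day", "No this day is a rainy day"]})

# Walk both columns in lockstep and remove each text1 value from its text2.
df["text2"] = [t2.replace(t1, "") for t1, t2 in zip(df["text1"], df["text2"])]
print(df)
```

As with the apply version, the spaces around the removed phrase are kept.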
Upvotes: 0