Reputation: 57
I have a DataFrame of people's names. Because of how Selenium scraped the page, some rows contain only a surname that was already captured as part of a full name, and I want to remove those rows.
Input:
TEXT TYPE
0 Barrack Obama PERSON
1 Obama PERSON
2 Don Beyer PERSON
3 Doug Wilson PERSON
4 Wilson PERSON
5 Thomas PERSON
Expected output:
TEXT TYPE
0 Barrack Obama PERSON
1 Don Beyer PERSON
2 Doug Wilson PERSON
3 Thomas PERSON
Upvotes: 2
Views: 94
Reputation: 75110
Here is another approach using duplicated(), which keeps only the first row seen for each last word:
df[~df['TEXT'].str.split().str[-1].duplicated()]
Or:
df[~df['TEXT'].str.split(expand=True).ffill(axis=1).iloc[:,-1].duplicated()]
Or:
df[~df['TEXT'].str.split(expand=True).ffill(axis=1).duplicated([1])]
TEXT TYPE
0 Barrack Obama PERSON
2 Don Beyer PERSON
3 Doug Wilson PERSON
5 Thomas PERSON
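For anyone who wants to try this, here is a minimal self-contained sketch of the first variant. The DataFrame is rebuilt by hand from the question's sample, and the names `last_word` and `result` are only illustrative. It assumes the full name always appears before the surname-only row.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "TEXT": ["Barrack Obama", "Obama", "Don Beyer",
             "Doug Wilson", "Wilson", "Thomas"],
    "TYPE": ["PERSON"] * 6,
})

# Last whitespace-separated word of each entry (the surname, or the only word)
last_word = df["TEXT"].str.split().str[-1]

# duplicated() flags every repeat after the first occurrence, so negating it
# keeps the first row seen for each last word
result = df[~last_word.duplicated()]
print(result)
```

The original index labels (0, 2, 3, 5) are kept. Append `.reset_index(drop=True)` if you want them renumbered as in the expected output. Note that two different people sharing a surname would also be collapsed into one row.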
Upvotes: 3
Reputation: 150785
Your data isn't posted as text, so I haven't tested the following, but it should work:
df.groupby(df.TEXT.str.extract(r'(\w*)$')[0],
sort=False, as_index=False
).first()
Output:
TEXT TYPE
0 Barrack Obama PERSON
1 Don Beyer PERSON
2 Doug Wilson PERSON
3 Thomas PERSON
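A runnable sketch of the same groupby idea, with the question's sample rebuilt by hand. It differs slightly from the snippet above: it uses `\w+` for the pattern, names the key `surname`, and drops the group key with `reset_index(drop=True)` rather than relying on `as_index=False`. Those choices are mine, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "TEXT": ["Barrack Obama", "Obama", "Don Beyer",
             "Doug Wilson", "Wilson", "Thomas"],
    "TYPE": ["PERSON"] * 6,
})

# Trailing word of each entry, used as the grouping key
surname = df["TEXT"].str.extract(r"(\w+)$")[0].rename("surname")

# sort=False keeps groups in order of first appearance; first() takes the
# first non-null value per column in each group (here, simply the first row)
result = df.groupby(surname, sort=False).first().reset_index(drop=True)
print(result)
```

Like the duplicated() approach, this keeps whichever row comes first for each surname, so it relies on the full name appearing before the surname-only row.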
Upvotes: 3