Panda

Reputation: 57

Pandas DF Removal of Duplicate Surnames

I have a DataFrame of people's names. Because the data was scraped with Selenium, some rows contain only a surname that already appears as part of a full name in another row. I want to remove those surname-only rows.

Input:

            TEXT    TYPE
0  Barrack Obama  PERSON
1          Obama  PERSON
2      Don Beyer  PERSON
3    Doug Wilson  PERSON
4         Wilson  PERSON
5         Thomas  PERSON

Expected output:

            TEXT    TYPE
0  Barrack Obama  PERSON
1      Don Beyer  PERSON
2    Doug Wilson  PERSON
3         Thomas  PERSON
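
For anyone who wants to try the answers below, here is a minimal sketch that rebuilds the input table as a DataFrame. The values are copied from the table above; the variable name df is an assumption that matches what the answers use.

```python
import pandas as pd

# Sample data copied from the "Input" table in the question.
df = pd.DataFrame({
    "TEXT": ["Barrack Obama", "Obama", "Don Beyer",
             "Doug Wilson", "Wilson", "Thomas"],
    "TYPE": ["PERSON"] * 6,
})

print(df)
```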

Upvotes: 2

Views: 94

Answers (2)

anky

Reputation: 75110

Here is another approach, using duplicated() on the last word of each name:

df[~df['TEXT'].str.split().str[-1].duplicated()]

Or:

df[~df['TEXT'].str.split(expand=True).ffill(axis=1).iloc[:,-1].duplicated()]

Or:

df[~df['TEXT'].str.split(expand=True).ffill(axis=1).duplicated(subset=[1])]

Output:

            TEXT    TYPE
0  Barrack Obama  PERSON
2      Don Beyer  PERSON
3    Doug Wilson  PERSON
5         Thomas  PERSON
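
To make the idea easier to follow, here is a self-contained sketch of the first variant, split into steps. It assumes the sample data from the question and that the last whitespace-separated word of each name is the surname. The names surname, result, is_single, full_surnames and strict are illustrative only. The final reset_index(drop=True) is added so the index matches the expected output in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "TEXT": ["Barrack Obama", "Obama", "Don Beyer",
             "Doug Wilson", "Wilson", "Thomas"],
    "TYPE": ["PERSON"] * 6,
})

# Last whitespace-separated word of each name, treated here as the surname.
surname = df["TEXT"].str.split().str[-1]

# duplicated() marks every repeat after the first occurrence,
# so negating it keeps only the first row for each surname.
result = df[~surname.duplicated()].reset_index(drop=True)
print(result)

# Stricter variant (an assumption about what is wanted): drop only
# single-word rows whose text is the surname of some multi-word name.
is_single = df["TEXT"].str.split().str.len() == 1
full_surnames = set(surname[~is_single])
strict = df[~(is_single & df["TEXT"].isin(full_surnames))].reset_index(drop=True)
print(strict)
```

Note that the duplicated() version keeps whichever row comes first, so it relies on the full name appearing before the bare surname, and it would also drop a second, different person who happens to share a surname. The stricter variant avoids both cases on this sample, at the cost of a little more code.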

Upvotes: 3

Quang Hoang

Reputation: 150785

Since I don't have your data as copyable text, I haven't tested the following, but it should work:

df.groupby(df.TEXT.str.extract(r'(\w*)$')[0],
           sort=False, as_index=False
          ).first()

Output:

            TEXT    TYPE
0  Barrack Obama  PERSON
1      Don Beyer  PERSON
2    Doug Wilson  PERSON
3         Thomas  PERSON
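
As a rough illustration of what the grouping key looks like, here is a self-contained sketch on the sample data from the question. It is a slight variation on the snippet above, not the same call: the key is extracted with expand=False and given the hypothetical name surname, and the result is renumbered with reset_index(drop=True) instead of relying on as_index=False.

```python
import pandas as pd

df = pd.DataFrame({
    "TEXT": ["Barrack Obama", "Obama", "Don Beyer",
             "Doug Wilson", "Wilson", "Thomas"],
    "TYPE": ["PERSON"] * 6,
})

# Trailing run of word characters in each name; with a single capture
# group and expand=False, str.extract returns a Series.
key = df["TEXT"].str.extract(r"(\w+)$", expand=False).rename("surname")
print(key.tolist())

# Group rows by that key in order of first appearance and keep the
# first row of each group, then renumber the index from 0.
result = df.groupby(key, sort=False).first().reset_index(drop=True)
print(result)
```

As with the duplicated() approach, this keeps the first row seen for each surname, so it assumes the full name comes before the surname-only row.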

Upvotes: 3
