user15468984


How to remove duplicate "id" column values in Python

I have several dataframes that I want to merge into one big dataframe to build a classifier.

This is the base dataframe, user_df_copy

This dataframe has an id column that holds the client id. I have other dataframes like this one, whose columns are keyed by a user_id column.

The goal is to merge these small dataframes into user_df_copy, adding columns such as subject_id that hold a value only where user_id matches the id in the main dataframe, and NaN otherwise. The problem is that the ids appear duplicated in these small dataframes.

I also applied get_dummies to the subject_id column like this.
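
The original snippet is not shown here, so the following is only a rough sketch of what such a call might look like. The name subjects_df and its sample values are made up for illustration; only the user_id and subject_id column names come from the description above.

```python
import pandas as pd

# Hypothetical stand-in for one of the smaller dataframes (sample values are invented).
subjects_df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "subject_id": ["math", "art", "math"],
})

# Replace subject_id with one indicator column per distinct value.
subject_dummies = pd.get_dummies(subjects_df, columns=["subject_id"])

print(subject_dummies)
```

Note that get_dummies does not change the number of rows, so a user_id that appeared twice before the call still appears twice afterwards.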

Upvotes: 0

Views: 1731

Answers (1)

distracted-biologist

Reputation: 808

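If you just want to drop the rows with a repeated id in the smaller DataFrames, you can use drop_duplicates. It returns a new DataFrame and keeps the first row for each id by default, so assign the result:

df = df.drop_duplicates(subset="id")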
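
As a minimal sketch of how this could fit the merge described in the question: the sample values and the name small_df are made up, and it assumes the key column in the smaller DataFrame is called user_id (as the question describes), so that name is passed as subset.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
user_df_copy = pd.DataFrame({"id": [1, 2, 3]})
small_df = pd.DataFrame({"user_id": [1, 1, 2], "subject_id": [10, 11, 10]})

# Keep one row per user_id (the first occurrence, which is the default).
deduped = small_df.drop_duplicates(subset="user_id")

# A left merge keeps every row of user_df_copy; ids with no match get NaN.
merged = user_df_copy.merge(deduped, how="left", left_on="id", right_on="user_id")

print(merged)
```

Dropping duplicates discards the other rows for that user, so this only suits cases where one row per user is enough.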

Upvotes: 1
