CTD91
CTD91

Reputation: 133

How do you remove values from a data frame based on whether they are present in another data frame?

I have 2 data frames, the 1st contains a list of values I am looking to work with and the second contains these values plus a large number of other values. I am looking for the best way to remove the values that do not appear in the 1st data frame from the 2nddata frame to reduce the number of entries I am working with.

Example

Input

DF1

Alpha code
A 1
D 2
E 3
F 4

DF2

Alpha code
A 23
B 12
C 1
D 32
E 23
F 45
G 51
H 26

Desired Output:

DF1

Alpha code
A 1
D 2
E 3
F 4

DF2

Alpha code
A 23
D 32
E 23
F 45

Upvotes: 2

Views: 47

Answers (2)

Bonqus
Bonqus

Reputation: 506

You could do an inner join, dropping all rows that doesn't have an entry and merging all others:

pd.merge(DF1, DF2, on='Alpha', how='inner')

But then you would subsequently have to drop the columns you dont need, and posibly rename if some share a name.

Upvotes: 0

sophocles
sophocles

Reputation: 13821

Assuming that your first column in DF1 is called "Alpha", you can do this:

my_list_DF1 = DF1['Alpha'].unique().tolist() # gets all unique values of first column from DF1 into a list

Then, you can filter your DF2, to include only those values, using isin:

new_DF2 = DF2[DF2['Alpha'].isin(my_list_DF1)]

Which will result in a smaller DF2, only including the common values from the so called 'Alpha' column.

Upvotes: 1

Related Questions