How do you remove values from a data frame based on whether they are present in another data frame?

Question

I have 2 data frames, the 1st contains a list of values I am looking to work with and the second contains these values plus a large number of other values. I am looking for the best way to remove the values that do not appear in the 1st data frame from the 2nddata frame to reduce the number of entries I am working with.

Example

Input

DF1

Alpha	code
A	1
D	2
E	3
F	4

DF2

Alpha	code
A	23
B	12
C	1
D	32
E	23
F	45
G	51
H	26

Desired Output:

DF1

Alpha	code
A	1
D	2
E	3
F	4

DF2

Alpha	code
A	23
D	32
E	23
F	45

sophocles · Accepted Answer

Assuming that your first column in DF1 is called "Alpha", you can do this:

my_list_DF1 = DF1['Alpha'].unique().tolist() # gets all unique values of first column from DF1 into a list

Then, you can filter your DF2, to include only those values, using isin:

new_DF2 = DF2[DF2['Alpha'].isin(my_list_DF1)]

Which will result in a smaller DF2, only including the common values from the so called 'Alpha' column.

How do you remove values from a data frame based on whether they are present in another data frame?

Answers (2)

Related Questions