Reputation: 133
I have 2 data frames, the 1st contains a list of values I am looking to work with and the second contains these values plus a large number of other values. I am looking for the best way to remove the values that do not appear in the 1st data frame from the 2nddata frame to reduce the number of entries I am working with.
Example
Input
DF1
Alpha | code |
---|---|
A | 1 |
D | 2 |
E | 3 |
F | 4 |
DF2
Alpha | code |
---|---|
A | 23 |
B | 12 |
C | 1 |
D | 32 |
E | 23 |
F | 45 |
G | 51 |
H | 26 |
Desired Output:
DF1
Alpha | code |
---|---|
A | 1 |
D | 2 |
E | 3 |
F | 4 |
DF2
Alpha | code |
---|---|
A | 23 |
D | 32 |
E | 23 |
F | 45 |
Upvotes: 2
Views: 47
Reputation: 506
You could do an inner join, dropping all rows that doesn't have an entry and merging all others:
pd.merge(DF1, DF2, on='Alpha', how='inner')
But then you would subsequently have to drop the columns you dont need, and posibly rename if some share a name.
Upvotes: 0
Reputation: 13821
Assuming that your first column in DF1 is called "Alpha", you can do this:
my_list_DF1 = DF1['Alpha'].unique().tolist() # gets all unique values of first column from DF1 into a list
Then, you can filter
your DF2, to include only those values, using isin
:
new_DF2 = DF2[DF2['Alpha'].isin(my_list_DF1)]
Which will result in a smaller DF2, only including the common values from the so called 'Alpha' column.
Upvotes: 1