rye_bread

Reputation: 95

Filtering data frames based on multiple parameters at the same time

Let's say I have a data frame:

import pandas as pd
df = pd.DataFrame({"a": range(1, 5), "b": range(6, 10), "c": range(11, 15), "d": range(15, 19)})

I want to filter this data frame based on the values of two columns that together make up coordinate points: c and d are the x and y coordinates, respectively. Given the points in the data frame, I want to find which of them fall within a list of x coordinates and a list of y coordinates:

x_coord = [4,12,13,17,19]
y_coord = [16,18,25,29,32]

Using the pandas isin function, how can I check both the c and d columns of the data frame against these lists at the same time? (The method needs to work on large data frames.)

Output wanted: a data frame containing the entire rows of the original data frame whose c and d values appear in the x and y lists.
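If "in both lists" means each coordinate is checked on its own (c anywhere in x_coord and d anywhere in y_coord, not necessarily as a matching pair), a minimal sketch combines two isin masks with &. This reading of the question is an assumption:

```python
import pandas as pd

df = pd.DataFrame({"a": range(1, 5), "b": range(6, 10), "c": range(11, 15), "d": range(15, 19)})
x_coord = [4, 12, 13, 17, 19]
y_coord = [16, 18, 25, 29, 32]

# Assumed interpretation: c and d are tested independently against their own lists.
mask = df["c"].isin(x_coord) & df["d"].isin(y_coord)
result = df[mask]
print(result)
```

With these lists only the row with c = 12 and d = 16 passes both checks. Both isin calls operate on whole columns, so no Python-level loop over rows is needed.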

Upvotes: 0

Views: 161

Answers (1)

Caio Belfort

Reputation: 555

You can do this by creating a new column of (c, d) tuples and calling isin on that column:

In[0]: df['coords'] = list(zip(df['c'], df['d']))
     : df[df['coords'].isin(list(zip(x_coord, y_coord)))]

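Out[0]:
   a  b   c   d   e
0  1  6  11  15 NaN
1  2  7  12  16 NaN
2  3  8  13  17 NaN
3  4  9  14  18 NaN

The rows shown correspond to the data from the merge example below (coordinate lists range(11, 15) and range(15, 19), plus an all-NaN column e). With the x_coord and y_coord lists from the question, zip pairs them element by element into (4, 16), (12, 18), (13, 25), (17, 29) and (19, 32); none of these equals a (c, d) point in df, so the result is empty.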
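The same pairwise check can also be done without adding a helper column, by building a pandas MultiIndex from c and d and calling its isin method with a list of (x, y) tuples. A minimal sketch; the pairs list below is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"a": range(1, 5), "b": range(6, 10), "c": range(11, 15), "d": range(15, 19)})

# Hypothetical (x, y) pairs to look up; each tuple is matched against (c, d) as a unit.
pairs = [(12, 16), (14, 18), (4, 16)]

# Build a MultiIndex from the two coordinate columns and test tuple membership.
mask = pd.MultiIndex.from_frame(df[["c", "d"]]).isin(pairs)
result = df[mask]
print(result)
```

MultiIndex.isin returns a boolean NumPy array aligned with the rows of df, so it can be used directly as a filter, and df itself is left unchanged.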

Alternatively, you can put your coordinates in a new data frame and use an inner join (merge with how='inner') to keep only the rows that match:

In[0]: import numpy as np
     : import pandas as pd
     : df = pd.DataFrame({"a": range(1, 5), "b": range(6, 10), "c": range(11, 15), "d": range(15, 19), "e": np.nan})

     : x_coord = range(11, 15)
     : y_coord = range(15, 19)

     : coords = pd.DataFrame(list(zip(x_coord, y_coord)), columns=['c', 'd'])
     : df.merge(coords, on=['c', 'd'], how='inner')

Out[0]:
   a  b   c   d   e
0  1  6  11  15 NaN
1  2  7  12  16 NaN
2  3  8  13  17 NaN
3  4  9  14  18 NaN
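One caveat with the merge approach: an inner merge emits one output row per matching pair, so duplicate coordinates in the lookup frame duplicate rows of df, and the result gets a fresh 0..n-1 index. A minimal sketch that deduplicates the lookup pairs first; the lists below (with (12, 16) repeated) are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"a": range(1, 5), "b": range(6, 10), "c": range(11, 15), "d": range(15, 19)})

# Hypothetical lookup lists in which the pair (12, 16) appears twice.
x_coord = [12, 12, 14]
y_coord = [16, 16, 18]
coords = pd.DataFrame(list(zip(x_coord, y_coord)), columns=["c", "d"])

# Without deduplication, the df row with (12, 16) appears twice in the result.
naive = df.merge(coords, on=["c", "d"], how="inner")

# Dropping duplicate pairs first keeps at most one copy of each df row.
result = df.merge(coords.drop_duplicates(), on=["c", "d"], how="inner")
print(result)
```

If the original row labels matter, the isin-based approaches preserve them, whereas merge does not.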

Upvotes: 1
