I'm having a hell of a time trying to find an answer to this, which I feel like should be straight forward. What is the Pandas equivalent of SELECT * FROM [TABLE] WHERE [COL1] IN (LIST) ? Basically, I have a large dataset of ID numbers, and I want to select a certain subset of these customers. I want the entire dataframe's worth of values. I tried using .isin() , but I do not want boolean values. Is there a simple way to use another Dataframe column/Series/list/tuple (whatever) to select rows in another Dataframe? For example, I have two tables/DataFrames: Let's call the first table cust_info : cust_id metadata1 1234 y 5678 x 4321 x 2468 y 9513 x 1473 y The second table will be called cust_orders cust_id order_date order_total 1234 2021-07-10 15.10 5678 2021-07-11 480.52 4321 2021-07-11 72.71 2468 2021-07-12 50.96 9513 2021-07-12 69.20 1473 2021-07-13 75.58 I'd like to use customers who's metadata1 value == 'y' in cust_info . I've created a list (or Series, or Tuple, or Set, whatever I need) using DF1[DF1['metadata1'] == 'y'] and extracting the cust_id from that to create the list. In this case, I want the entire row (including order_date and order_total) with cust_id == [1234, 2468, 1473] So the SQL equivalent would be: SELECT * FROM [cust_orders] WHERE [cust_id] in ( SELECT * FROM [cust_info] where [metadata1] = 'y' ) I know I can slice using AND/OR and &/|, but I have hundreds of values so manually typing each ID is out of the question. How would you guys do this? I feel like I'm missing something super obvious... Thanks!

Reputation: 123

Pandas Equivalent of SQL's "Where In" with lists/tuples/iterables

I'm having a hell of a time trying to find an answer to this, which I feel like should be straight forward.

What is the Pandas equivalent of SELECT * FROM [TABLE] WHERE [COL1] IN (LIST)?

Basically, I have a large dataset of ID numbers, and I want to select a certain subset of these customers. I want the entire dataframe's worth of values. I tried using .isin(), but I do not want boolean values. Is there a simple way to use another Dataframe column/Series/list/tuple (whatever) to select rows in another Dataframe?

For example, I have two tables/DataFrames:

Let's call the first table cust_info:

cust_id	metadata1
1234	y
5678	x
4321	x
2468	y
9513	x
1473	y

The second table will be called cust_orders

cust_id	order_date	order_total
1234	2021-07-10	15.10
5678	2021-07-11	480.52
4321	2021-07-11	72.71
2468	2021-07-12	50.96
9513	2021-07-12	69.20
1473	2021-07-13	75.58

I'd like to use customers who's metadata1 value == 'y' in cust_info. I've created a list (or Series, or Tuple, or Set, whatever I need) using DF1[DF1['metadata1'] == 'y'] and extracting the cust_id from that to create the list.

In this case, I want the entire row (including order_date and order_total) with cust_id == [1234, 2468, 1473]

So the SQL equivalent would be:

SELECT * FROM [cust_orders]
WHERE [cust_id] in 
(
SELECT * FROM [cust_info] where [metadata1] = 'y'
)

I know I can slice using AND/OR and &/|, but I have hundreds of values so manually typing each ID is out of the question. How would you guys do this? I feel like I'm missing something super obvious...

Thanks!

Upvotes: 1

Answers (2)

ifly6

Reputation: 5331

Make a list of the customer IDs where cust_info['metadata1'] == 'y':

valids = cust_info.loc[cust_info['metadata1'] == 'y', 'cust_id']

Then ask whether the ID is in that list of valid IDs:

cust_orders[cust_orders['custid'].isin(valids)]

Upvotes: 3

Andrej Kesely

Reputation: 195418

Try:

x = cust_info[cust_info["metadata1"].eq("y")].merge(cust_orders, on="cust_id")
print(x)

Prints:

   cust_id metadata1  order_date  order_total
0     1234         y  2021-07-10        15.10
1     2468         y  2021-07-12        50.96
2     1473         y  2021-07-13        75.58

If you want only columns from cust_orders:

x = cust_info[cust_info["metadata1"].eq("y")].merge(cust_orders, on="cust_id")
print(x[cust_orders.columns])

   cust_id  order_date  order_total
0     1234  2021-07-10        15.10
1     2468  2021-07-12        50.96
2     1473  2021-07-13        75.58

Upvotes: 1

Pandas Equivalent of SQL&#39;s &quot;Where In&quot; with lists/tuples/iterables

Answers (2)

Related Questions

Pandas Equivalent of SQL's "Where In" with lists/tuples/iterables