Query or filter pandas dataframe on multiple columns and cell values

Question

I've tried to figure this one out, but all methods end up with errors.

Quick background info: I have a dataframe with US states and info when a state has adopted a new law. In addition, the dataframe tells which are the neighbouring states. Here's a quick mockup:

(df: treated_states)

| year     | state |BC_law |n_ipo |Neighbor1 |Neighbor2 |Neighbor3 |Treated|
| -------- | ----- |-------|------|----------|----------|----------|-------|
| 1980     | AZ    |1999   |100   |CA        |AK        |WV        |1      |
| 1999     | AZ    |1999   |50    |CA        |AK        |WV        |1      |

(df: not_treated)

| year     | state |BC_law |n_ipo |Neighbor1 |Neighbor2 |Neighbor3 |Treated|
| -------- | ----- |-------|------|----------|----------|----------|-------|
| 1980     | WV    |0      |100   |CA        |AK        |WV        |0      |
| 1999     | AK    |0      |80    |CA        |AK        |WV        |0      |

In reality the dataframe includes all US states and all their neighbouring states in separate columns. There are data points before and and after the change in legislation.

I have filtered states that have adopted the new BC_law (df: treated_states) and those that haven't (df: not_treated). Now I should find out if the adoption of the new law affects the neighbouring states.

My problem: I should filter and store into a new dataframe the treated_states dataframes' Neighbor1, Neighbor2, Neighbor3 that exist in the dataframe not_treated

In case you wonder why I need to filter those: I'll do a difference-in-difference regression to determine if adopting the new law affects the neighboring states that don't adopt the BC_law.

Shubham Sharma · Accepted Answer

Here is a more concise approach:

Filter the Neighbour like columns
Create boolean mask with DataFrame.isin to check whether each element in dataframe is contained in state column of non_treated
Reduce the boolean mask along the columns axis with any

cols = treated_states.filter(like='Neigh')
mask = cols.isin(not_treated.state).any(axis=1)

>>> treated_states[mask]

   year state  BC_law  n_ipo Neighbor1 Neighbor2 Neighbor3  Treated
0  1980    AZ    1999    100        CA        AK        WV        1
1  1999    AZ    1999     50        CA        AK        WV        1

Query or filter pandas dataframe on multiple columns and cell values

Answers (2)

Related Questions