Alex Günsberg
Alex Günsberg

Reputation: 188

Query or filter pandas dataframe on multiple columns and cell values

I've tried to figure this one out, but all methods end up with errors.

Quick background info: I have a dataframe with US states and info when a state has adopted a new law. In addition, the dataframe tells which are the neighbouring states. Here's a quick mockup:

(df: treated_states)

| year     | state |BC_law |n_ipo |Neighbor1 |Neighbor2 |Neighbor3 |Treated|
| -------- | ----- |-------|------|----------|----------|----------|-------|
| 1980     | AZ    |1999   |100   |CA        |AK        |WV        |1      |
| 1999     | AZ    |1999   |50    |CA        |AK        |WV        |1      |

(df: not_treated)

| year     | state |BC_law |n_ipo |Neighbor1 |Neighbor2 |Neighbor3 |Treated|
| -------- | ----- |-------|------|----------|----------|----------|-------|
| 1980     | WV    |0      |100   |CA        |AK        |WV        |0      |
| 1999     | AK    |0      |80    |CA        |AK        |WV        |0      |

In reality the dataframe includes all US states and all their neighbouring states in separate columns. There are data points before and and after the change in legislation.

I have filtered states that have adopted the new BC_law (df: treated_states) and those that haven't (df: not_treated). Now I should find out if the adoption of the new law affects the neighbouring states.

My problem: I should filter and store into a new dataframe the treated_states dataframes' Neighbor1, Neighbor2, Neighbor3 that exist in the dataframe not_treated

In case you wonder why I need to filter those: I'll do a difference-in-difference regression to determine if adopting the new law affects the neighboring states that don't adopt the BC_law.

Upvotes: 2

Views: 986

Answers (2)

Shubham Sharma
Shubham Sharma

Reputation: 71689

Here is a more concise approach:

  • Filter the Neighbour like columns
  • Create boolean mask with DataFrame.isin to check whether each element in dataframe is contained in state column of non_treated
  • Reduce the boolean mask along the columns axis with any
cols = treated_states.filter(like='Neigh')
mask = cols.isin(not_treated.state).any(axis=1)

>>> treated_states[mask]

   year state  BC_law  n_ipo Neighbor1 Neighbor2 Neighbor3  Treated
0  1980    AZ    1999    100        CA        AK        WV        1
1  1999    AZ    1999     50        CA        AK        WV        1

Upvotes: 1

Alex Günsberg
Alex Günsberg

Reputation: 188

I managed to write this and it seems to work... Maybe it helps someone and a better coder can suggest a more efficient and elegant way to solve this

#creating a list of states that have not adopted the new legislation
not_treated_list = not_treated.STATE

#df with states that have not adopted the new legislation, that are also neighbouring states that have adopted the new legislation 
neighbour = treated_states[(treated_states.Neigh1.isin(not_treated_list)) | (treated_states.Neigh2.isin(not_treated_list)) |(treated_states.Neigh3.isin(not_treated_list)) | (treated_states.Neigh4.isin(not_treated_list)) | (treated_states.Neigh5.isin(not_treated_list)) | (treated_states.Neigh6.isin(not_treated_list)) | (treated_states.Neigh7.isin(not_treated_list)) | (treated_states.Neigh8.isin(not_treated_list)) ]

neighbour

Upvotes: 0

Related Questions