Avinash
Avinash

Reputation: 553

Filter rows in csv based on custom condition

Say i have a csv like below

+-----+-----------+---------+
| ID  | state     | city    |
+-----+-----------+---------+
| 101 | READY     |         |
| 101 | DELIVERED | NEWYORK |
| 101 | DELIVERED | LONDON  |   
| 102 | READY     |         |
| 102 | DELIVERED | LONDON  |
| 103 | READY     |         |
| 103 | DELIVERED | NEWYORK |
| 104 | READY     |         |
| 104 | DELIVERED | TOKYO   |
| 104 | DELIVERED | PARIS   |
| 105 | DELIVERED | NEWYORK |
+-----+-----------+---------+

Now I want ID with State READY which has DELIVERED as NEWYORK.

so here first i want to check the value of column city for DELIVERED state. If it is NEWYORK then take that ID's READY row. If there is no READY row for that then we can ignore (ID 105 in this example)

EXPECTED OUTPUT

+-----+-----------+---------+
| ID  | state     | city    |
+-----+-----------+---------+
| 101 | READY     |         |
| 103 | READY     |         |
+-----+-----------+---------+

I have tried using self join in pandas. But i have no idea how to proceed further as i'm new to python. Currently i'm doing this in SQL.

import pandas as pd
mydata = pd.read_csv('C:/Mypython/Newyork',encoding = "ISO-8859-1")
NY = pd.merge(mydata,mydata,left_on='ID',right_on='ID',how='inner')

Upvotes: 2

Views: 207

Answers (2)

BallpointBen
BallpointBen

Reputation: 13790

Use the NEWYORK condition to get a list of IDs, then use that list to filter.

new_york_ids = df.loc[df['city']=='NEWYORK', 'ID']
df[(df['state']=='READY') & (df['ID'].isin(new_york_ids))]

    ID  state  city
0  101  READY  None
5  103  READY  None

Upvotes: 2

Quang Hoang
Quang Hoang

Reputation: 150785

Let's try groupby().transform() to identify those with NEWYORK and then boolean indexing:

has_NY = df['city'].eq('NEWYORK').groupby(df['ID']).transform('any')

mask = df['state'].eq('READY') & has_NY

df[mask]

Output:

    ID  state  city
0  101  READY  None
5  103  READY  None

Upvotes: 3

Related Questions