Reputation: 553
Say I have a CSV like the one below:
+-----+-----------+---------+
| ID | state | city |
+-----+-----------+---------+
| 101 | READY | |
| 101 | DELIVERED | NEWYORK |
| 101 | DELIVERED | LONDON |
| 102 | READY | |
| 102 | DELIVERED | LONDON |
| 103 | READY | |
| 103 | DELIVERED | NEWYORK |
| 104 | READY | |
| 104 | DELIVERED | TOKYO |
| 104 | DELIVERED | PARIS |
| 105 | DELIVERED | NEWYORK |
+-----+-----------+---------+
Now I want the IDs whose state is READY and which also have a DELIVERED row with city NEWYORK.
The city column is empty for READY rows and has a value for DELIVERED rows, so I first want to
check the city of each ID's DELIVERED rows. If one of them is NEWYORK, take that ID's READY row.
If the ID has no READY row, we can ignore it (ID 105 in this example).
EXPECTED OUTPUT
+-----+-----------+---------+
| ID | state | city |
+-----+-----------+---------+
| 101 | READY | |
| 103 | READY | |
+-----+-----------+---------+
I have tried using a self join in pandas, but I have no idea how to proceed further as I'm new to Python. Currently I'm doing this in SQL.
import pandas as pd
mydata = pd.read_csv('C:/Mypython/Newyork', encoding="ISO-8859-1")
NY = pd.merge(mydata, mydata, on='ID', how='inner')  # self join on ID
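For completeness, the self-join idea can be finished in pandas: merge the frame with itself on ID (using suffixes to tell the two sides apart), then keep the rows where the left side is READY and the right side is a DELIVERED row with city NEWYORK. A minimal sketch, with the sample table from the question rebuilt inline since the original CSV isn't available:

```python
import pandas as pd

# sample data matching the table above; READY rows have an empty city
df = pd.DataFrame({
    'ID':    [101, 101, 101, 102, 102, 103, 103, 104, 104, 104, 105],
    'state': ['READY', 'DELIVERED', 'DELIVERED', 'READY', 'DELIVERED',
              'READY', 'DELIVERED', 'READY', 'DELIVERED', 'DELIVERED',
              'DELIVERED'],
    'city':  [None, 'NEWYORK', 'LONDON', None, 'LONDON',
              None, 'NEWYORK', None, 'TOKYO', 'PARIS', 'NEWYORK'],
})

# self join on ID; suffixes distinguish the two copies of each column
joined = pd.merge(df, df, on='ID', suffixes=('', '_r'))

# keep READY rows whose partner row was DELIVERED in NEWYORK
result = joined[(joined['state'] == 'READY') &
                (joined['state_r'] == 'DELIVERED') &
                (joined['city_r'] == 'NEWYORK')]

# drop the right-hand columns and any duplicates to restore the shape
result = result[['ID', 'state', 'city']].drop_duplicates()
print(result)
```

Note that a self join blows up quadratically per ID, which is why the answers below avoid it, but it maps most directly onto the SQL you may already have.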
Upvotes: 2
Views: 207
Reputation: 13790
Use the NEWYORK condition to get the matching IDs, then use that list to filter for the READY rows:
new_york_ids = df.loc[df['city']=='NEWYORK', 'ID']
df[(df['state']=='READY') & (df['ID'].isin(new_york_ids))]
ID state city
0 101 READY None
5 103 READY None
Upvotes: 2
Reputation: 150785
Let's try groupby().transform() to identify the IDs with NEWYORK, then boolean indexing:
has_NY = df['city'].eq('NEWYORK').groupby(df['ID']).transform('any')
mask = df['state'].eq('READY') & has_NY
df[mask]
Output:
ID state city
0 101 READY None
5 103 READY None
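As a quick sanity check, the two lines above run end-to-end; the sample frame is reconstructed here since the original CSV isn't available:

```python
import pandas as pd

# reconstruct the sample table; READY rows have no city
df = pd.DataFrame({
    'ID':    [101, 101, 101, 102, 102, 103, 103, 104, 104, 104, 105],
    'state': ['READY', 'DELIVERED', 'DELIVERED', 'READY', 'DELIVERED',
              'READY', 'DELIVERED', 'READY', 'DELIVERED', 'DELIVERED',
              'DELIVERED'],
    'city':  [None, 'NEWYORK', 'LONDON', None, 'LONDON',
              None, 'NEWYORK', None, 'TOKYO', 'PARIS', 'NEWYORK'],
})

# True for every row of an ID that has at least one NEWYORK city
has_NY = df['city'].eq('NEWYORK').groupby(df['ID']).transform('any')
mask = df['state'].eq('READY') & has_NY
print(df[mask])  # the READY rows for IDs 101 and 103
```

transform('any') broadcasts the per-ID result back to every row of that ID, so the mask lines up with the original frame and no merge is needed.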
Upvotes: 3