Reputation: 553
Say I have a CSV like the one below:
+-----+-----------+---------+
| ID | state | city |
+-----+-----------+---------+
| 101 | READY | |
| 101 | DELIVERED | NEWYORK |
| 101 | DELIVERED | LONDON |
| 102 | READY | |
| 102 | DELIVERED | LONDON |
| 103 | READY | |
| 103 | DELIVERED | NEWYORK |
| 104 | READY | |
| 104 | DELIVERED | TOKYO |
| 104 | DELIVERED | PARIS |
| 105 | DELIVERED | NEWYORK |
+-----+-----------+---------+
Now I want the IDs whose state is READY and which also have a DELIVERED row with city NEWYORK.
The city column is empty for READY rows and has a value for DELIVERED rows, so I first want to
check the city of each ID's DELIVERED rows. If one of them is NEWYORK, take that ID's READY row.
If the ID has no READY row, we can ignore it (ID 105 in this example).
EXPECTED OUTPUT
+-----+-----------+---------+
| ID | state | city |
+-----+-----------+---------+
| 101 | READY | |
| 103 | READY | |
+-----+-----------+---------+
I have tried using a self join in pandas, but I have no idea how to proceed further as I'm new to Python. Currently I'm doing this in SQL.
import pandas as pd
mydata = pd.read_csv('C:/Mypython/Newyork', encoding="ISO-8859-1")
NY = pd.merge(mydata, mydata, on='ID', how='inner')  # self join on ID
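For completeness, the self-join idea can be finished in pandas: merge the frame with itself on ID (using suffixes to tell the two sides apart), then keep the rows where the left side is READY and the right side is a DELIVERED row with city NEWYORK. A minimal sketch, with the sample table from the question rebuilt inline since the original CSV isn't available:

```python
import pandas as pd

# sample data matching the table above; READY rows have an empty city
df = pd.DataFrame({
    'ID':    [101, 101, 101, 102, 102, 103, 103, 104, 104, 104, 105],
    'state': ['READY', 'DELIVERED', 'DELIVERED', 'READY', 'DELIVERED',
              'READY', 'DELIVERED', 'READY', 'DELIVERED', 'DELIVERED',
              'DELIVERED'],
    'city':  [None, 'NEWYORK', 'LONDON', None, 'LONDON',
              None, 'NEWYORK', None, 'TOKYO', 'PARIS', 'NEWYORK'],
})

# self join on ID; suffixes distinguish the two copies of each column
joined = pd.merge(df, df, on='ID', suffixes=('', '_r'))

# keep READY rows whose partner row was DELIVERED in NEWYORK
result = joined[(joined['state'] == 'READY') &
                (joined['state_r'] == 'DELIVERED') &
                (joined['city_r'] == 'NEWYORK')]

# drop the right-hand columns and any duplicates to restore the shape
result = result[['ID', 'state', 'city']].drop_duplicates()
print(result)
```

Note that a self join blows up quadratically per ID, which is why the answers below avoid it, but it maps most directly onto the SQL you may already have.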
Upvotes: 2
Views: 207
Reputation: 13790
Use the NEWYORK condition to get the matching IDs, then use that list to filter for the READY rows:
new_york_ids = df.loc[df['city']=='NEWYORK', 'ID']
df[(df['state']=='READY') & (df['ID'].isin(new_york_ids))]
ID state city
0 101 READY None
5 103 READY None
Upvotes: 2
Reputation: 150785
Let's try groupby().transform() to identify the IDs with NEWYORK, then boolean indexing:
has_NY = df['city'].eq('NEWYORK').groupby(df['ID']).transform('any')
mask = df['state'].eq('READY') & has_NY
df[mask]
Output:
ID state city
0 101 READY None
5 103 READY None
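As a quick sanity check, the two lines above run end-to-end; the sample frame is reconstructed here since the original CSV isn't available:

```python
import pandas as pd

# reconstruct the sample table; READY rows have no city
df = pd.DataFrame({
    'ID':    [101, 101, 101, 102, 102, 103, 103, 104, 104, 104, 105],
    'state': ['READY', 'DELIVERED', 'DELIVERED', 'READY', 'DELIVERED',
              'READY', 'DELIVERED', 'READY', 'DELIVERED', 'DELIVERED',
              'DELIVERED'],
    'city':  [None, 'NEWYORK', 'LONDON', None, 'LONDON',
              None, 'NEWYORK', None, 'TOKYO', 'PARIS', 'NEWYORK'],
})

# True for every row of an ID that has at least one NEWYORK city
has_NY = df['city'].eq('NEWYORK').groupby(df['ID']).transform('any')
mask = df['state'].eq('READY') & has_NY
print(df[mask])  # the READY rows for IDs 101 and 103
```

transform('any') broadcasts the per-ID result back to every row of that ID, so the mask lines up with the original frame and no merge is needed.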
Upvotes: 3