Reputation: 1717
I got a dataframe as follows:
flight_id | from_location | to_location | schedule |
1 | Vancouver | Toronto | 3-Jan |
2 | Amsterdam | Tokyo | 15-Feb |
4 | None | Glasgow | 12-Jan |
9 | Halmstad | Athens | 21-Jan |
3 | Brisbane | None | 4-Feb |
4 | Johannesburg | Venice | 12-Jan |
9 | None | None | 3-Mar |
I want to get only those record index values where atleast one of the columns from_location
,to_location
is None
. Also the record index should be tupled with column name that qualified it for selection(i.e the column name whose value is None
in that record.
Output for above example should be like:
[(2,from_location),(4,to_location),(6,from_location),(6,to_location)]
1st element in tuple: record index
2nd element in tuple: column_name in that record index whose value is None
I did this:- df[df[['flight_id','to_location']].isnull().any(axis=1)].index.values.tolist()
But, that gives me only the record index values [2,4,6]
. I also want corresponding column names as described above
Any ideas folks?
Upvotes: 0
Views: 49
Reputation: 153460
Let's use np.where
:
x,y = np.where(df == 'None')
# if those 'None' are np.nan
# use x,y = np.where(df.isnull())
list(tuple(zip(df.index[x],df.columns[y])))
Output (if flight_id is in the index):
[(4, 'from_location'),
(3, 'to_location'),
(9, 'from_location'),
(9, 'to_location')]
Else if flight_id is not in the index:
[(2, 'from_location'),
(4, 'to_location'),
(6, 'from_location'),
(6, 'to_location')]
Upvotes: 1
Reputation: 323226
I think your sample output is wrong base on your description
s=df.set_index('flight_id').stack(dropna=False)
s[s=='None'].index.tolist()#s[s.isnull()]
Out[625]:
[(4, 'from_location'),
(3, 'to_location'),
(9, 'from_location'),
(9, 'to_location')]
Upvotes: 1