Reputation: 1717

get records & column names from dataframe where atleast one column value of a record is True

I got a dataframe as follows:

flight_id | from_location  | to_location |  schedule |  
1         |   Vancouver    |   Toronto   |   3-Jan   |  
2         |   Amsterdam    |   Tokyo     |   15-Feb  |  
4         |   None         |   Glasgow   |   12-Jan  |  
9         |   Halmstad     |   Athens    |   21-Jan  |  
3         |   Brisbane     |   None      |   4-Feb   |  
4         |   Johannesburg |   Venice    |   12-Jan  |
9         |   None         |  None       |   3-Mar   |

I want to get only those record index values where atleast one of the columns from_location ,to_location is None. Also the record index should be tupled with column name that qualified it for selection(i.e the column name whose value is None in that record.

Output for above example should be like: [(2,from_location),(4,to_location),(6,from_location),(6,to_location)]

1st element in tuple: record index
2nd element in tuple: column_name in that record index whose value is None

I did this:- df[df[['flight_id','to_location']].isnull().any(axis=1)].index.values.tolist()
But, that gives me only the record index values [2,4,6]. I also want corresponding column names as described above

Any ideas folks?

Upvotes: 0

Answers (2)

Scott Boston

Reputation: 153460

Let's use np.where:

x,y = np.where(df == 'None') 
# if those 'None' are np.nan
# use x,y = np.where(df.isnull()) 
list(tuple(zip(df.index[x],df.columns[y])))

Output (if flight_id is in the index):

[(4, 'from_location'),
 (3, 'to_location'),
 (9, 'from_location'),
 (9, 'to_location')]

Else if flight_id is not in the index:

[(2, 'from_location'),
 (4, 'to_location'),
 (6, 'from_location'),
 (6, 'to_location')]

Upvotes: 1

BENY

Reputation: 323226

I think your sample output is wrong base on your description

s=df.set_index('flight_id').stack(dropna=False)    
s[s=='None'].index.tolist()#s[s.isnull()]
Out[625]: 
[(4, 'from_location'),
 (3, 'to_location'),
 (9, 'from_location'),
 (9, 'to_location')]

Upvotes: 1

get records &amp; column names from dataframe where atleast one column value of a record is True

Answers (2)

Related Questions

get records & column names from dataframe where atleast one column value of a record is True