Alexey7
Alexey7

Reputation: 53

Getting listed column names of all not nan rows

I've pandas dataframe based on pivot table with index and columns. Index are presented with values that are not nan at least in one column, while others are nans.

          col_1  col_2  col_3  col_4 ...  col_100
index_1     1      2      nan   nan  ...     5 
index_2    nan    nan      1     1   ...     10
...        ...    ...     ...   ...  ...     ...
index_100  nan     9       4    ...  ...     nan

How can I get column names of all the not nan values in a row and put them into automatically suffixed list names by each index? Need to get this:

list_1=[col_1, col_2, col_100]
list_2=[col_3, col_4, col_100]
list_100=[col_2, col_3]

Upvotes: 4

Views: 1508

Answers (2)

Yeghishe Kerobyan
Yeghishe Kerobyan

Reputation: 149

Generate sample data

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(100,100), columns=['col_' + str(i) for i in 
    range(1,101)], index=['index_' + str(i) for i in range(1, 101)])
for i in range(len(df)):
    df.iloc[i, np.random.randint(0, 100, size=90).astype(int)] = np.nan

df
Out[45]: 
              col_1     col_2     col_3  ...    col_98    col_99   col_100
index_1         NaN       NaN       NaN  ...  0.520094       NaN  0.914679
index_2         NaN       NaN       NaN  ...       NaN  0.606264  0.615714
index_3         NaN       NaN       NaN  ...  0.350248       NaN       NaN
index_4    0.018335  0.296533  0.128359  ...       NaN  0.463084       NaN
index_5    0.164180  0.321482  0.423314  ...  0.909450       NaN  0.968680
            ...       ...       ...  ...       ...       ...       ...
index_96   0.444807  0.010812       NaN  ...  0.763669  0.074926       NaN
index_97        NaN       NaN       NaN  ...       NaN       NaN       NaN
index_98        NaN  0.285217       NaN  ...  0.913199  0.699628       NaN
index_99   0.800824  0.004250       NaN  ...       NaN       NaN  0.302858
index_100       NaN       NaN       NaN  ...  0.875435  0.700814       NaN

[100 rows x 100 columns]

compute result

result = {}
for i in range(len(df)):
    l = df.columns[np.isfinite(df.iloc[i])]
    result['list_' + str(i)] = list(l)

get list_1

result['list_1']
Out[47]: 
['col_4',
 'col_6',
 'col_9',
 'col_10',
 'col_14',
 'col_15',
 'col_18',
 'col_19',
 'col_26',
 'col_27',
 'col_33',
 'col_34',
 'col_35',
 'col_37',
 'col_38',
 'col_39',
 'col_40',
 'col_42',
 'col_43',
 'col_50',
 'col_57',
 'col_59',
 'col_60',
 'col_63',
 'col_64',
 'col_65',
 'col_66',
 'col_67',
 'col_69',
 'col_71',
 'col_76',
 'col_80',
 'col_81',
 'col_85',
 'col_87',
 'col_88',
 'col_89',
 'col_92',
 'col_96',
 'col_97',
 'col_99',
 'col_100']

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150755

You can use stack to remove nan and groupby to gather all column names:

(df.stack()
   .reset_index(level=1)
   .groupby(level=0, sort=False)
   ['level_1'].apply(list)
)

Output:

index_1      [col_1, col_2, col_100]
index_2      [col_3, col_4, col_100]
index_100             [col_2, col_3]
Name: level_1, dtype: object

Upvotes: 3

Related Questions