Reputation: 53
I've pandas dataframe based on pivot table with index and columns. Index are presented with values that are not nan at least in one column, while others are nans.
col_1 col_2 col_3 col_4 ... col_100
index_1 1 2 nan nan ... 5
index_2 nan nan 1 1 ... 10
... ... ... ... ... ... ...
index_100 nan 9 4 ... ... nan
How can I get column names of all the not nan values in a row and put them into automatically suffixed list names by each index? Need to get this:
list_1=[col_1, col_2, col_100]
list_2=[col_3, col_4, col_100]
list_100=[col_2, col_3]
Upvotes: 4
Views: 1508
Reputation: 149
Generate sample data
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(100,100), columns=['col_' + str(i) for i in
range(1,101)], index=['index_' + str(i) for i in range(1, 101)])
for i in range(len(df)):
df.iloc[i, np.random.randint(0, 100, size=90).astype(int)] = np.nan
df
Out[45]:
col_1 col_2 col_3 ... col_98 col_99 col_100
index_1 NaN NaN NaN ... 0.520094 NaN 0.914679
index_2 NaN NaN NaN ... NaN 0.606264 0.615714
index_3 NaN NaN NaN ... 0.350248 NaN NaN
index_4 0.018335 0.296533 0.128359 ... NaN 0.463084 NaN
index_5 0.164180 0.321482 0.423314 ... 0.909450 NaN 0.968680
... ... ... ... ... ... ...
index_96 0.444807 0.010812 NaN ... 0.763669 0.074926 NaN
index_97 NaN NaN NaN ... NaN NaN NaN
index_98 NaN 0.285217 NaN ... 0.913199 0.699628 NaN
index_99 0.800824 0.004250 NaN ... NaN NaN 0.302858
index_100 NaN NaN NaN ... 0.875435 0.700814 NaN
[100 rows x 100 columns]
compute result
result = {}
for i in range(len(df)):
l = df.columns[np.isfinite(df.iloc[i])]
result['list_' + str(i)] = list(l)
get list_1
result['list_1']
Out[47]:
['col_4',
'col_6',
'col_9',
'col_10',
'col_14',
'col_15',
'col_18',
'col_19',
'col_26',
'col_27',
'col_33',
'col_34',
'col_35',
'col_37',
'col_38',
'col_39',
'col_40',
'col_42',
'col_43',
'col_50',
'col_57',
'col_59',
'col_60',
'col_63',
'col_64',
'col_65',
'col_66',
'col_67',
'col_69',
'col_71',
'col_76',
'col_80',
'col_81',
'col_85',
'col_87',
'col_88',
'col_89',
'col_92',
'col_96',
'col_97',
'col_99',
'col_100']
Upvotes: 1
Reputation: 150755
You can use stack
to remove nan
and groupby
to gather all column names:
(df.stack()
.reset_index(level=1)
.groupby(level=0, sort=False)
['level_1'].apply(list)
)
Output:
index_1 [col_1, col_2, col_100]
index_2 [col_3, col_4, col_100]
index_100 [col_2, col_3]
Name: level_1, dtype: object
Upvotes: 3