kevin
kevin

Reputation: 2014

Select columns containing a certain number of values

My dataframe contains the index column (Date) and hundreds of columns.

Date (index) Menu1 Menu2 Menu3 Menu4
2023_1_1_x NaN 5 10 NaN
2023_1_2_x NaN 8 9 NaN
2023_1_3_x NaN 10 8 NaN
2023_1_1_y 10 5 NaN NaN
2023_1_2_y 11 8 NaN NaN
2023_1_3_y 12 10 NaN NaN

The Date index contains duplicates. That's why they end with x or y.

The dataframe has hundreds of columns. I need to select the columns like Menu3 whose index contains x and the values exist for only those x indexes.

Upvotes: 0

Views: 33

Answers (1)

mozway
mozway

Reputation: 260790

You can group by columns ending in _x or not, then check the NA status and ensure only the _x ones have non-NA:

out = (df.notna()
         .groupby(df.index.str.endswith('_x'))
         .any()
         .pipe(lambda d: d.columns[d.eq(d.index, axis=0).all()])
         .tolist()
       )

Output: ['Menu3']

Upvotes: 2

Related Questions