Ferroao

Reputation: 3042

remove values from pandas df and move remaining upwards

I have a dataframe with categorical data in it.

I have come up with a procedure that keeps only the desired categories and moves the remaining values up into the cells left empty by the deleted ones.

But I would like to do it without the intermediate lists, if possible.

import pandas as pd
mydf = pd.DataFrame(data = {'a': [9,6,3,8,5], 
                            'b': [4, 3,5,6,7], 
                            'c': [5, 3,6,9,10]
                           } 
                    )

selecList = [5,8,4,6] # only these categories shall remain

mydf

   a  b   c
0  9  4   5
1  6  3   3
2  3  5   6
3  8  6   9
4  5  7  10

Desired Output

    a   b   c
0   6   4   5
1   8   5   6
2   5   6   <NA>

My workaround:

myList = mydf.T.values.tolist()
myList

[[9, 6, 3, 8, 5], [4, 3, 5, 6, 7], [5, 3, 6, 9, 10]] 

filtered_list = [[x for x in y if x in selecList ] for y in myList] 
filtered_list
[[6, 8, 5], [4, 5, 6], [5, 6]]
    
filtered_df = pd.DataFrame(filtered_list).T
filtered_df.columns = list(mydf)
filtered_df = filtered_df.astype('Int64')

Unsuccessful try:

pd.DataFrame(mydf.apply(lambda y: [x for x in y if x in selecList ])).T

Upvotes: 0

Views: 115

Answers (1)

rhug123

Reputation: 8768

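Here is one approach (`df` is the question's `mydf`). It masks the unwanted values with NaN and drops rows that end up entirely NaN; note that by itself it does not shift the remaining values upward: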

df.where(df.isin(selecList)).dropna(how='all')

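Here is another solution, which stacks the masked frame, renumbers the surviving values within each column, and unstacks again so they move to the top: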

df.where(df.isin(selecList)).stack().droplevel(0).to_frame().assign(i = lambda x: x.groupby(level=0).cumcount()).set_index('i',append=True)[0].unstack(level=0)
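
For comparison, here is a minimal sketch of a per-column variant that avoids the intermediate lists. It is not taken from the code above; it reuses the question's `mydf` and `selecList`, filters each column with `isin`, resets the index so the surviving values start at row 0, and lets `pd.concat` pad the shorter columns with missing values.

```python
import pandas as pd

mydf = pd.DataFrame({'a': [9, 6, 3, 8, 5],
                     'b': [4, 3, 5, 6, 7],
                     'c': [5, 3, 6, 9, 10]})
selecList = [5, 8, 4, 6]  # only these categories shall remain

# For each column: keep the allowed values, then renumber them from 0
# so they sit at the top of the column.
kept = {name: col[col.isin(selecList)].reset_index(drop=True)
        for name, col in mydf.items()}

# Concatenating side by side aligns on the new 0..n index; shorter
# columns are padded with NaN, which 'Int64' represents as <NA>.
filtered_df = pd.concat(kept, axis=1).astype('Int64')
print(filtered_df)
```

The cast to the nullable `Int64` dtype mirrors the question's workaround, so the padded cells show as `<NA>` instead of turning the columns into floats.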

Upvotes: 2
