Reputation: 3042
I have a dataframe with categorical data in it.
I have come up with a procedure that keeps only the desired categories and moves the remaining values up into the cells left empty by the deleted ones.
But I would like to do it without the intermediate lists, if possible.
import pandas as pd
mydf = pd.DataFrame(data={'a': [9, 6, 3, 8, 5],
                          'b': [4, 3, 5, 6, 7],
                          'c': [5, 3, 6, 9, 10]})
selecList = [5, 8, 4, 6]  # only these categories shall remain
mydf
   a  b   c
0  9  4   5
1  6  3   3
2  3  5   6
3  8  6   9
4  5  7  10
Desired Output
   a  b     c
0  6  4     5
1  8  5     6
2  5  6  <NA>
My workaround:
myList = mydf.T.values.tolist()
myList
[[9, 6, 3, 8, 5], [4, 3, 5, 6, 7], [5, 3, 6, 9, 10]]
filtered_list = [[x for x in y if x in selecList ] for y in myList]
filtered_list
[[6, 8, 5], [4, 5, 6], [5, 6]]
filtered_df = pd.DataFrame(filtered_list).T
filtered_df.columns = list(mydf)
filtered_df = filtered_df.astype('Int64')
Unsuccessful try:
pd.DataFrame(mydf.apply(lambda y: [x for x in y if x in selecList ])).T
Upvotes: 0
Views: 115
Reputation: 8768
Here is an alternative solution:
mydf.where(mydf.isin(selecList)).dropna(how='all')
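With the sample frame from the question, the masking step on its own leaves the gaps where they are: no row ends up entirely NaN, so dropna(how='all') has nothing to drop. A minimal sketch of one way to close the gaps afterwards, assuming a per-column apply is acceptable (the names masked and compacted are only illustrative):

```python
import pandas as pd

mydf = pd.DataFrame({'a': [9, 6, 3, 8, 5],
                     'b': [4, 3, 5, 6, 7],
                     'c': [5, 3, 6, 9, 10]})
selecList = [5, 8, 4, 6]

# Step 1: keep the wanted values, turn everything else into NaN.
masked = mydf.where(mydf.isin(selecList))

# Step 2: per column, drop the NaN cells and renumber from 0 so the
# surviving values move up; shorter columns are padded with missing values.
compacted = (masked
             .apply(lambda col: col.dropna().reset_index(drop=True))
             .astype('Int64'))
print(compacted)
```

The cast to 'Int64' is there because masking turns the integer columns into floats; the nullable integer dtype lets the padded cells show as <NA>.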
Here is another solution:
(mydf.where(mydf.isin(selecList))
     .stack()
     .droplevel(0)
     .to_frame()
     .assign(i=lambda x: x.groupby(level=0).cumcount())
     .set_index('i', append=True)[0]
     .unstack(level=0))
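The one-liner is easier to follow when split into named steps. The sketch below is the same stack / cumcount / reshape idea, but it swaps set_index + unstack for pivot and adds an explicit dropna() as a precaution, on the assumption that stack() may keep NaN entries in some pandas versions. The column names row, col, value and pos are made up for the illustration:

```python
import pandas as pd

mydf = pd.DataFrame({'a': [9, 6, 3, 8, 5],
                     'b': [4, 3, 5, 6, 7],
                     'c': [5, 3, 6, 9, 10]})
selecList = [5, 8, 4, 6]

# Long format: one record per surviving (row, column) cell.
long = (mydf.where(mydf.isin(selecList))
            .stack()
            .dropna()
            .rename_axis(['row', 'col'])
            .reset_index(name='value'))

# Number the survivors within each original column: 0, 1, 2, ...
long['pos'] = long.groupby('col').cumcount()

# Back to wide format: new position as row label, original column as column.
result = (long.pivot(index='pos', columns='col', values='value')
              .astype('Int64'))
print(result)
```

The groupby/cumcount step is what moves the values up: each kept value gets its new row position within its own column, regardless of which row it came from.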
Upvotes: 2