Reputation: 3
First Question.
I have a df:
        165  232  237
KKI-11  NaN    T  NaN
KKI-12  NaN    A  NaN
KKI-5     S    T    G
KKI-12    G    A    A
KKI-5     S  NaN    G
KKI-11    G  NaN    A
KKI-5   NaN  NaN  NaN
KKI-11  NaN  NaN  NaN
KKI-12  NaN  NaN  NaN
or like this:
        232  237  232  165  237  165
KKI-11    T  NaN  NaN  NaN    A    G
KKI-12    A    A    A    G  NaN  NaN
KKI-5   NaN    G    T    S    G    S
As you can see, for every index label and column (with repeated index labels in case 1 and repeated columns in case 2), that is, for each cell of an imagined reduced form, there is a value. How can I manipulate either of these dataframes to look like:
        165  232  237
KKI-5     S    T    G
KKI-11    G    T    A
KKI-12    G    A    A
Hope you can help me remove all NaN and duplications in this specific way. Thank you
Upvotes: 0
Views: 32
Reputation: 26676
Another way, using the same kind of method chain:
df = df.dropna(how='all').ffill().dropna(how='any').drop_duplicates(keep='last')
        165  232  237
KKI-5     S    T    G
KKI-5     S    A    G
KKI-11    G    A    A
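To reproduce the output above, here is a minimal sketch that rebuilds the first frame from the question and runs the same chain one step at a time. The variable names (`filled`, `complete`, `result`) are only illustrative, and the integer column labels are an assumption about how the original frame was typed. Note that the forward fill runs across index labels and that `drop_duplicates` compares row values only, not the index, which is why two KKI-5 rows remain and KKI-12 is missing.

```python
import numpy as np
import pandas as pd

# Rebuild the first frame from the question (repeated index labels).
# Integer column labels are an assumption; the question does not show dtypes.
df = pd.DataFrame(
    [
        [np.nan, "T", np.nan],
        [np.nan, "A", np.nan],
        ["S", "T", "G"],
        ["G", "A", "A"],
        ["S", np.nan, "G"],
        ["G", np.nan, "A"],
        [np.nan, np.nan, np.nan],
        [np.nan, np.nan, np.nan],
        [np.nan, np.nan, np.nan],
    ],
    index=["KKI-11", "KKI-12", "KKI-5", "KKI-12", "KKI-5",
           "KKI-11", "KKI-5", "KKI-11", "KKI-12"],
    columns=[165, 232, 237],
)

# Drop the all-NaN rows, then forward fill down each column.
# The fill ignores index labels, so values carry over between different labels.
filled = df.dropna(how="all").ffill()

# Drop rows that still contain a NaN (the two leading partial rows).
complete = filled.dropna(how="any")

# Keep the last of any rows with identical values; the index is not compared.
result = complete.drop_duplicates(keep="last")
print(result)
```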
Upvotes: 0
Reputation: 28699
One way is to sort the index (so that equal labels sit next to each other), group on the sorted index, backfill within each group, drop rows that still contain nulls, and drop duplicates:
df.sort_index().groupby(level=0).bfill().dropna().drop_duplicates()
        165  232  237
KKI-11    G    T    A
KKI-12    G    A    A
KKI-5     S    T    G
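A possible variation, not from the answer above: `GroupBy.first()` returns the first non-null value per column within each group, so it may collapse either layout from the question without the fill/drop steps. The sketch below rebuilds both frames (the names `long_df` and `wide_df` and the integer column labels are assumptions) and handles the repeated-column case by transposing first. The result is ordered by label as strings (KKI-11, KKI-12, KKI-5), not in the order shown in the question.

```python
import numpy as np
import pandas as pd

# Case 1 from the question: repeated index labels (names are illustrative).
long_df = pd.DataFrame(
    [
        [np.nan, "T", np.nan],
        [np.nan, "A", np.nan],
        ["S", "T", "G"],
        ["G", "A", "A"],
        ["S", np.nan, "G"],
        ["G", np.nan, "A"],
        [np.nan, np.nan, np.nan],
        [np.nan, np.nan, np.nan],
        [np.nan, np.nan, np.nan],
    ],
    index=["KKI-11", "KKI-12", "KKI-5", "KKI-12", "KKI-5",
           "KKI-11", "KKI-5", "KKI-11", "KKI-12"],
    columns=[165, 232, 237],
)

# Case 2 from the question: repeated column labels.
wide_df = pd.DataFrame(
    [
        ["T", np.nan, np.nan, np.nan, "A", "G"],
        ["A", "A", "A", "G", np.nan, np.nan],
        [np.nan, "G", "T", "S", "G", "S"],
    ],
    index=["KKI-11", "KKI-12", "KKI-5"],
    columns=[232, 237, 232, 165, 237, 165],
)

# First non-null value per column within each index label.
from_long = long_df.groupby(level=0).first()

# Transpose so the repeated columns become index labels, collapse, transpose back.
from_wide = wide_df.T.groupby(level=0).first().T

print(from_long)
print(from_wide)
```

Because the grouping is on the label itself, two different labels that happen to hold identical values are both kept, which `drop_duplicates()` would not guarantee.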
Upvotes: 1