Pandas DataFrames: how to get rid of NaN and duplicates per specific column or index?

Question

First Question.

I have a df:

        165  232  237
KKI-11  NaN    T  NaN
KKI-12  NaN    A  NaN
KKI-5     S    T    G
KKI-12    G    A    A
KKI-5     S  NaN    G
KKI-11    G  NaN    A
KKI-5   NaN  NaN  NaN
KKI-11  NaN  NaN  NaN
KKI-12  NaN  NaN  NaN

or like this:

        232  237  232  165  237  165
KKI-11    T  NaN  NaN  NaN    A    G
KKI-12    A    A    A    G  NaN  NaN
KKI-5   NaN    G    T    S    G    S

As you can see, every index/column combination has a value somewhere (the index labels repeat in case 1, the column labels repeat in case 2), so each cell of an imaginary reduced form can be filled. How can I manipulate either one of these dataframes to look like this:

        165  232  237
KKI-5     S    T    G
KKI-11    G    T    A
KKI-12    G    A    A

Hope you can help me remove all NaN and duplications in this specific way. Thank you

sammywemmy · Accepted Answer

One way to go about it is to sort the index (so that equal labels sit next to each other), group on the sorted index, backward fill within each group, drop the rows that still contain nulls, and drop duplicates:

df.sort_index().groupby(level=0).bfill().dropna().drop_duplicates()

        165  232  237
KKI-11    G    T    A
KKI-12    G    A    A
KKI-5     S    T    G
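
For anyone who wants to try this on both layouts from the question, here is a minimal reproducible sketch. It is not part of the original answer: the names `tall`, `wide`, `answer_result`, `tall_result` and `wide_result` are made up for illustration, and the integer column labels are an assumption (the question does not say whether they are ints or strings). Besides the chain above, it shows an alternative that may be simpler, `groupby(level=0).first()`, which takes the first non-null value per column within each group; the second layout is transposed first so the repeated column labels become index labels.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Case 1 from the question: repeated index labels.
# Column labels are assumed to be integers.
tall = pd.DataFrame(
    {
        165: [nan, nan, "S", "G", "S", "G", nan, nan, nan],
        232: ["T", "A", "T", "A", nan, nan, nan, nan, nan],
        237: [nan, nan, "G", "A", "G", "A", nan, nan, nan],
    },
    index=["KKI-11", "KKI-12", "KKI-5", "KKI-12", "KKI-5",
           "KKI-11", "KKI-5", "KKI-11", "KKI-12"],
)

# Case 2 from the question: repeated column labels.
wide = pd.DataFrame(
    [
        ["T", nan, nan, nan, "A", "G"],
        ["A", "A", "A", "G", nan, nan],
        [nan, "G", "T", "S", "G", "S"],
    ],
    index=["KKI-11", "KKI-12", "KKI-5"],
    columns=[232, 237, 232, 165, 237, 165],
)

# The chain from the answer, applied to case 1.
answer_result = (
    tall.sort_index().groupby(level=0).bfill().dropna().drop_duplicates()
)

# Alternative: first non-null value per column within each index group.
tall_result = tall.groupby(level=0).first()

# Case 2: transpose so the repeated labels are on the index,
# reduce the same way, then transpose back.
wide_result = wide.T.groupby(level=0).first().T

print(answer_result)
print(tall_result)
print(wide_result)
```

One thing to keep in mind with the original chain: `drop_duplicates()` compares row values only and ignores the index, so two different labels that happen to end up with identical values would be collapsed into one row. That does not occur with the sample data, but `groupby(level=0).first()` avoids the issue because it returns exactly one row per label.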
