fluppenlutz

Reputation: 3

Pandas DataFrames: how to get rid of NaN and duplicates per specific column or index?

First Question.

I have a df:

        165  232  237
KKI-11  NaN    T  NaN
KKI-12  NaN    A  NaN
KKI-5     S    T    G
KKI-12    G    A    A
KKI-5     S  NaN    G
KKI-11    G  NaN    A
KKI-5   NaN  NaN  NaN
KKI-11  NaN  NaN  NaN
KKI-12  NaN  NaN  NaN

or like this:

        232  237  232  165  237  165
KKI-11    T  NaN  NaN  NaN    A    G
KKI-12    A    A    A    G  NaN  NaN
KKI-5   NaN    G    T    S    G    S

As you can see, for every index label and column (the index labels repeat in case 1, the column labels repeat in case 2) there is a value, meaning that each cell of an imagined reduced form can be filled. How can I manipulate either of these dataframes to look like this:

        165  232  237
KKI-5     S    T    G
KKI-11    G    T    A
KKI-12    G    A    A

I hope you can help me remove all NaNs and duplicates in this specific way. Thank you.

Upvotes: 0

Views: 32

Answers (2)

wwnde

Reputation: 26676

Another way, following the same logic of chained methods, is:

df = df.dropna(how='all').ffill().dropna(how='any').drop_duplicates(keep='last')

        165  232  237
KKI-5     S    T    G
KKI-5     S    A    G
KKI-11    G    A    A

Upvotes: 0

sammywemmy

Reputation: 28699

One way to do it is to sort the index (so that equal labels end up next to each other), group on the sorted index, backward fill within each group, drop the rows that still contain nulls, and drop duplicates:

df.sort_index().groupby(level=0).bfill().dropna().drop_duplicates()

        165  232  237
KKI-11    G    T    A
KKI-12    G    A    A
KKI-5     S    T    G
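
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds both frames from the question and applies the chain above to the first one. The names long_df, wide_df, chained, via_first and wide_result are made up for this example. The groupby(level=0).first() calls are a variant that is not part of the answer: as far as I know, first() returns the first non-null value per column in each group, so it should give the same cells without the fill and drop steps. For the second frame it assumes that transposing first (so the repeated column labels become index labels) is acceptable.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Case 1 from the question: the index labels repeat.
long_df = pd.DataFrame(
    [
        [nan, "T", nan],
        [nan, "A", nan],
        ["S", "T", "G"],
        ["G", "A", "A"],
        ["S", nan, "G"],
        ["G", nan, "A"],
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
    ],
    index=["KKI-11", "KKI-12", "KKI-5", "KKI-12", "KKI-5",
           "KKI-11", "KKI-5", "KKI-11", "KKI-12"],
    columns=[165, 232, 237],
)

# Case 2 from the question: the column labels repeat.
wide_df = pd.DataFrame(
    [
        ["T", nan, nan, nan, "A", "G"],
        ["A", "A", "A", "G", nan, nan],
        [nan, "G", "T", "S", "G", "S"],
    ],
    index=["KKI-11", "KKI-12", "KKI-5"],
    columns=[232, 237, 232, 165, 237, 165],
)

# The chain from the answer: sort, group by index label, backward fill
# inside each group, drop incomplete rows, drop repeated rows.
chained = (
    long_df.sort_index()
    .groupby(level=0)
    .bfill()
    .dropna()
    .drop_duplicates()
)

# Variant (not from the answer): first non-null value per column and label.
via_first = long_df.groupby(level=0).first()

# Case 2: transpose so the repeated column labels become index labels,
# reduce them the same way, then transpose back.
wide_result = wide_df.T.groupby(level=0).first().T

print(chained)
print(via_first)
print(wide_result)
```

Note that drop_duplicates() compares row values only and ignores the index, so the chained version would also drop a row if two different labels happened to end up with identical values. The first() variant does not have that side effect. In both cases the labels come out in string order (KKI-11 before KKI-5), so reindex afterwards if the order shown in the question matters.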

Upvotes: 1
