Khaled DELLAL

Reputation: 921

How to format a dataframe having many NaN values, join all rows to those not starting with NaN

I have the following df:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan],
    'col2': [np.nan, 2, np.nan, np.nan, np.nan, 2, np.nan, np.nan],
    'col3': [np.nan, np.nan, 3, np.nan, np.nan, np.nan, 3, np.nan],
    'col4': [np.nan, np.nan, np.nan, 4, np.nan, np.nan, np.nan, 4]
    })

It has the following display:

   col1  col2  col3  col4
0   1.0   NaN   NaN   NaN
1   NaN   2.0   NaN   NaN
2   NaN   NaN   3.0   NaN
3   NaN   NaN   NaN   4.0
4   5.0   NaN   NaN   NaN
5   NaN   6.0   NaN   NaN
6   NaN   NaN   7.0   NaN
7   NaN   NaN   NaN   8.0

My goal is to keep every row that begins with a float (a non-NaN value in col1) and merge the rows that follow it into that row.

The new_df I want to get is:

   col1  col2  col3  col4
0     1     2     3     4
4     5     6     7     8

Any help from your side will be highly appreciated (I upvote all answers).

Thank you!

Upvotes: 3

Views: 39

Answers (3)

Panda Kim

Reputation: 13247

df.bfill()[df['col1'].notna()]

result:

    col1    col2    col3    col4
0   1.0     2.0     3.0     4.0
4   1.0     2.0     3.0     4.0

Note that the df you construct contains 1, 2, 3, 4, 1, 2, 3, 4, not 1 through 8 as in your displayed output.

Also, I don't know the exact logic you want. If the structure of your dataset is different, you can't use my code.
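
For reference, here is a minimal sketch of the same one-liner on a frame whose values match the display in the question (1 through 8). The name `df_display` is only illustrative. It assumes every block of rows holds exactly one value per column; if a block were missing a value, `bfill` would pull it in from the next block.

```python
import numpy as np
import pandas as pd

# Assumed data: the values shown in the question's display (1..8),
# not the constructor in the question (which repeats 1..4).
df_display = pd.DataFrame({
    'col1': [1, np.nan, np.nan, np.nan, 5, np.nan, np.nan, np.nan],
    'col2': [np.nan, 2, np.nan, np.nan, np.nan, 6, np.nan, np.nan],
    'col3': [np.nan, np.nan, 3, np.nan, np.nan, np.nan, 7, np.nan],
    'col4': [np.nan, np.nan, np.nan, 4, np.nan, np.nan, np.nan, 8],
})

# Back-fill every column, then keep only the rows where col1 was originally set.
result = df_display.bfill()[df_display['col1'].notna()]
print(result)
```

The surviving rows keep their original labels (0 and 4), and the values stay floats.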

Upvotes: 1

jezrael

Reputation: 863056

If you need to join the first values per group, where the groups are defined by the non-missing values in df['col1'], use:

df = (df.reset_index()
        .groupby(df['col1'].notna().cumsum())
        .first()
        .set_index('index'))
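
To make the grouping step easier to follow, here is a self-contained sketch that exposes the intermediate key. It assumes the 1 through 8 values from the question's display, and the names `group_id` and `out` are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumed data: the values shown in the question's display (1..8).
df = pd.DataFrame({
    'col1': [1, np.nan, np.nan, np.nan, 5, np.nan, np.nan, np.nan],
    'col2': [np.nan, 2, np.nan, np.nan, np.nan, 6, np.nan, np.nan],
    'col3': [np.nan, np.nan, 3, np.nan, np.nan, np.nan, 7, np.nan],
    'col4': [np.nan, np.nan, np.nan, 4, np.nan, np.nan, np.nan, 8],
})

# A new group starts at every row where col1 is not NaN,
# so the running count of such rows labels each block of rows.
group_id = df['col1'].notna().cumsum()

out = (df.reset_index()           # keep the original row labels as a column
         .groupby(group_id)
         .first()                 # first non-NaN value per column in each group
         .set_index('index'))     # restore the original row labels
print(out)
```

Because `first()` skips NaN within each group, a value missing from one block is left as NaN rather than being taken from a neighbouring block.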

Upvotes: 3

Scott Boston

Reputation: 153480

Try this:

df.apply(lambda x: x.dropna().to_numpy())

Output:

   col1  col2  col3  col4
0   1.0   2.0   3.0   4.0
1   5.0   6.0   7.0   8.0

You can also cast to integers:

df.apply(lambda x: x.dropna().to_numpy(dtype='int'))

Output:

   col1  col2  col3  col4
0     1     2     3     4
1     5     6     7     8
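
If the original row labels (0 and 4) matter, one possible follow-up is to reassign the index afterwards. This is only a sketch: it assumes the 1 through 8 values from the question's display, that every column has the same number of non-NaN values, and that each output row corresponds to one row where col1 is set. The name `compact` is illustrative.

```python
import numpy as np
import pandas as pd

# Assumed data: the values shown in the question's display (1..8).
df = pd.DataFrame({
    'col1': [1, np.nan, np.nan, np.nan, 5, np.nan, np.nan, np.nan],
    'col2': [np.nan, 2, np.nan, np.nan, np.nan, 6, np.nan, np.nan],
    'col3': [np.nan, np.nan, 3, np.nan, np.nan, np.nan, 7, np.nan],
    'col4': [np.nan, np.nan, np.nan, 4, np.nan, np.nan, np.nan, 8],
})

# Collapse each column to its non-NaN values and cast them to integers.
compact = df.apply(lambda x: x.dropna().to_numpy(dtype='int'))

# Reuse the labels of the rows where col1 is set.
compact.index = df.index[df['col1'].notna()]
print(compact)
```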

Upvotes: 2
