Gabriel Tanski
Gabriel Tanski

Reputation: 172

Collapse Pandas rows to elliminate NaN entries

Let's consider the following DataFrame

Name A B C D
tom 10.0 NaN NaN NaN
tom NaN 15.0 NaN NaN
tom NaN NaN 20.0 NaN
tom NaN NaN NaN 25.0
tom 30.0 NaN NaN NaN
tom NaN NaN NaN 40.0
john 1.0 NaN NaN NaN
john NaN 2.0 NaN NaN
john NaN NaN 3.0 NaN
john NaN NaN NaN 4.0
john 5.0 NaN NaN NaN
john NaN 6.0 NaN NaN
john NaN NaN 7.0 NaN
john NaN NaN NaN 8.0

I want to collapse it to limit the amount of NaN values in the DataFrame - can be sequential, i.e. combine the neighboring rows if possible, but all I care about is that the values of columns A-D correspond to the same Name after the collapse

My perfect outcome would be

Name A B C D
tom 10.0 15.0 20.0 25.0
tom 30.0 NaN NaN 40.0
john 1.0 2.0 3.0 4.0
john 5.0 6.0 7.0 8.0

From what I understand, Pandas groupby('Name') will not do the trick, because it will leave one entry for each name.

If that is of any help, I use a dictionary to create the dataframe. The dictionary looks like this:

{
    "a": {
        "tom": [10.0, 30.0],
        "john": [1.0, 5.0]
    },
    "b": {
        "tom": [15.0],
        "john": [2.0, 6.0]
    },
    .....
}

So, basically, I am taking every number in the dictionary then create a row with just this number, and then combine all of the rows.

Is there a simple way to collapse the resulting DataFrame or build a more compact DataFrame given such a dictionary

Upvotes: 4

Views: 584

Answers (1)

Andrej Kesely
Andrej Kesely

Reputation: 195553

You can .groupby + .transform (where you "move" the values up). Then drop rows which contain all NaN values:

print(
    df.set_index("Name")
    .groupby(level=0)
    .transform(lambda x: sorted(x, key=lambda k: pd.isna(k)))
    .dropna(axis=0, how="all")
    .reset_index()
)

Prints:

   Name     A     B     C     D
0   tom  10.0  15.0  20.0  25.0
1   tom  30.0   NaN   NaN  40.0
2  john   1.0   2.0   3.0   4.0
3  john   5.0   6.0   7.0   8.0

Upvotes: 5

Related Questions