Reputation: 172
Let's consider the following DataFrame
Name | A | B | C | D |
---|---|---|---|---|
tom | 10.0 | NaN | NaN | NaN |
tom | NaN | 15.0 | NaN | NaN |
tom | NaN | NaN | 20.0 | NaN |
tom | NaN | NaN | NaN | 25.0 |
tom | 30.0 | NaN | NaN | NaN |
tom | NaN | NaN | NaN | 40.0 |
john | 1.0 | NaN | NaN | NaN |
john | NaN | 2.0 | NaN | NaN |
john | NaN | NaN | 3.0 | NaN |
john | NaN | NaN | NaN | 4.0 |
john | 5.0 | NaN | NaN | NaN |
john | NaN | 6.0 | NaN | NaN |
john | NaN | NaN | 7.0 | NaN |
john | NaN | NaN | NaN | 8.0 |
I want to collapse it to limit the amount of NaN
values in the DataFrame - can be sequential, i.e. combine the neighboring rows if possible, but all I care about is that the values of columns A-D
correspond to the same Name
after the collapse
My perfect outcome would be
Name | A | B | C | D |
---|---|---|---|---|
tom | 10.0 | 15.0 | 20.0 | 25.0 |
tom | 30.0 | NaN | NaN | 40.0 |
john | 1.0 | 2.0 | 3.0 | 4.0 |
john | 5.0 | 6.0 | 7.0 | 8.0 |
From what I understand, Pandas groupby('Name')
will not do the trick, because it will leave one entry for each name.
If that is of any help, I use a dictionary to create the dataframe. The dictionary looks like this:
{
"a": {
"tom": [10.0, 30.0],
"john": [1.0, 5.0]
},
"b": {
"tom": [15.0],
"john": [2.0, 6.0]
},
.....
}
So, basically, I am taking every number in the dictionary then create a row with just this number, and then combine all of the rows.
Is there a simple way to collapse the resulting DataFrame or build a more compact DataFrame given such a dictionary
Upvotes: 4
Views: 584
Reputation: 195553
You can .groupby
+ .transform
(where you "move" the values up). Then drop rows which contain all NaN
values:
print(
df.set_index("Name")
.groupby(level=0)
.transform(lambda x: sorted(x, key=lambda k: pd.isna(k)))
.dropna(axis=0, how="all")
.reset_index()
)
Prints:
Name A B C D
0 tom 10.0 15.0 20.0 25.0
1 tom 30.0 NaN NaN 40.0
2 john 1.0 2.0 3.0 4.0
3 john 5.0 6.0 7.0 8.0
Upvotes: 5