Paritosh Singh
Paritosh Singh

Reputation: 6246

Pandas pivot: how to keep rows with all NaNs without introducing extra rows

I wanted to create a pivot for some metrics i am working with. However, there are cases where, for a specific year, none of the metrics has values present.

This causes the row to disappear. I wanted to keep this row while making a pivot, but also keep any extra columns. This however seems to cause problems with the dropna=False parameter, where extra rows are being created. Example below:

import numpy as np
import pandas as pd
df = pd.DataFrame({"country":["a", "a", "a", "a", "b", "b", "b", "b"],
              "metric": ["M1", "M2", "M1", "M2", "M1", "M2", "M1", "M2",],
              "year": [2000, 2000, 2001, 2001, 2000, 2000, 2001, 2001, ],
              "extra_val": ["a_val", "a_val", "a_val", "a_val", "b_val", "b_val", "b_val", "b_val"],
              "metric_val": [1, 2, 3, 4, np.nan, np.nan, 7, 8],
              })

out1 = df.pivot_table("metric_val", index=["country", "year", "extra_val"], columns="metric")
print(out1)

Output1: for Country b, for year 2000, the row disappeared because it's all NaNs.

metric                   M1   M2
country year extra_val          
a       2000 a_val      1.0  2.0
        2001 a_val      3.0  4.0
b       2001 b_val      7.0  8.0

Now,

out2 = df.pivot_table("metric_val", index=["country", "year", "extra_val"], columns="metric", dropna=False)
print(out2)

Output 2: Here, the extra_val column seems to be treated like a new level, and extra rows with NaNs are being created. I want to keep this column without having these extra rows (the column is guaranteed to have a 1 to 1 mapping to a country)

metric                   M1   M2
country year extra_val          
a       2000 a_val      1.0  2.0
             b_val      NaN  NaN
        2001 a_val      3.0  4.0
             b_val      NaN  NaN
b       2000 a_val      NaN  NaN
             b_val      NaN  NaN
        2001 a_val      NaN  NaN
             b_val      7.0  8.0

What i want instead,

desired = out2.iloc[[0, 2, 5, 7]]
print(desired)

To elaborate Desired output: This is what i want the output to look like (the indexes are hardcoded for demonstration, i can't hardcode the indexes in the real data.)

metric                   M1   M2
country year extra_val          
a       2000 a_val      1.0  2.0
        2001 a_val      3.0  4.0
b       2000 b_val      NaN  NaN
        2001 b_val      7.0  8.0

How do i get this output?

Upvotes: 4

Views: 304

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150745

Pivot is (almost) equivalent to set_index().unstack():

(df.set_index(['country','metric','year','extra_val'])
    ['metric_val'].unstack('metric')
)

Output:

metric                   M1   M2
country year extra_val          
a       2000 a_val      1.0  2.0
        2001 a_val      3.0  4.0
b       2000 b_val      NaN  NaN
        2001 b_val      7.0  8.0

Upvotes: 2

Related Questions