Reputation: 6246
I wanted to create a pivot for some metrics i am working with. However, there are cases where, for a specific year, none of the metrics has values present.
This causes the row to disappear. I wanted to keep this row while making a pivot, but also keep any extra columns. This however seems to cause problems with the dropna=False
parameter, where extra rows are being created. Example below:
import numpy as np
import pandas as pd
df = pd.DataFrame({"country":["a", "a", "a", "a", "b", "b", "b", "b"],
"metric": ["M1", "M2", "M1", "M2", "M1", "M2", "M1", "M2",],
"year": [2000, 2000, 2001, 2001, 2000, 2000, 2001, 2001, ],
"extra_val": ["a_val", "a_val", "a_val", "a_val", "b_val", "b_val", "b_val", "b_val"],
"metric_val": [1, 2, 3, 4, np.nan, np.nan, 7, 8],
})
out1 = df.pivot_table("metric_val", index=["country", "year", "extra_val"], columns="metric")
print(out1)
Output1: for Country b, for year 2000, the row disappeared because it's all NaNs.
metric M1 M2
country year extra_val
a 2000 a_val 1.0 2.0
2001 a_val 3.0 4.0
b 2001 b_val 7.0 8.0
Now,
out2 = df.pivot_table("metric_val", index=["country", "year", "extra_val"], columns="metric", dropna=False)
print(out2)
Output 2: Here, the extra_val
column seems to be treated like a new level, and extra rows with NaNs are being created. I want to keep this column without having these extra rows (the column is guaranteed to have a 1 to 1 mapping to a country)
metric M1 M2
country year extra_val
a 2000 a_val 1.0 2.0
b_val NaN NaN
2001 a_val 3.0 4.0
b_val NaN NaN
b 2000 a_val NaN NaN
b_val NaN NaN
2001 a_val NaN NaN
b_val 7.0 8.0
What i want instead,
desired = out2.iloc[[0, 2, 5, 7]]
print(desired)
To elaborate Desired output: This is what i want the output to look like (the indexes are hardcoded for demonstration, i can't hardcode the indexes in the real data.)
metric M1 M2
country year extra_val
a 2000 a_val 1.0 2.0
2001 a_val 3.0 4.0
b 2000 b_val NaN NaN
2001 b_val 7.0 8.0
How do i get this output?
Upvotes: 4
Views: 304
Reputation: 150745
Pivot is (almost) equivalent to set_index().unstack()
:
(df.set_index(['country','metric','year','extra_val'])
['metric_val'].unstack('metric')
)
Output:
metric M1 M2
country year extra_val
a 2000 a_val 1.0 2.0
2001 a_val 3.0 4.0
b 2000 b_val NaN NaN
2001 b_val 7.0 8.0
Upvotes: 2