Reputation: 1956
I have a dataframe which I extend to include values for all increments in 2 columns. Therefor NaN values are introduced, as expected and desired.
However, when I use pivot on this dataframe I'll get a row and column for NaN. Can I prevent this when doing the pivot? If not, how can I drop a column named NaN? Trying to drop it by calling [NaN],[nan] or ['NaN'] doesn't work.
Dropping the columns and rows where all values are NaN is not working in this case as the column headings and indexes are used for a seaborn heatmap plot, so eventhough all cell values are NaN it is still useful to have it as the index and key values are not NaN
Sample code;
import pandas as pd
import numpy as np
#generate dummy data
df = pd.DataFrame({'Y': np.random.randint(130,140,10),
'X': np.random.randint(5,10,10),
'Z': np.random.randint(0,25, size=10)})
df = df.round(1)
#create dataset for heatmap
#group by axis to plot
df = df.groupby(['X','Y']).sum().reset_index()
df = df.sort_values(by=['Y'])
dfY = pd.DataFrame({'Y':np.arange(min(df['Y']), max(df['Y']),1)})
dfX = pd.DataFrame({'X':np.arange(min(df['X']), max(df['X']),1)})
df = pd.merge(df,dfY, how='outer', on='Y')
df = pd.merge(df,dfX, how='outer', on='X')
df = df.round(1)
print(df)
#restructure for heatmap
data = df.pivot("Y","X","Z").sort_values(by=['Y'],ascending=False)
print(data)
Sample DataFrame before pivot:
X Y Z
0 5.0 132.0 0.0
1 5.0 135.0 20.0
2 5.0 137.0 17.0
3 7.0 132.0 15.0
4 7.0 133.0 3.0
5 6.0 133.0 30.0
6 6.0 135.0 22.0
7 6.0 138.0 16.0
8 9.0 135.0 9.0
9 NaN 134.0 NaN
10 NaN 136.0 NaN
11 8.0 NaN NaN
After pivot:
X NaN 5.0 6.0 7.0 8.0 9.0
Y
138.0 NaN NaN 16.0 NaN NaN NaN
137.0 NaN 17.0 NaN NaN NaN NaN
136.0 NaN NaN NaN NaN NaN NaN
135.0 NaN 20.0 22.0 NaN NaN 9.0
134.0 NaN NaN NaN NaN NaN NaN
133.0 NaN NaN 30.0 3.0 NaN NaN
132.0 NaN 0.0 NaN 15.0 NaN NaN
NaN NaN NaN NaN NaN NaN NaN
Desired output:
X 5.0 6.0 7.0 8.0 9.0
Y
138.0 NaN 16.0 NaN NaN NaN
137.0 17.0 NaN NaN NaN NaN
136.0 NaN NaN NaN NaN NaN
135.0 20.0 22.0 NaN NaN 9.0
134.0 NaN NaN NaN NaN NaN
133.0 NaN 30.0 3.0 NaN NaN
132.0 0.0 NaN 15.0 NaN NaN
Upvotes: 3
Views: 454
Reputation: 862751
For me working drop
by missing value np.nan
:
data = (df.pivot("Y","X","Z")
.sort_values(by=['Y'],ascending=False)
.drop(np.nan, axis=1)
.drop(np.nan))
Or:
data = df.pivot("Y","X","Z").sort_values(by=['Y'],ascending=False)
data = data.reindex(index=data.index.difference([np.nan]),
columns=data.columns.difference([np.nan]))
Upvotes: 2