Reputation: 1479
After a long calculation I have finally obtained my DataFrame with MultiIndex columns:
columns = pd.MultiIndex.from_product([[0, 1], [0, 1]])
df = pd.DataFrame(np.ones((2, 4))], columns=columns)
df
Out[14]:
0 1
0 1 0 1
0 1.0 1.0 1.0 1.0
1 1.0 1.0 1.0 1.0
I now save it to a csv file, and read it back later on with no problem.
df.to_csv("asd.csv")
df = pd.read_csv("asd.csv", header=[0, 1], index_col=[0])
df
Out[15]:
0 1
0 1 0 1
0 1.0 1.0 1.0 1.0
1 1.0 1.0 1.0 1.0
Repeating the same process with a DataFrame whose first row is filled with NaNs...
columns= pd.MultiIndex.from_product([[0, 1], [0, 1]])
df = pd.DataFrame([np.full(4, np.nan), np.ones(4)], columns=columns)
df
Out[16]:
0 1
0 1 0 1
0 NaN NaN NaN NaN
1 1.0 1.0 1.0 1.0
... gives the following undesired result:
df.to_csv("asd.csv")
df = pd.read_csv("asd.csv", header=[0, 1], index_col=[0])
df
Out[17]:
0 1
0 1 0 1
0
1 1.0 1.0 1.0 1.0
Inspecting the index shows that what should be the first row is actually interpreted as the row's name.
df.index
Out[18]: Int64Index([1], dtype='int64', name='0')
Any ideas about how to solve this?
Upvotes: 0
Views: 615
Reputation: 1479
The trick seems to be loading the MultiIndex columns using read_csv first, then set the first column as the index in a separate step, finally remove the index names.
Here you have the whole workflow:
# Create df
columns = pd.MultiIndex.from_product([[0, 1], [0, 1]])
df = pd.DataFrame([np.full(4, np.nan), np.ones(4)], columns=columns)
# Save df
df.to_csv("asd.csv")
# Read df, don't specify index columns
df = pd.read_csv("asd.csv", header=[0, 1])
# Set first column as index, replacing the existing one
df.set_index(df.columns[0], inplace=True)
# Erase index names
df.index.names = [None] * df.index.nlevels
df
Out[101]:
0 1
0 1 0 1
0 NaN NaN NaN NaN
1 1.0 1.0 1.0 1.0
Upvotes: 0
Reputation: 21
You could do the following when saving to csv:
df.to_csv("asd.csv",na_rep='NULL')
This will return the following result:
Out[9]:
0 1
0 1 0 1
0 NaN NaN NaN NaN
1 1.0 1.0 1.0 1.0
Upvotes: 2