edd313
edd313

Reputation: 1479

read_csv a MultiIndex DataFrame: problem when the first row is empty

After a long calculation I have finally obtained my DataFrame with MultiIndex columns:

columns = pd.MultiIndex.from_product([[0, 1], [0, 1]])
df = pd.DataFrame(np.ones((2, 4))], columns=columns)
df
Out[14]: 
     0         1     
     0    1    0    1
0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0

I now save it to a csv file, and read it back later on with no problem.

df.to_csv("asd.csv")
df = pd.read_csv("asd.csv", header=[0, 1], index_col=[0])
df
Out[15]: 
     0         1     
     0    1    0    1
0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0

Repeating the same process with a DataFrame whose first row is filled with NaNs...

columns= pd.MultiIndex.from_product([[0, 1], [0, 1]])
df = pd.DataFrame([np.full(4, np.nan), np.ones(4)], columns=columns)
df
Out[16]: 
     0         1     
     0    1    0    1
0  NaN  NaN  NaN  NaN
1  1.0  1.0  1.0  1.0

... gives the following undesired result:

df.to_csv("asd.csv")
df = pd.read_csv("asd.csv", header=[0, 1], index_col=[0])
df
Out[17]: 
     0         1     
     0    1    0    1
0                    
1  1.0  1.0  1.0  1.0

Inspecting the index shows that what should be the first row is actually interpreted as the row's name.

df.index
Out[18]: Int64Index([1], dtype='int64', name='0')

Any ideas about how to solve this?

Upvotes: 0

Views: 615

Answers (2)

edd313
edd313

Reputation: 1479

The trick seems to be loading the MultiIndex columns using read_csv first, then set the first column as the index in a separate step, finally remove the index names.

Here you have the whole workflow:

# Create df
columns = pd.MultiIndex.from_product([[0, 1], [0, 1]])
df = pd.DataFrame([np.full(4, np.nan), np.ones(4)], columns=columns)
# Save df
df.to_csv("asd.csv")
# Read df, don't specify index columns
df = pd.read_csv("asd.csv", header=[0, 1])
# Set first column as index, replacing the existing one
df.set_index(df.columns[0], inplace=True)
# Erase index names
df.index.names = [None] * df.index.nlevels
df
Out[101]: 
     0         1     
     0    1    0    1
0  NaN  NaN  NaN  NaN
1  1.0  1.0  1.0  1.0

Upvotes: 0

Cosmina Nicu
Cosmina Nicu

Reputation: 21

You could do the following when saving to csv:

df.to_csv("asd.csv",na_rep='NULL')

This will return the following result:

Out[9]: 
     0         1     
     0    1    0    1
0  NaN  NaN  NaN  NaN
1  1.0  1.0  1.0  1.0

Upvotes: 2

Related Questions