Merged cells in excel become NaN in pandas

Question

How can I read excel file which has form like this into pandas DataFrame?

a       b   c    d       e    f
Type    1   22   Car     Yes  2019
                 Train   Yes  
Type    2   25   Car     No   2018
Notype  1        Car     Yes  2019
                 Train

First row has three columns that are merged cells (2 rows) however the rest are separate rows

The thing is if i use

data = pd.read_excel("excel.xls").fillna(method='ffill')

Then value "25" from third row and "Yes" from fourth row will fill NaN value below and that is not something I want. So every column that is merged should copy exact values for both rows. In this case "a", "b", "c" and "f" are merged columns

So correctly it should load like this:

a       b   c    d       e   f
Type    1   22   Car     Yes 2019
Type    1   22   Train   Yes 2019
Type    2   25   Car     No  2018
Notype  1   NaN  Car     Yes 2019
Notype  1   NaN  Train   NaN 2019

jezrael · Accepted Answer

If need forward filling all columns with excluding some names from list use Index.difference with forward filling missing values:

cols_excluded = ['c','e']
cols = df.columns.difference(cols_excluded)

df[cols] = df[cols].ffill()
print (df)
        a    b     c      d    e
0    Type  1.0  22.0    Car  Yes
1    Type  1.0   NaN  Train  Yes
2    Type  2.0  25.0    Car   No
3  Notype  1.0   NaN    Car  Yes
4  Notype  1.0   NaN  Train  NaN

Also if necessary forward filling all missing values with exlude last missing values per columns (here cols_excluded):

df[cols_excluded] = df[cols_excluded].where(df[cols_excluded].bfill().isna(),
                                            df[cols_excluded].ffill())
print (df)

        a    b     c      d    e
0    Type  1.0  22.0    Car  Yes
1    Type  1.0  22.0  Train  Yes
2    Type  2.0  25.0    Car   No
3  Notype  1.0   NaN    Car  Yes
4  Notype  1.0   NaN  Train  NaN

Merged cells in excel become NaN in pandas

Answers (1)

Related Questions