Reputation: 3754
How can I read excel file which has form like this into pandas DataFrame?
a b c d e f
Type 1 22 Car Yes 2019
Train Yes
Type 2 25 Car No 2018
Notype 1 Car Yes 2019
Train
First row has three columns that are merged cells (2 rows) however the rest are separate rows
The thing is if i use
data = pd.read_excel("excel.xls").fillna(method='ffill')
Then value "25"
from third row and "Yes"
from fourth row will fill NaN value below and that is not something I want. So every column that is merged should copy exact values for both rows. In this case "a", "b", "c"
and "f"
are merged columns
So correctly it should load like this:
a b c d e f
Type 1 22 Car Yes 2019
Type 1 22 Train Yes 2019
Type 2 25 Car No 2018
Notype 1 NaN Car Yes 2019
Notype 1 NaN Train NaN 2019
Upvotes: 0
Views: 1138
Reputation: 862921
If need forward filling all columns with excluding some names from list use Index.difference
with forward filling missing values:
cols_excluded = ['c','e']
cols = df.columns.difference(cols_excluded)
df[cols] = df[cols].ffill()
print (df)
a b c d e
0 Type 1.0 22.0 Car Yes
1 Type 1.0 NaN Train Yes
2 Type 2.0 25.0 Car No
3 Notype 1.0 NaN Car Yes
4 Notype 1.0 NaN Train NaN
Also if necessary forward filling all missing values with exlude last missing values per columns (here cols_excluded
):
df[cols_excluded] = df[cols_excluded].where(df[cols_excluded].bfill().isna(),
df[cols_excluded].ffill())
print (df)
a b c d e
0 Type 1.0 22.0 Car Yes
1 Type 1.0 22.0 Train Yes
2 Type 2.0 25.0 Car No
3 Notype 1.0 NaN Car Yes
4 Notype 1.0 NaN Train NaN
Upvotes: 3