Reputation: 107
I have a code to transofrm an excel file with data whioch has two headers into a dataframe. Somehow my code is not doing that what i expect:
import pandas as pd
df = pd.read_excel('excel_file.xlsx', sheet_name='Class', header=None, names= header_names, skiprows=2)
somehow I receive a weird output:
NaN NaN NaT NaN NaN NaT
3 NaN NaT
2015-06-30 4 NaN
etc.
I dont know what I am doing wrong.
My excel file is structured as the following but has even more columns.
A B C
math biology physics
01.06.2015 3 01.06.2015 3 01.06.2015 3
30.06.2016 4 30.06.2016 4 30.06.2016 4
15.06.2017 1 15.06.2017 3 15.06.2017 3
20.06.2018 3 20.06.2018 2 20.06.2018 5
15.06.2019 2 15.06.2019 3 15.06.2019 2
The expected dataframe output should be like this:
date grade course group
01.06.2015 3 math A
30.06.2016 4 math A
15.06.2017 1 math A
20.06.2018 3 math A
15.06.2019 2 math A
01.06.2015 3 biology B
30.06.2016 4 biology B
15.06.2017 3 biology B
20.06.2018 2 biology B
15.06.2019 3 biology B
01.06.2015 3 physics C
30.06.2016 4 physics C
15.06.2017 3 physics C
20.06.2018 5 physics C
15.06.2019 2 physics C
Upvotes: 0
Views: 47
Reputation: 455
just by reading this data you won't get the output you showed. Also, you skip 2 rows which have information about the course and the group. What I would do is:
import pandas as pd
df = pd.read_excel('stack.xlsx',header=None, skiprows=3)
df = pd.DataFrame(df)
0 1 2 3 4 5
0 01.06.2015 3 01.06.2015 3 01.06.2015 3
Now, changing names of columns
df.columns=['math A','grade A', 'Bio B','grade B','physic C', 'grade C']
math A grade A Bio B grade B physic C grade C
0 01.06.2015 3 01.06.2015 3 01.06.2015 3
After that, you can iterate through the lines and create a new data frame with the output you want :)
3 loops like this and you will get what you want, just remember that you cannot initialize with i = 0, but i you will start the next loop, is the i you finished the prev loop
df_wanted = pd.DataFrame(columns=['date','grade','course','group'])
i = 0
for index,row in df[['math A','grade A']].iterrows():
df_wanted.loc[i,'date'] = row[0]
df_wanted.loc[i,'grade'] = row[1]
df_wanted.loc[i,'course'] = 'math'
df_wanted.loc[i,'group'] = 'A'
i+=1
date grade course group
0 01.06.2015 3 math A
Upvotes: 1