Reputation: 95
I have an excel sheet which contains data generated by running a program with different configurations. Two sample outputs are as follows. My first problem is pd.read_excel()
introduces incremental digit suffices to the column names after the first group is read. For example the first dataframe has the correct column names "A,B,C,D" while the second dataframe has "A.1,B.1,C.1,D.1" and the third "A.2,B.2,C.2,D.2" and so on. How do I make sure that all have the same column names?
My data on excel
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
file_locn = ''r'C:\Users\me\Desktop\output.xlsx'''
df = pd.read_excel(file_locn, sheet_name='1', skiprows=1)
print(df)
df_256 = df[df.columns[0:5]]
df_128 = df[df.columns[6:11]]
df_64 = df[df.columns[12:17]]
print(df_256)
print(df_128)
print(df_64)
Output:
print(df)
print(df_256)
print(df_128)
Upvotes: 4
Views: 736
Reputation: 862511
If duplicated columns names pandas add .Number
for create unique columns names.
Here is possible create MultiIndex
in columns like:
df = pd.read_excel(file_locn, sheet_name='1', header=[0,1])
And then use DataFrame.xs
, if first level is parsed like strings use:
df_256 = df.xs('256', axis=1, level=0)
df_128 = df.xs('128', axis=1, level=0)
df_64 = df.xs('64', axis=1, level=0)
If parsed like integers:
df_256 = df.xs(256, axis=1, level=0)
df_128 = df.xs(128, axis=1, level=0)
df_64 = df.xs(64, axis=1, level=0)
Upvotes: 2