hope
hope

Reputation: 95

why pandas dataframe intoroduces additional digits to columns names after reading them from excel?

I have an excel sheet which contains data generated by running a program with different configurations. Two sample outputs are as follows. My first problem is pd.read_excel() introduces incremental digit suffices to the column names after the first group is read. For example the first dataframe has the correct column names "A,B,C,D" while the second dataframe has "A.1,B.1,C.1,D.1" and the third "A.2,B.2,C.2,D.2" and so on. How do I make sure that all have the same column names?

My data on excel

My data on excel

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
file_locn = ''r'C:\Users\me\Desktop\output.xlsx'''
df = pd.read_excel(file_locn, sheet_name='1', skiprows=1)
print(df)
df_256 = df[df.columns[0:5]]
df_128 = df[df.columns[6:11]]
df_64 = df[df.columns[12:17]]

print(df_256)
print(df_128)
print(df_64)

Output:

print(df)

print(df)

print(df_256)

print(df_256)

print(df_128)

print(df_128)

Upvotes: 4

Views: 736

Answers (1)

jezrael
jezrael

Reputation: 862511

If duplicated columns names pandas add .Number for create unique columns names.

Here is possible create MultiIndex in columns like:

df = pd.read_excel(file_locn, sheet_name='1', header=[0,1])

And then use DataFrame.xs, if first level is parsed like strings use:

df_256 = df.xs('256', axis=1, level=0)
df_128 = df.xs('128', axis=1, level=0)
df_64 = df.xs('64', axis=1, level=0)

If parsed like integers:

df_256 = df.xs(256, axis=1, level=0)
df_128 = df.xs(128, axis=1, level=0)
df_64 = df.xs(64, axis=1, level=0)

Upvotes: 2

Related Questions