Reputation: 13
I'm trying to work with data in a pandas dataframe which I am importing from an Excel spreadsheet.
I am importing the data so it has a multi-index structure.
blid_df = pd.read_excel('OriginalClean.xlsx', header=[0,1,2], index_col=None)
I want to index by Country, which I am able to do using set_index; however, all my countries become tuples, e.g. (Australia,).
I also want to make the country type sit at the correct level and remove these unnamed level labels.
Below is an example of what I am trying to achieve:
Upvotes: 0
Views: 215
Reputation: 37877
Maybe I'm wrong, but I suppose you're reading a multi-header spreadsheet this way:
df = pd.read_excel("file.xlsx", header=[0, 1, 2])
You can try this instead:
df = (
    pd.read_excel("file.xlsx", index_col=[0, 1], header=[0, 1, 2])  # 1st chain
    .rename_axis(index=["Country", "Country Type"], columns=[None]*3)
)
df.index.nlevels # should be 2 (previously 1)
df.columns.nlevels # should be 3
If you're not dealing with an Excel file, replace the first chain with df.set_index(list(df.columns[:2])).
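To make the set_index variant concrete, here is a minimal sketch that can be run without a spreadsheet. The frame `raw`, its column labels, and the numbers in it are made up for illustration; they only imitate what a three-row header with "Unnamed" placeholders tends to look like after loading.

```python
import pandas as pd

# Hypothetical stand-in for the loaded spreadsheet: three header levels,
# with the first two columns holding the country and the country type.
columns = pd.MultiIndex.from_tuples([
    ("Country", "Unnamed: 0_level_1", "Unnamed: 0_level_2"),
    ("Country Type", "Unnamed: 1_level_1", "Unnamed: 1_level_2"),
    ("Metric", "2020", "Total"),
    ("Metric", "2021", "Total"),
])
raw = pd.DataFrame(
    [["Australia", "Type A", 1.0, 2.0],
     ["Brazil", "Type B", 3.0, 4.0]],  # dummy values
    columns=columns,
)

# Move the first two columns into the index, then give the index levels
# readable names and clear the names of the three column levels.
df = (
    raw.set_index(list(raw.columns[:2]))
    .rename_axis(index=["Country", "Country Type"], columns=[None] * 3)
)

print(df.index.nlevels)    # number of index levels
print(df.columns.nlevels)  # number of column levels
```

After this, df.loc["Australia"] selects by country on the first index level, and the remaining columns keep their three-level header.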
Upvotes: 1
Reputation: 1121
You could rename all the columns when you load the data. pandas will not be happy with several columns that share the same name. Try something like this:
import pandas as pd
# Proper, non-repeated column names
column_names = ["new_column_name1", "new_column_name2", "new_column_name3"]
data = pd.read_csv("your_data.csv", names=column_names, header=None)
# Proper data types
data_types = {
    "new_column_name1": str,    # Example: string data type
    "new_column_name2": float,  # Example: float data type
    "new_column_name3": int,    # Example: integer data type
}
# Set data types
data = data.astype(data_types)
# Use the first column as the index
data = data.set_index("new_column_name1")
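# Note: DataFrame.reindex() has no `drop` argument. To go back to a default
# integer index later, use reset_index() (drop=True would discard the index values).
# data = data.reset_index()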
Upvotes: 0