Reputation: 357
I'm working with a multiindex - multicolumns DataFrame in Pandas. I have 16 main columns (level 0), such as Climate Change, Ozone Depletion,etc; and each of them contains 2 subcolumns (level 1): Agriculture and Transformation.
I want to create a new subcolumn (Transport
) to each main column, that I will fill later with new data. The thing is that I don't want to make a loop (not sure how I would do that) but I can't find any direct way to perform it.
Already tried the loop option:
for i in list(df.columns.levels[0]):
df[i, 'Transport'] = ''
However, this duplicates the main columns and create them with the new empty field. Is there anyway to do it without a loop? If there isn't, how do I group these new created columns?
Already check this question but didn't for me: Add a subcolumn to each column on multilevel column index
Thanks in advance.
EDIT
import pandas as pd
import numpy as np
df = pd.DataFrame([[1,2,3,4], [5,6,7,8],[9,10,11,12]], columns = pd.MultiIndex.from_product([('Acidification', 'Climate Change'), ('Agriculture', 'Transformation')]), index=['potato', 'tomato', 'onion'])
With @Laurent answer I got the following dataframe:
In order to get what I wanted, I needed to change one line:
dfs = []
for col in df.columns.levels[0]:
tmp = df.copy()[col]
tmp.loc[:, ("Transport")] = "" #This is the modified line
dfs.append(tmp)
new_df = pd.concat(dfs, axis=1, keys=df.columns.levels[0])
This output:
However, I want a would like to have a more general method (maybe this one is, but I don't know how to do it).
Question 1: If now I would like to add new two subcolumns (level 2) to "Transport" subcolumn (level 1) to all main columns (level 0), how can I do it?
I have tried the following code, but raise an error of unhashable type: 'slice'
# Try to apply logic above
dfn = []
for col in new_df.columns.levels[0]:
for scol in new_df.columns.levels[1]:
tmp = new_df.copy()[col][scol]
tmp.loc[:, ('National')] = ""
tmp.loc[:, ('Import')] = ""
dfn.append(tmp)
new_dfn = pd.concat(dfn, axis=1, keys=new_df.columns.levels[0])
Question 2: Is there a general way to add new subsubcolumns to a specific subcolumn (in all columns)?
Upvotes: 1
Views: 936
Reputation: 13458
With the following toy dataframe as an example:
import pandas as pd
df = pd.DataFrame(
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
columns=pd.MultiIndex.from_product([("col1", "col2"), ("sub_col1", "sub_col2")]),
index=[0, 1, 2],
)
print(df)
# Output
col1 col2
sub_col1 sub_col2 sub_col1 sub_col2
0 1 2 3 4
1 5 6 7 8
2 9 10 11 12
Here is one way to do it using Pandas advanced indexing with hierarchical index and concat:
dfs = []
for col in df.columns.levels[0]:
tmp = df.copy()[col]
tmp.loc[:, ("sub_col3")] = ""
dfs.append(tmp)
new_df = pd.concat(dfs, axis=1, keys=df.columns.levels[0])
Then:
print(new_df)
# Output
col1 col2
sub_col1 sub_col2 col1 sub_col3 sub_col1 sub_col2 col2 sub_col3
0 1 2 3 4
1 5 6 7 8
2 9 10 11 12
And here is how to add two new subcolumns (level 2) to a [target] subcolumn (level 1) to all main columns (level 0):
dfs = []
sub_col = "sub_col3"
sub_sub_cols = ["sub_sub_col1", "sub_sub_col2"]
for col in df.columns.levels[0]:
tmp = df.copy()[col]
tmp = pd.concat([tmp, pd.DataFrame([[""]*tmp.shape[1] for _ in range(tmp.shape[0])], columns=pd.MultiIndex.from_product([[sub_col], sub_sub_cols]))], axis=1)
dfs.append(tmp)
new_df = pd.concat(dfs, axis=1, keys=df.columns.levels[0])
Then:
print(new_df)
# Output
col1 \
sub_col1 sub_col2 (sub_col3, sub_sub_col1) (sub_col3, sub_sub_col2)
0 1 2
1 5 6
2 9 10
col2
sub_col1 sub_col2 (sub_col3, sub_sub_col1) (sub_col3, sub_sub_col2)
0 3 4
1 7 8
2 11 12
Since all columns of a given level do not have the same number of subcolumns, columns header are represented by tuples.
Upvotes: 2