Reputation: 3419
I had following data frame (the real data frame is much more larger than this one ) :
sale_user_id sale_product_id count
1 1 1
1 8 1
1 52 1
1 312 5
1 315 1
Then reshaped it to move the values in sale_product_id as column headers using the following code:
reshaped_df=id_product_count.pivot(index='sale_user_id',columns='sale_product_id',values='count')
and the resulting data frame is:
sale_product_id -1057 1 2 3 4 5 6 8 9 10 ... 98 980 981 982 983 984 985 986 987 99
sale_user_id
1 NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
as you can see we have a multililevel index , what i need is to have sale_user_is in the first column without multilevel indexing:
i take the following approach :
reshaped_df.reset_index()
the the result would be like this i still have the sale_product_id column , but i do not need it anymore:
sale_product_id sale_user_id -1057 1 2 3 4 5 6 8 9 ... 98 980 981 982 983 984 985 986 987 99
0 1 NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 3 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 4 NaN NaN 1.0 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN
i can subset this data frame to get rid of sale_product_id
but i don't think it would be efficient.I am looking for an efficient way to get rid of multilevel indexing while reshaping the original data frame
Upvotes: 53
Views: 84477
Reputation: 14968
You can also use a to_flat_index
method of MultiIndex
object to convert it into a list of tuples, which you can then concatenate with list comprehension and use it to overwrite the .columns
attribute of your dataframe.
# create a dataframe
df = pd.DataFrame({"a": [1, 2, 3, 1], "b": ["x", "x", "y", "y"], "c": [0.1, 0.2, 0.1, 0.2]})
a b c
0 1 x 0.1
1 2 x 0.2
2 3 y 0.1
3 1 y 0.2
# pivot the dataframe
df_pivoted = df.pivot(index="a", columns="b")
c
b x y
a
1 0.1 0.2
2 0.2 NaN
3 NaN 0.1
Now let's overwrite the .columns
attribute and .reset_index()
:
df_pivoted.columns = ["_".join(tup) for tup in df_pivoted.columns.to_flat_index()]
df_pivoted.reset_index()
a c_x c_y
0 1 0.1 0.2
1 2 0.2 NaN
2 3 NaN 0.1
Upvotes: 4
Reputation: 11409
Pivot from long to wide format using pivot:
import pandas
df = pandas.DataFrame({
"lev1": [1, 1, 1, 2, 2, 2],
"lev2": [1, 1, 2, 1, 1, 2],
"lev3": [1, 2, 1, 2, 1, 2],
"lev4": [1, 2, 3, 4, 5, 6],
"values": [0, 1, 2, 3, 4, 5]})
df_wide = df.pivot(index="lev1", columns=["lev2", "lev3"], values="values")
df_wide
# lev2 1 2
# lev3 1 2 1 2
# lev1
# 1 0.0 1.0 2.0 NaN
# 2 4.0 3.0 NaN 5.0
Rename the (sometimes confusing) axis names
df_wide.rename_axis(columns=[None, None])
# 1 2
# 1 2 1 2
# lev1
# 1 0.0 1.0 2.0 NaN
# 2 4.0 3.0 NaN 5.0
Upvotes: 0
Reputation: 980
We need to reset_index()
to reset the index columns back into the dataframe, then rename_axis()
to rename the index to None
and the columns to their axis=1
(column headers) values.
reshaped_df = reshaped_df.reset_index().rename_axis(None, axis=1)
Upvotes: 3
Reputation: 1207
Make a DataFrame
import random
d = {'Country': ['Afghanistan','Albania','Algeria','Andorra','Angola']*2,
'Year': [2005]*5 + [2006]*5, 'Value': random.sample(range(1,20),10)}
df = pd.DataFrame(data=d)
df:
Country Year Value
1 Afghanistan 2005 6
2 Albania 2005 13
3 Algeria 2005 10
4 Andorra 2005 11
5 Angola 2005 5
6 Afghanistan 2006 3
7 Albania 2006 2
8 Algeria 2006 7
9 Andorra 2006 3
10 Angola 2006 6
Pivot
table = df.pivot(index='Country',columns='Year',values='Value')
Table:
Year Country 2005 2006
0 Afghanistan 16 9
1 Albania 17 19
2 Algeria 11 7
3 Andorra 5 12
4 Angola 6 18
I want 'Year' to be 'index':
clean_tbl = table.rename_axis(None, axis=1).reset_index(drop=True)
clean_tbl:
Country 2005 2006
0 Afghanistan 16 9
1 Albania 17 19
2 Algeria 11 7
3 Andorra 5 12
4 Angola 6 18
Done!
Upvotes: 10
Reputation: 1660
The way it works for me is
df_cross=pd.DataFrame(pd.crosstab(df[c1], df[c2]).to_dict()).reset_index()
Upvotes: -1
Reputation: 862771
You need remove only index name
, use rename_axis
(new in pandas
0.18.0
):
print (reshaped_df)
sale_product_id 1 8 52 312 315
sale_user_id
1 1 1 1 5 1
print (reshaped_df.index.name)
sale_user_id
print (reshaped_df.rename_axis(None))
sale_product_id 1 8 52 312 315
1 1 1 1 5 1
Another solution working in pandas below 0.18.0
:
reshaped_df.index.name = None
print (reshaped_df)
sale_product_id 1 8 52 312 315
1 1 1 1 5 1
If need remove columns name
also:
print (reshaped_df.columns.name)
sale_product_id
print (reshaped_df.rename_axis(None).rename_axis(None, axis=1))
1 8 52 312 315
1 1 1 1 5 1
Another solution:
reshaped_df.columns.name = None
reshaped_df.index.name = None
print (reshaped_df)
1 8 52 312 315
1 1 1 1 5 1
EDIT by comment:
You need reset_index
with parameter drop=True
:
reshaped_df = reshaped_df.reset_index(drop=True)
print (reshaped_df)
sale_product_id 1 8 52 312 315
0 1 1 1 5 1
#if need reset index nad remove column name
reshaped_df = reshaped_df.reset_index(drop=True).rename_axis(None, axis=1)
print (reshaped_df)
1 8 52 312 315
0 1 1 1 5 1
Of if need remove only column name:
reshaped_df = reshaped_df.rename_axis(None, axis=1)
print (reshaped_df)
1 8 52 312 315
sale_user_id
1 1 1 1 5 1
Edit1:
So if need create new column from index
and remove columns names
:
reshaped_df = reshaped_df.rename_axis(None, axis=1).reset_index()
print (reshaped_df)
sale_user_id 1 8 52 312 315
0 1 1 1 1 5 1
Upvotes: 50