DougKruger
DougKruger

Reputation: 4624

Flatten pandas pivot table

This is a follow up of my question. Rather than a pivot table, is it possible to flatten table to look like the following:

data = {'year': ['2016', '2016', '2015', '2014', '2013'],
    'country':['uk', 'usa', 'fr','fr','uk'],
    'sales': [10, 21, 20, 10,12],
    'rep': ['john', 'john', 'claire', 'kyle','kyle']
    }

pd.DataFrame(data).pivot_table(index='country', columns='year', values=['rep','sales'])

          rep                       sales                  
year     2013  2014    2015  2016   2013  2014  2015  2016
country                                                  
fr       None  kyle  claire  None   None    10    20  None
uk       kyle  None    None  john    12  None  None    10
usa      None  None    None  john   None  None  None    21

Flattened table:

        rep_2013 rep_2014 rep_2015 rep_2016  sales_2013  sales_2014  sales_2015  sales_2016
country                                                  
fr       None    kyle     claire    None      None        10            20          None
uk       kyle    None     None      john      12          None          None         10
usa      None    None     None      john      None        None          None         21

Upvotes: 22

Views: 36863

Answers (4)

kwa
kwa

Reputation: 59

To create the new flattened columns:

cols = list(map(lambda x: "_".join(x), df.columns.tolist()))

apply the new column names to the DataFrame:

df = df[cols]

Upvotes: 1

user2458922
user2458922

Reputation: 1721

df.columns = df.columns.to_series().str.join('_')
print(df.columns.shape) #(1,_X_) # a 2 D Array.

Would work, but down the line you may face problems , as you try accessing some columns with some way that is not 2D Column name Friendly.

I would suggest, use

pivoteCols = df.columns.to_series().str.join('_')
pivoteCols = pivoteCols.values.reshape(len(pivoteCols))
df.columns = pivoteCols
print(df.columns.shape) # One Dimensional 

Upvotes: 0

piRSquared
piRSquared

Reputation: 294318

see collapse a pandas MultiIndex

Solution

df.columns = df.columns.to_series().str.join('_')

Upvotes: 39

SerialDev
SerialDev

Reputation: 2847

Try this:

df.columns = df.columns.get_level_values(0)

followed by:

df.columns = [' '.join(col).strip() for col in df.columns.values]

This should flatten your multi-index

Upvotes: 2

Related Questions