Deb
Deb

Reputation: 193

Convert Pandas Dataframe from Row based to Columnar

My Dataframe (df) looks like this:

Date       FieldA  ValueA ValueB
09-02-2016 TypeA   3       5
09-02-2016 TypeB   6       7

I want the dataframe to look like below:

Date       TypeA_ValueA   TypeA_ValueB  TypeB_ValueA TypeB_ValueB
09-02-2016  3             5             6            7

I tired the df.pivot in pandas where I can provide single Value column. It doesnt take more than one. When I provide more than one i get below exception. pandas_pivot

Exception: Data must be 1-dimensional

Upvotes: 0

Views: 735

Answers (3)

piRSquared
piRSquared

Reputation: 294218

df1 = df.set_index(['Date', 'FieldA']).unstack()
df1.columns = df1.columns.map('_'.join)

df1.reset_index()

enter image description here


Setup Reference

from StringIO import StringIO
import pandas as pd

text = """Date       FieldA  ValueA ValueB
09-02-2016 TypeA   3       5
09-02-2016 TypeB   6       7"""

df = pd.read_csv(StringIO(text), delim_whitespace=True)

df

enter image description here

Upvotes: 2

grechut
grechut

Reputation: 3027

Use pd.pivot_table.

In [1]: pd.pivot_table(df, index='Date', columns='FieldA', values=['ValueA', 'ValueB'])
Out[1]: 
           ValueA       ValueB      
FieldA      TypeA TypeB  TypeA TypeB
Date                                
09-02-2016      3     6      5     7

As a result, you'll get a DataFrame with MultiIndex. If you want to flatten it and have _ as separator in the column name, you can just do:

In [1]: df = pd.pivot_table(df, index='Date', columns='FieldA', values=['ValueA', 'ValueB'])

In [2]: df.columns = [ '{}_{}'.format(cat, val) for val, cat in df.columns ]

In [3]: df
Out[3]: 
            TypeA_ValueA  TypeB_ValueA  TypeA_ValueB  TypeB_ValueB
Date                                                              
09-02-2016             3             6             5             7

Upvotes: 0

Nehal J Wani
Nehal J Wani

Reputation: 16629

In [36]: df
Out[36]: 
        Date FieldA  ValueA  ValueB
0 2016-09-02  TypeA       3       5
1 2016-09-02  TypeB       6       7
2 2016-09-03  TypeA       4       8
3 2016-09-03  TypeB       3       9

In [37]: v_cols = df.columns.difference(['FieldA', 'Date'])

In [38]: def func(x):
     ...:     d = {'_'.join([t, c]): x[x['FieldA'] == t][c].iloc[0] for t in x.FieldA for c in v_cols}
     ...:     for k, v in d.iteritems():
     ...:         x[k] = v
     ...:     return x
     ...: 

In [39]: newdf = df.groupby('Date').apply(func)

In [40]: newdf.drop(v_cols.tolist() + ['FieldA'], axis=1).drop_duplicates()
Out[340]: 
        Date  TypeA_ValueA  TypeA_ValueB  TypeB_ValueA  TypeB_ValueB
0 2016-09-02             3             5             6             7
2 2016-09-03             4             8             3             9

Upvotes: 0

Related Questions