RustyShackleford
RustyShackleford

Reputation: 3667

How to maintain column names while pivoting data, with no aggregation and not using artificial index?

I have a dataframe like this, which has about 40 unique in the key column and some of them are integer. I do not want anything aggregated if possible :

hash  ft  timestamp               id   key   value
1ab   1   2022-01-02 12:21:11     1    aaa   121289
1ab   1   2022-01-02 12:21:11     1    bbb   13ADF
1ab   1   2022-01-02 12:21:11     2    aaa   13ADH
1ab   1   2022-01-02 12:21:11     2    bbb   13ADH
2ab   2   2022-01-02 12:21:11     3    aaa   121382
2ab   2   2022-01-02 12:21:11     3    bbb   121381
2ab   2   2022-01-02 12:21:11     3    ccc   121389

I am trying to pivot the data only on 2 on columns key and value while keeping the remaining same columns and index. Example:

enter image description here

When I run below code, the column names take on grouped values with columnname being for the following columns id, ft, value. One of the actual column names, with parentheses : ('id', '1', '121289') and I am forced to select an index, which I dont want do.

Code: df.pivot_table(index='hash',columns=['ft','key'])

I am not sure what I am doing wrong, that I cant use the value column for values. I get an empty dataframe:

df.pivot_table(index='hash',columns=['ft','key'], values='value')

Upvotes: 1

Views: 674

Answers (1)

PaulS
PaulS

Reputation: 25333

A possible solution, using pandas.DataFrame.pivot:

(df.pivot(index=['hash', 'ft', 'id', 'timestamp'],
          columns=['key'], values='value')
 .reset_index().rename_axis(None, axis=1))

Output:

  hash  ft  id            timestamp     aaa     bbb     ccc
0  1ab   1   1  2022-01-02 12:21:11  121289   13ADF     NaN
1  1ab   1   2  2022-01-02 12:21:11   13ADH   13ADH     NaN
2  2ab   2   3  2022-01-02 12:21:11  121382  121381  121389

Data:

df = pd.DataFrame.from_dict(
    {'hash': {0: '1ab',
  1: '1ab',
  2: '1ab',
  3: '1ab',
  4: '2ab',
  5: '2ab',
  6: '2ab'},
 'ft': {0: 1, 1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2},
 'timestamp': {0: '2022-01-02 12:21:11',
  1: '2022-01-02 12:21:11',
  2: '2022-01-02 12:21:11',
  3: '2022-01-02 12:21:11',
  4: '2022-01-02 12:21:11',
  5: '2022-01-02 12:21:11',
  6: '2022-01-02 12:21:11'},
 'id': {0: 1, 1: 1, 2: 2, 3: 2, 4: 3, 5: 3, 6: 3},
 'key': {0: 'aaa', 1: 'bbb', 2: 'aaa', 3: 'bbb', 4: 'aaa', 5: 'bbb', 6: 'ccc'},
 'value': {0: '121289',
  1: '13ADF',
  2: '13ADH',
  3: '13ADH',
  4: '121382',
  5: '121381',
  6: '121389'}}
)

EDIT

To overcome an error reported below, in a comment, by the OP, the OP himself suggests the following solution:

(pd.pivot_table(df,index=['hash','ft', 'id'] ,
                columns = ['key'] , 
                values = "value", aggfunc='sum')
 .reset_index().rename_axis(None, axis=1))

Upvotes: 1

Related Questions