freak11

Reputation: 391

Pivoting and transposing using pandas dataframe

Suppose that I have a pandas dataframe like the one below:

import pandas as pd
df = pd.DataFrame({'fk ID': [1,1,2,2], 
                   'value': [3,3,4,5],
                   'valID': [1,2,1,2]})

The above would give me the following output:

print(df)
   fk ID  value  valID
0      1      3      1
1      1      3      2
2      2      4      1
3      2      5      2

or

 |fk ID | value | valID |
 |   1  |   3   |   1   |
 |   1  |   3   |   2   |
 |   2  |   4   |   1   |
 |   2  |   5   |   2   |

and I would like to transpose and pivot it in such a way that I get the following table and the same order of column names:

 |fk ID | value | valID | fk ID | value | valID |
 |   1  |   3   |   1   |   1   |   3   |   2   |
 |   2  |   4   |   1   |   2   |   5   |   2   |

Upvotes: 3

Views: 497

Answers (3)

daddydan

Reputation: 127

If your group sizes are guaranteed to be the same, you could merge your odd and even rows:

import pandas as pd
df = pd.DataFrame({'fk ID': [1,1,2,2], 
                   'value': [3,3,4,5],
                   'valID': [1,2,1,2]})
df_even = df[df.index % 2 == 0].reset_index(drop=True)
df_odd = df[df.index % 2 == 1].reset_index(drop=True)
df_even.join(df_odd, rsuffix='_2')

Yields

   fk ID  value  valID  fk ID_2  value_2  valID_2
0      1      3      1        1        3        2
1      2      4      1        2        5        2

I'd expect this to be fairly fast, and it could be generalized to any number of rows per group (rather than assuming an odd/even split for two rows per group), but it does require that every fk ID has the same number of rows.
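The generalization mentioned above can be sketched with `groupby().cumcount()`, which numbers the rows within each `fk ID` group; splitting on that position instead of the raw row index handles any fixed group size. A minimal sketch, still assuming every group has exactly `k` rows (`pos`, `parts`, `k`, and `res` are illustrative names, not from the answer):

```python
import pandas as pd

df = pd.DataFrame({'fk ID': [1, 1, 2, 2],
                   'value': [3, 3, 4, 5],
                   'valID': [1, 2, 1, 2]})

# position of each row within its 'fk ID' group: 0, 1, 0, 1
pos = df.groupby('fk ID').cumcount()

k = 2  # rows per group; assumed equal for every fk ID
parts = [df[pos == i].reset_index(drop=True) for i in range(k)]

# place the k sub-frames side by side, keeping the original column names
res = pd.concat(parts, axis=1)
```

For the sample frame this reproduces the layout asked for in the question, with the original column names repeated k times.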

Upvotes: 1

Eric Marchand

Reputation: 639

You can convert df to a NumPy array, reshape it, convert it back to a DataFrame, and then rename the numbered columns (0..5) back to the original names. This also works when the values are strings rather than numbers.

import pandas as pd
df = pd.DataFrame({'fk ID': [1,1,2,2], 
                   'value': [3,3,4,5],
                   'valID': [1,2,1,2]})

nrows = 2
array = df.to_numpy().reshape((nrows, -1))
pd.DataFrame(array).rename(mapper=lambda x: df.columns[x % len(df.columns)], axis=1)

Upvotes: 1

Rodalm

Reputation: 5433

The most straightforward solution I can think of is

df = pd.DataFrame({'fk ID': [1,1,2,2], 
                   'value': [3,3,4,5],
                   'valID': [1,2,1,2]})

# concatenate the rows (Series) of each 'fk ID' group side by side 
def flatten_group(g):
    return pd.concat(row for _, row in g.iterrows())

res = df.groupby('fk ID', as_index=False).apply(flatten_group)

However, using DataFrame.iterrows is not ideal, and can be very slow if the groups are large.

Furthermore, the above solution doesn't work if the 'fk ID' groups have different sizes. To see that, we can add a third group to the DataFrame

>>> df2 = pd.concat([df, pd.DataFrame([{'fk ID': 3, 'value': 10, 'valID': 4}])],
                    ignore_index=True)
>>> df2

   fk ID  value  valID
0      1      3      1
1      1      3      2
2      2      4      1
3      2      5      2
4      3     10      4

>>> df2.groupby('fk ID', as_index=False).apply(flatten_group)

0  fk ID     1
   value     3
   valID     1
   fk ID     1
   value     3
   valID     2
1  fk ID     2
   value     4
   valID     1
   fk ID     2
   value     5
   valID     2
2  fk ID     3
   value    10
   valID     4
dtype: int64

The result is not a DataFrame, as one might have expected, because pandas can't align the columns of groups with different sizes.


To solve this I suggest the following solution. It should work for any group size, and should be faster for large DataFrames.

import numpy as np 

def flatten_group(g):
    # flatten each group data into a single row 
    flat_data = g.to_numpy().reshape(1,-1)
    return pd.DataFrame(flat_data)

# group the rows by 'fk ID'
groups = df.groupby('fk ID', group_keys=False)

# get the maximum group size 
max_group_size = groups.size().max()

# construct the new columns by repeating the 
# original columns 'max_group_size' times
new_cols = np.tile(df.columns, max_group_size)

# aggregate the flattened rows 
res = groups.apply(flatten_group).reset_index(drop=True) 
# update the columns 
res.columns = new_cols

Output:

# df 
>>> res

   fk ID  value  valID  fk ID  value  valID
0      1      3      1      1      3      2
1      2      4      1      2      5      2

# df2 
>>> res

   fk ID  value  valID  fk ID  value  valID
0      1      3      1    1.0    3.0    2.0
1      2      4      1    2.0    5.0    2.0
2      3     10      4    NaN    NaN    NaN
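
A side note on the df2 output above: the padded columns are upcast to float because NaN is not representable in an ordinary integer column. If integer dtypes matter, the wide result can be converted to pandas' nullable Int64 extension type, which keeps integer values and shows gaps as <NA>. A small sketch (`res` here is a stand-in for the wide df2 result above):

```python
import pandas as pd
import numpy as np

# a wide frame shaped like the df2 result above, with NaN padding
res = pd.DataFrame([[1, 3, 1, 1.0, 3.0, 2.0],
                    [2, 4, 1, 2.0, 5.0, 2.0],
                    [3, 10, 4, np.nan, np.nan, np.nan]],
                   columns=['fk ID', 'value', 'valID'] * 2)

# nullable Int64 preserves the integer values and represents gaps as <NA>
res = res.astype('Int64')
```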

Upvotes: 1
