Reputation: 141
Suppose I have a pandas DataFrame like this:
import pandas as pd
data = pd.DataFrame({'header': ['age', 'height', 'weight', 'country', 'age', 'height', 'weight', 'bank_id', 'age', 'height', 'weight', 'country'],
'values': ['1', '6 ft', '10 kg', 'India', '2', '5 ft', '20 kg', 'A123', '3', '5.5 ft', '30 kg', 'Japan']})
# display(data)
header values
0 age 1
1 height 6 ft
2 weight 10 kg
3 country India
4 age 2
5 height 5 ft
6 weight 20 kg
7 bank_id A123
8 age 3
9 height 5.5 ft
10 weight 30 kg
11 country Japan
Now I want to transpose them using Python so that they look something like this:
There are some rows with no data, and they will remain blank.
I was trying with the code:
data.pivot_table(columns="header",values="values",aggfunc="max")
[out]:
header age bank_id country height weight
values 3 A123 Japan 6 ft 30 kg
But it is not giving the proper result. It's only showing a single row.
Upvotes: 1
Views: 1151
Reputation: 62403
data
have unique indices.data
.:=
, and is only valid from python 3.8.import pandas as pd
# set up test dataframe
data = pd.DataFrame({'header': ['age', 'height', 'weight', 'country', 'age', 'height', 'weight', 'bank_id', 'age', 'height', 'weight', 'country'],
'values': ['1', '6 ft', '10 kg', 'India', '2', '5 ft', '20 kg', 'A123', '3', '5.5 ft', '30 kg', 'Japan']})
# create a unique index; replace 4 with the real group size
# the associated groups in data, must be consecutive
x = 0
data.index = [x := x+1 if i%4 == 0 else x for i, _ in enumerate(data.index)]
# see the indices are matched for each group compared to the OP
header values
1 age 1
1 height 6 ft
1 weight 10 kg
1 country India
2 age 2
2 height 5 ft
2 weight 20 kg
2 bank_id A123
3 age 3
3 height 5.5 ft
3 weight 30 kg
3 country Japan
# create a wide dataframe
wide = data.pivot(columns='header', values='values').reset_index(drop=True)
# header is the .name of columns, to make it nothing
wide.columns.name = None
# display(wide)
age bank_id country height weight
1 NaN India 6 ft 10 kg
2 A123 NaN 5 ft 20 kg
3 NaN Japan 5.5 ft 30 kg
Upvotes: 3
Reputation: 28644
age
seems to be the starting point of every new data point; we can use that knowledge when reshaping the data, to indicate the start of a new row for pivoting:
(data
.assign(counter = data.header.eq('age').cumsum())
.pivot('counter', 'header', 'values')
.loc[:, ['age', 'height', 'weight', 'country', 'bank_id']]
.rename_axis(index = None, columns = None)
.reset_index(drop=True)
)
age height weight country bank_id
0 1 6 ft 10 kg India NaN
1 2 5 ft 20 kg NaN A123
2 3 5.5 ft 30 kg Japan NaN
Upvotes: 0
Reputation: 65228
One option would be pivoting columns in order to get distinct columns for the new dataframe, and then eliminate NaN values for each column, and lastly combine them by using pandas.concat
function :
import pandas as pd
data = pd.DataFrame({'header': ["age" , "height" , "weight" , "bank_id", "country", "age" , "height" , "weight" , "bank_id", "country", "age" , "height" , "weight" , "country" ],
'values': [ "1", "6 ft", "10 kg", "", "India", "2" , "5 ft" , "20 kg" , "A123", "", "3" , "5.5 ft" , "30 kg" , "Japan" ]})
pvt_data = data.pivot( columns='header', values='values' )
ls_col = list(pvt_data.columns)
ls_cols = []
for col in ls_col:
ls_cols.append(pvt_data[col].dropna().reset_index(drop=True, inplace=False))
print( pd.concat([ls_cols[0],ls_cols[1],ls_cols[2],ls_cols[3],ls_cols[4]],axis=1) )
age bank_id country height weight
0 1 India 6 ft 10 kg
1 2 A123 5 ft 20 kg
2 3 NaN Japan 5.5 ft 30 kg
Upvotes: 1