I have a dataframe that looks like this:
time speaker label_1 label_2
0 0.25 1 10 4
1 0.25 2 10 5
2 0.50 1 10 6
3 0.50 2 10 7
4 0.75 1 10 8
5 0.75 2 10 9
6 1.00 1 10 11
7 1.00 2 10 12
8 1.25 1 11 13
9 1.25 2 11 14
10 1.50 1 11 15
11 1.50 2 11 16
12 1.75 1 11 17
13 1.75 2 11 18
14 2.00 1 11 19
15 2.00 2 11 20
The 'speaker' column takes the values 1 and 2 to distinguish the two speakers at each timestamp. I want to create new columns from the 'label_1' and 'label_2' data, one per speaker, so that each timestamp ends up on a single row. See below for the desired output.
time spk_1_label_1 spk_2_label_1 spk_1_label_2 spk_2_label_2
0.25 10 10 4 5
0.50 10 10 6 7
0.75 10 10 8 9
1.00 10 10 11 12
1.25 11 11 13 14
1.50 11 11 15 16
1.75 11 11 17 18
2.00 11 11 19 20
First we use pivot_table to pivot the rows to columns. Then we build the desired column names with a list comprehension and f-strings:
piv = df.pivot_table(index='time', columns='speaker')
piv.columns = [f'spk_{col[1]}_{col[0]}' for col in piv.columns]
spk_1_label_1 spk_2_label_1 spk_1_label_2 spk_2_label_2
time
0.25 10 10 4 5
0.50 10 10 6 7
0.75 10 10 8 9
1.00 10 10 11 12
1.25 11 11 13 14
1.50 11 11 15 16
1.75 11 11 17 18
2.00 11 11 19 20
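For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the question and applies the two steps above. The variable times is only a helper I introduced for the construction. Keep in mind that pivot_table aggregates with the mean by default. That is harmless here because each (time, speaker) pair occurs exactly once, but depending on the pandas version the values may print as floats.

```python
import pandas as pd

# Rebuild the question's frame: two speakers per timestamp.
times = [0.25, 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 2.00]
df = pd.DataFrame({
    'time': [t for t in times for _ in range(2)],
    'speaker': [1, 2] * len(times),
    'label_1': [10] * 8 + [11] * 8,
    'label_2': [4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
})

# Pivot: one row per time, columns become (label, speaker) pairs.
piv = df.pivot_table(index='time', columns='speaker')

# Flatten the MultiIndex columns; each col is a (label, speaker) tuple.
piv.columns = [f'spk_{col[1]}_{col[0]}' for col in piv.columns]
print(piv)
```

If a (time, speaker) pair could appear more than once and averaging is not what you want, df.pivot(index='time', columns='speaker') is worth considering, since it raises an error on duplicates instead of silently aggregating them.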
If you want to remove the index name:
piv.rename_axis(None, inplace=True)
spk_1_label_1 spk_2_label_1 spk_1_label_2 spk_2_label_2
0.25 10 10 4 5
0.50 10 10 6 7
0.75 10 10 8 9
1.00 10 10 11 12
1.25 11 11 13 14
1.50 11 11 15 16
1.75 11 11 17 18
2.00 11 11 19 20
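The desired output in the question shows time as an ordinary column rather than as the index. A possible way to get that, sketched here on only the first two timestamps of the question's data, is reset_index. The name flat is my own choice for the result.

```python
import pandas as pd

# First two timestamps of the question's data, to keep the sketch short.
df = pd.DataFrame({
    'time': [0.25, 0.25, 0.50, 0.50],
    'speaker': [1, 2, 1, 2],
    'label_1': [10, 10, 10, 10],
    'label_2': [4, 5, 6, 7],
})

piv = df.pivot_table(index='time', columns='speaker')
piv.columns = [f'spk_{col[1]}_{col[0]}' for col in piv.columns]

# Move 'time' out of the index into a regular column.
flat = piv.reset_index()
print(flat)
```

Do this before removing the index name with rename_axis(None). Otherwise the new column is called 'index' instead of 'time'.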
Extra
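If you want, we can make it more general by using the name of the pivoted column as the prefix for the flattened columns. This has to run on the freshly pivoted frame, while the columns are still a MultiIndex, so it replaces the renaming step above rather than following it:
piv = df.pivot_table(index='time', columns='speaker')
piv.columns = [f'{piv.columns.names[1]}_{col[1]}_{col[0]}' for col in piv.columns]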
speaker_1_label_1 speaker_2_label_1 speaker_1_label_2 speaker_2_label_2
time
0.25 10 10 4 5
0.50 10 10 6 7
0.75 10 10 8 9
1.00 10 10 11 12
1.25 11 11 13 14
1.50 11 11 15 16
1.75 11 11 17 18
2.00 11 11 19 20
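If you need this in more than one place, the same idea can be wrapped in a small function. The sketch below is only an illustration: flatten_columns is a hypothetical helper name, not a pandas function, and it assumes a two-level column index as produced by pivot_table with a single columns key.

```python
import pandas as pd

def flatten_columns(piv):
    """Hypothetical helper: turn (value, key) column pairs into '<level name>_<key>_<value>'."""
    prefix = piv.columns.names[1]  # name of the pivoted column, e.g. 'speaker'
    out = piv.copy()               # leave the caller's frame untouched
    out.columns = [f'{prefix}_{key}_{value}' for value, key in piv.columns]
    return out

# First two timestamps of the question's data.
df = pd.DataFrame({
    'time': [0.25, 0.25, 0.50, 0.50],
    'speaker': [1, 2, 1, 2],
    'label_1': [10, 10, 10, 10],
    'label_2': [4, 5, 6, 7],
})

piv = df.pivot_table(index='time', columns='speaker')
result = flatten_columns(piv)
print(result)
```

Returning a copy keeps the original MultiIndex frame available, which matters because the prefix can only be read while the columns still carry their level names.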
Note: f-strings require Python 3.6 or later. On older versions, use .format for the string formatting instead, keeping the speaker (col[1]) before the label (col[0]):
['spk_{}_{}'.format(col[1], col[0]) for col in piv.columns]
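A quick way to convince yourself that the .format variant gives the same names as the f-string is to run both on the column index alone. The sketch below builds that index by hand with pd.MultiIndex.from_product instead of pivoting, which is my shortcut for illustration. The argument order matters: the speaker is col[1] and the label is col[0].

```python
import pandas as pd

# Same column structure that pivot_table produces: (label, speaker) pairs.
cols = pd.MultiIndex.from_product(
    [['label_1', 'label_2'], [1, 2]],
    names=[None, 'speaker'],
)

# f-string version (Python 3.6+).
with_fstring = [f'spk_{col[1]}_{col[0]}' for col in cols]

# str.format version; speaker first, then label.
with_format = ['spk_{}_{}'.format(col[1], col[0]) for col in cols]

print(with_format)
```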