How to create columns from a string in a dataframe?

Question

WHAT I HAVE:

import pandas as pd
inp = [{'long string':'ha: (tra: 1 la: 2) 
 hi: (tra: 1 la: 2) 
 ho: (tra: 1 la: 2)'}, 
{'long string':'hi: (tra: 1 la: 2) 
 ha: (tra: 1 la: 2) 
 ho: (tra: 1 la: 2)'}, 
{'long string':'ho: (tra: 1 la: 2) 
 hi: (tra: 1 la: 2) 
 ha: (tra: 1 la: 2)'}]
df = pd.DataFrame(inp)
df

GIVES

    long string
0   ha: (tra: 1 la: 2) 
 hi: (tra: 1 la: 2) 
 ho...
1   hi: (tra: 1 la: 2) 
 ha: (tra: 1 la: 2) 
 ho...
2   ho: (tra: 1 la: 2) 
 hi: (tra: 1 la: 2) 
 ha...

WHAT I WANT

inp = {'ha-tra':['1', '1', '1'], 'ha-la':['2', '2', '2'], 'hi-tra':['1', '1', '1'], 'hi-la':['2', '2', '2'],'ho-tra':['1', '1', '1'], 'ho-la':['2', '2', '2']}
df = pd.DataFrame(inp)
df

GIVES

    ha-tra  ha-la   hi-tra  hi-la   ho-tra  ho-la
0   1       2       1       2       1       2
1   1       2       1       2       1       2
2   1       2       1       2       1       2

CONTEXT

From a large string, I want to get each combination of (ha hi ho) and (tra la), and get the scores related to those combinations from the string. The problem is that the order of (ha hi ho) is not similar.

Mustafa Aydın · Accepted Answer

ndf = (df["long string"]
         .str.extractall(r"(ha|hi|ho):\s$(?:tra|la):\s(\d+)\s(?:tra|la):\s(\d+)$")
         .droplevel("match")
         .set_index(0, append=True)
         .set_axis(["tra", "la"], axis=1)
         .unstack()
         .swaplevel(axis=1))
ndf.columns = ndf.columns.map("-".join)

Extract the desired parts with a regex
Drop the index level induced by extractall called match
Append the ha-hi-ho matches as the index (0 is first capturing group)
Rename the remaining columns tra and la
Unstack the ha-hi-ho index to the columns
Swap the ha-hi-ho and tra-la levels' order in columns so that ha-hi-ho is upper
Lastly join these levels of columns' names with a hyphen

to get

  ha-tra hi-tra ho-tra ha-la hi-la ho-la
0      1      1      1     2     2     2
1      1      1      1     2     2     2
2      1      1      1     2     2     2

How to create columns from a string in a dataframe?

Answers (2)

Related Questions