Reputation: 25
I have a column like this
col1
(0,0):19,(1,1):155
(0,0):13,(0,1):32,(1,1):34
(0,0):56,(1,0):43,(1,1):38
(0,0):67,(0,1):45,(1,0):98,(1,1):70
and I would like to split it into four columns:
(0,0) (0,1) (1,0) (1,1)
19 0 0 155
13 32 0 34
56 0 43 38
67 45 98 70
I have try col.str.split(',\s*(?![^()]*\))',expand=True)
but can only get:
col1 col2 col3 col4
(0,0):19 (1,1):155 NAN NAN
(0,0):13 (0,1):32 (1,1):34 NAN
(0,0):56 (1,0):43 (1,1):38 NAN
(0,0):67 (0,1):45 (1,0):98 (1,1):70
I was wondering if there is a way get what I show above? Thanks a lot!
Kun
Upvotes: 2
Views: 52
Reputation: 150765
My approach with double str.split
and stack/unstack
:
(df.col1.str.split(',\s*(?![^()]*\))', expand=True)
.stack().str.split(':',expand=True)
.set_index([0], append=True)
.reset_index(level=1, drop=True)[1]
.unstack(fill_value=0)
)
Output:
0 (0,0) (0,1) (1,0) (1,1)
0 19 0 0 155
1 13 32 0 34
2 56 0 43 38
3 67 45 98 70
Upvotes: 1
Reputation: 626950
You may extract the digits after (0,0)
, (0,1)
, (1,0)
and (1,1)
and then drop the col1
column:
import pandas as pd
df = pd.DataFrame({'col1': ['(0,0):19,(1,1):155','(0,0):13,(0,1):32,(1,1):34','(0,0):56,(1,0):43,(1,1):38','(0,0):67,(0,1):45,(1,0):98,(1,1):70']})
df['(0,0)'] = df['col1'].str.extract(r'\(0,0\):(\d+)', expand=False).fillna(0)
df['(0,1)'] = df['col1'].str.extract(r'\(0,1\):(\d+)', expand=False).fillna(0)
df['(1,0)'] = df['col1'].str.extract(r'\(1,0\):(\d+)', expand=False).fillna(0)
df['(1,1)'] = df['col1'].str.extract(r'\(1,1\):(\d+)', expand=False).fillna(0)
df = df.drop(columns = ['col1'])
Output:
>>> df
(0,0) (0,1) (1,0) (1,1)
0 19 0 0 155
1 13 32 0 34
2 56 0 43 38
3 67 45 98 70
Upvotes: 1