Reputation: 169
I have dataset like this:
Block Vector
blk_-1 0.0 2 3, 0.5 3 8, 0.7 33 5
blk_-2 1.0 4 1, 2.0 2 4
blk_-3 0.0 0 0, 6.0 0 7
blk_-4 8.0 3 0, 7.0 5 8
blk_-5 9.0 0 5, 5.0 0 2, 5.2 3 2, 5.9 5 3
dat = {'Block': ['blk_-1', 'blk_-2', 'blk_-3', 'blk_-4', 'blk_-5'],\
'Vector': ['0.0 2 3, 0.5 3 8, 0.7 33 5',\
'1.0 4 1, 2.0 2 4',\
'0.0 0 0, 6.0 0 7',\
'8.0 3 0, 7.0 5 8',\
'9.0 0 5, 5.0 0 2, 5.2 3 2, 5.9 5 3']
}
I want to get:
Block Vector
blk_-1 0.0 2 3
blk_-1 0.5 3 8
blk_-1 0.7 33 5
blk_-2 1.0 4 1
blk_-2 2.0 2 4
blk_-3 0.0 0 0
blk_-3 6.0 0 7
blk_-4 8.0 3 0
blk_-4 7.0 5 8
blk_-5 9.0 0 5
blk_-5 5.0 0 2
blk_-5 5.2 3 2
blk_-5 5.9 5 3
Try:
df['Vector'] = df['Vector'].apply(lambda x : list(map(str, x.split(','))))
df.Vector.apply(pd.Series) \
.merge(df, left_index = True, right_index = True) \
.drop(["Vector"], axis = 1)
Get:
0 1 2 3 Block
0 0.0 2 3 0.5 3 8 0.7 33 5 NaN blk_-1
1 1.0 4 1 2.0 2 4 NaN NaN blk_-2
2 0.0 0 0 6.0 0 7 NaN NaN blk_-3
3 8.0 3 0 7.0 5 8 NaN NaN blk_-4
4 9.0 0 5 5.0 0 2 5.2 3 2 5.9 5 3 blk_-5
Actually Stuck at this moment. Waiting for your ideas and comments :)
Upvotes: 1
Views: 63
Reputation: 863166
Solution for pandas 0.25+ - Series.str.split
column and assign back with DataFrame.assign
, use DataFrame.explode
and last for default index add DataFrame.reset_index
with drop=True
:
df = pd.DataFrame(dat)
df = df.assign(Vector=df['Vector'].str.split(',')).explode('Vector').reset_index(drop=True)
print (df)
Block Vector
0 blk_-1 0.0 2 3
1 blk_-1 0.5 3 8
2 blk_-1 0.7 33 5
3 blk_-2 1.0 4 1
4 blk_-2 2.0 2 4
5 blk_-3 0.0 0 0
6 blk_-3 6.0 0 7
7 blk_-4 8.0 3 0
8 blk_-4 7.0 5 8
9 blk_-5 9.0 0 5
10 blk_-5 5.0 0 2
11 blk_-5 5.2 3 2
12 blk_-5 5.9 5 3
Version for oldier pandas versions - use pop
+ split
+ stack
+ reset_index
+ rename
for new Series
and then join
to original:
df = (df.join(df.pop('Vector')
.str.split(',',expand=True)
.stack()
.reset_index(level=1, drop=True)
.rename('Vector')).reset_index(drop=True))
print (df)
Block Vector
0 blk_-1 0.0 2 3
1 blk_-1 0.5 3 8
2 blk_-1 0.7 33 5
3 blk_-2 1.0 4 1
4 blk_-2 2.0 2 4
5 blk_-3 0.0 0 0
6 blk_-3 6.0 0 7
7 blk_-4 8.0 3 0
8 blk_-4 7.0 5 8
9 blk_-5 9.0 0 5
10 blk_-5 5.0 0 2
11 blk_-5 5.2 3 2
12 blk_-5 5.9 5 3
Upvotes: 2
Reputation: 75100
For lower than version .25
:
final=df.merge(df['Vector'].str.split(',',expand=True).stack().reset_index(0,name='Vector'),
left_index=True,right_on='level_0',suffixes=('_x','')).drop(['level_0','Vector_x'],1)
print(final)
Block Vector
0 blk_-1 0.0 2 3
1 blk_-1 0.5 3 8
2 blk_-1 0.7 33 5
0 blk_-2 1.0 4 1
1 blk_-2 2.0 2 4
0 blk_-3 0.0 0 0
1 blk_-3 6.0 0 7
0 blk_-4 8.0 3 0
1 blk_-4 7.0 5 8
0 blk_-5 9.0 0 5
1 blk_-5 5.0 0 2
2 blk_-5 5.2 3 2
3 blk_-5 5.9 5 3
Upvotes: 1
Reputation: 19957
You can use split, explode and join.
df[['Block']].join(df.Vector.str.split(',').explode())
Block Vector
0 blk_-1 0.0 2 3
0 blk_-1 0.5 3 8
0 blk_-1 0.7 33 5
1 blk_-2 1.0 4 1
1 blk_-2 2.0 2 4
2 blk_-3 0.0 0 0
2 blk_-3 6.0 0 7
3 blk_-4 8.0 3 0
3 blk_-4 7.0 5 8
4 blk_-5 9.0 0 5
4 blk_-5 5.0 0 2
4 blk_-5 5.2 3 2
4 blk_-5 5.9 5 3
Upvotes: 3