savchart
savchart

Reputation: 169

Reshape, recombine row with column

I have dataset like this:

Block   Vector
blk_-1  0.0 2 3, 0.5 3 8, 0.7 33 5
blk_-2  1.0 4 1, 2.0 2 4
blk_-3  0.0 0 0, 6.0 0 7
blk_-4  8.0 3 0, 7.0 5 8
blk_-5  9.0 0 5, 5.0 0 2, 5.2 3 2, 5.9 5 3

dat = {'Block': ['blk_-1', 'blk_-2', 'blk_-3', 'blk_-4',  'blk_-5'],\
        'Vector': ['0.0 2 3, 0.5 3 8, 0.7 33 5',\
                   '1.0 4 1, 2.0 2 4',\
                   '0.0 0 0, 6.0 0 7',\
                   '8.0 3 0, 7.0 5 8',\
                  '9.0 0 5, 5.0 0 2, 5.2 3 2, 5.9 5 3']
       }

I want to get:

Block   Vector

blk_-1   0.0 2 3

blk_-1   0.5 3 8

blk_-1   0.7 33 5

blk_-2   1.0 4 1

blk_-2   2.0 2 4

blk_-3   0.0 0 0

blk_-3   6.0 0 7

blk_-4   8.0 3 0

blk_-4   7.0 5 8

blk_-5   9.0 0 5

blk_-5   5.0 0 2

blk_-5   5.2 3 2    

blk_-5   5.9 5 3

Try:

df['Vector'] = df['Vector'].apply(lambda x : list(map(str, x.split(','))))
df.Vector.apply(pd.Series) \
    .merge(df, left_index = True, right_index = True) \
    .drop(["Vector"], axis = 1)

Get:

    0   1   2   3                   Block
0   0.0 2 3 0.5 3 8 0.7 33 5    NaN blk_-1
1   1.0 4 1 2.0 2 4 NaN NaN         blk_-2
2   0.0 0 0 6.0 0 7 NaN NaN         blk_-3
3   8.0 3 0 7.0 5 8 NaN NaN         blk_-4
4   9.0 0 5 5.0 0 2 5.2 3 2 5.9 5 3         blk_-5

Actually Stuck at this moment. Waiting for your ideas and comments :)

Upvotes: 1

Views: 63

Answers (3)

jezrael
jezrael

Reputation: 863166

Solution for pandas 0.25+ - Series.str.split column and assign back with DataFrame.assign, use DataFrame.explode and last for default index add DataFrame.reset_index with drop=True:

df = pd.DataFrame(dat)

df = df.assign(Vector=df['Vector'].str.split(',')).explode('Vector').reset_index(drop=True)
print (df)
     Block     Vector
0   blk_-1    0.0 2 3
1   blk_-1    0.5 3 8
2   blk_-1   0.7 33 5
3   blk_-2    1.0 4 1
4   blk_-2    2.0 2 4
5   blk_-3    0.0 0 0
6   blk_-3    6.0 0 7
7   blk_-4    8.0 3 0
8   blk_-4    7.0 5 8
9   blk_-5    9.0 0 5
10  blk_-5    5.0 0 2
11  blk_-5    5.2 3 2
12  blk_-5    5.9 5 3

Version for oldier pandas versions - use pop + split + stack + reset_index + rename for new Series and then join to original:

df = (df.join(df.pop('Vector')
                .str.split(',',expand=True)
                .stack()
                .reset_index(level=1, drop=True)
                .rename('Vector')).reset_index(drop=True))
print (df)
     Block     Vector
0   blk_-1    0.0 2 3
1   blk_-1    0.5 3 8
2   blk_-1   0.7 33 5
3   blk_-2    1.0 4 1
4   blk_-2    2.0 2 4
5   blk_-3    0.0 0 0
6   blk_-3    6.0 0 7
7   blk_-4    8.0 3 0
8   blk_-4    7.0 5 8
9   blk_-5    9.0 0 5
10  blk_-5    5.0 0 2
11  blk_-5    5.2 3 2
12  blk_-5    5.9 5 3

Upvotes: 2

anky
anky

Reputation: 75100

For lower than version .25:

final=df.merge(df['Vector'].str.split(',',expand=True).stack().reset_index(0,name='Vector'),
left_index=True,right_on='level_0',suffixes=('_x','')).drop(['level_0','Vector_x'],1)
print(final)

    Block     Vector
0  blk_-1    0.0 2 3
1  blk_-1    0.5 3 8
2  blk_-1   0.7 33 5
0  blk_-2    1.0 4 1
1  blk_-2    2.0 2 4
0  blk_-3    0.0 0 0
1  blk_-3    6.0 0 7
0  blk_-4    8.0 3 0
1  blk_-4    7.0 5 8
0  blk_-5    9.0 0 5
1  blk_-5    5.0 0 2
2  blk_-5    5.2 3 2
3  blk_-5    5.9 5 3

Upvotes: 1

Allen Qin
Allen Qin

Reputation: 19957

You can use split, explode and join.

df[['Block']].join(df.Vector.str.split(',').explode())

    Block   Vector
0   blk_-1  0.0 2 3
0   blk_-1  0.5 3 8
0   blk_-1  0.7 33 5
1   blk_-2  1.0 4 1
1   blk_-2  2.0 2 4
2   blk_-3  0.0 0 0
2   blk_-3  6.0 0 7
3   blk_-4  8.0 3 0
3   blk_-4  7.0 5 8
4   blk_-5  9.0 0 5
4   blk_-5  5.0 0 2
4   blk_-5  5.2 3 2
4   blk_-5  5.9 5 3

Upvotes: 3

Related Questions