Shubham R
Shubham R

Reputation: 7644

Split DataFrame string column into N splits

i have a df

a    name
1    a/b/c
2    w/x/y/z
3    q/w/e/r/t

i want to split the name column on '/' to get this output

id    name    main    sub    leaf
1     a/b/c    a       b       c
2    w/x/y/z   w       x       z
3   q/w/e/r/t  q       w       t 

i.e first two slashes add as main and sub respectively, and leaf should be filled with word after last slash

i tried using this, but result was incorrect

df['name'].str.split('/', expand=True).rename(columns={0:'main',1:'sub',2:'leaf'})

is there any way to assign columns

Upvotes: 1

Views: 465

Answers (2)

cs95
cs95

Reputation: 402323

Option 1
Using str.split, but don't expand the result. You should end up with a column of lists. Next, use df.assign, assign columns to return a new DataFrame object.

v = df['name'].str.split('/')

df.assign(
    main=v.str[ 0],
     sub=v.str[ 1],
    leaf=v.str[-1]
)

        name leaf main sub
a                         
1      a/b/c    c    a   b
2    w/x/y/z    z    w   x
3  q/w/e/r/t    t    q   w

Details
This is what v looks like:

a
1          [a, b, c]
2       [w, x, y, z]
3    [q, w, e, r, t]
Name: name, dtype: object

This is actually a lot easier to handle, because you have greater control over elements with the .str accessor. If you expand the result, you have to snap your ragged data to a tabular format to fit into a new DataFrame object, thereby introducing Nones. At that point, indexing (finding the ith or ith-last element) becomes a chore.


Option 2
Using direct assignment (to maintain order) -

df['main'] = v.str[ 0]
df['sub' ] = v.str[ 1]
df['leaf'] = v.str[-1]

df    
        name main sub leaf
a                         
1      a/b/c    a   b    c
2    w/x/y/z    w   x    z
3  q/w/e/r/t    q   w    t

Note that this modifies the original dataframe, instead of returning a new one, so it is cheaper. However, it is more intractable if you have a large number of columns.

You might instead consider this alternative which should generalise to many more columns:

for c, i in [('main', 0), ('sub', 1), ('leaf', -1)]:
    df[c] = v[i]

df
        name main sub leaf
a                         
1      a/b/c    a   b    c
2    w/x/y/z    w   x    z
3  q/w/e/r/t    q   w    t

Iterate over a list of tuples. The first element in a tuple is the column name, and the second is the corresponding index to pick the result from v. You still have to assign each one separately, whether you like it or not. Using a loop would probably be a clean way of doing it.

Upvotes: 2

jezrael
jezrael

Reputation: 862511

Use split with assign:

s = df['name'].str.split('/')
df = df.assign(main=s.str[0], sub=s.str[1], leaf=s.str[-1])
print (df)
   a       name leaf main sub
0  1      a/b/c    c    a   b
1  2    w/x/y/z    z    w   x
2  3  q/w/e/r/t    t    q   w

For change order of columns:

s = df['name'].str.split('/')
df = df.assign(main=s.str[0], sub=s.str[1], leaf=s.str[-1])
df = df[df.columns[:-3].tolist() + ['main','sub','leaf']]
print (df)
   a       name main sub leaf
0  1      a/b/c    a   b    c
1  2    w/x/y/z    w   x    z
2  3  q/w/e/r/t    q   w    t

Or:

s = df['name'].str.split('/')
df = (df.join(pd.DataFrame({'main':s.str[0], 'sub':s.str[1], 'leaf':s.str[-1]}, 
              columns=['main','sub','leaf'])))
print (df)
   a       name main sub leaf
0  1      a/b/c    a   b    c
1  2    w/x/y/z    w   x    z
2  3  q/w/e/r/t    q   w    t

Upvotes: 3

Related Questions