Reputation: 7644
i have a df
a name
1 a/b/c
2 w/x/y/z
3 q/w/e/r/t
i want to split the name column on '/' to get this output
id name main sub leaf
1 a/b/c a b c
2 w/x/y/z w x z
3 q/w/e/r/t q w t
i.e first two slashes add as main and sub respectively, and leaf should be filled with word after last slash
i tried using this, but result was incorrect
df['name'].str.split('/', expand=True).rename(columns={0:'main',1:'sub',2:'leaf'})
is there any way to assign columns
Upvotes: 1
Views: 465
Reputation: 402323
Option 1
Using str.split
, but don't expand the result. You should end up with a column of lists. Next, use df.assign
, assign columns to return a new DataFrame object.
v = df['name'].str.split('/')
df.assign(
main=v.str[ 0],
sub=v.str[ 1],
leaf=v.str[-1]
)
name leaf main sub
a
1 a/b/c c a b
2 w/x/y/z z w x
3 q/w/e/r/t t q w
Details
This is what v
looks like:
a
1 [a, b, c]
2 [w, x, y, z]
3 [q, w, e, r, t]
Name: name, dtype: object
This is actually a lot easier to handle, because you have greater control over elements with the .str
accessor. If you expand the result, you have to snap your ragged data to a tabular format to fit into a new DataFrame object, thereby introducing None
s. At that point, indexing (finding the ith or ith-last element) becomes a chore.
Option 2
Using direct assignment (to maintain order) -
df['main'] = v.str[ 0]
df['sub' ] = v.str[ 1]
df['leaf'] = v.str[-1]
df
name main sub leaf
a
1 a/b/c a b c
2 w/x/y/z w x z
3 q/w/e/r/t q w t
Note that this modifies the original dataframe, instead of returning a new one, so it is cheaper. However, it is more intractable if you have a large number of columns.
You might instead consider this alternative which should generalise to many more columns:
for c, i in [('main', 0), ('sub', 1), ('leaf', -1)]:
df[c] = v[i]
df
name main sub leaf
a
1 a/b/c a b c
2 w/x/y/z w x z
3 q/w/e/r/t q w t
Iterate over a list of tuples. The first element in a tuple is the column name, and the second is the corresponding index to pick the result from v
. You still have to assign each one separately, whether you like it or not. Using a loop would probably be a clean way of doing it.
Upvotes: 2
Reputation: 862511
s = df['name'].str.split('/')
df = df.assign(main=s.str[0], sub=s.str[1], leaf=s.str[-1])
print (df)
a name leaf main sub
0 1 a/b/c c a b
1 2 w/x/y/z z w x
2 3 q/w/e/r/t t q w
For change order of columns:
s = df['name'].str.split('/')
df = df.assign(main=s.str[0], sub=s.str[1], leaf=s.str[-1])
df = df[df.columns[:-3].tolist() + ['main','sub','leaf']]
print (df)
a name main sub leaf
0 1 a/b/c a b c
1 2 w/x/y/z w x z
2 3 q/w/e/r/t q w t
Or:
s = df['name'].str.split('/')
df = (df.join(pd.DataFrame({'main':s.str[0], 'sub':s.str[1], 'leaf':s.str[-1]},
columns=['main','sub','leaf'])))
print (df)
a name main sub leaf
0 1 a/b/c a b c
1 2 w/x/y/z w x z
2 3 q/w/e/r/t q w t
Upvotes: 3