Reputation: 173
i have a problem with splitting a column into two columns. I want to split the column by the first and last found pattern '-'. Maybe this is trivial.
Here is my column:
col1
0 aa-bb-cc-dd
1 aa-bb-cc
2 aa-bb-cc
3 aa-bb-cc-dd
This is the frame i want as result:
col1 col2
0 bb-cc dd
1 bb cc
2 bb cc
3 bb-cc dd
Thanks in advance!
Upvotes: 2
Views: 384
Reputation: 249434
Here's an idiomatic but slow way to do it:
df.col1 = df.col1.str.split('-', 1).str[1] # discard first part
parts = df.col1.str.rsplit('-', 1).str
df.col1 = parts[0]
df['col2'] = parts[1]
While this works, it is not fast: about 4 seconds for 700k rows. Looking at it you'd think this is a good way to do it, but performance-wise it's worse than all the alternatives.
Upvotes: 2
Reputation: 562
This might help:
df['col2'] = df['col1'].split('-')[-1]
df['col1'] = '-'.join(i for i in df['col1'].split('-')[1:-1])
Upvotes: -1
Reputation: 13255
First slice and use str.rsplit
and rename
:
df = df.col1.str[3:].str.rsplit('-', n=1, expand=True).rename(columns={0:'col1',1:'col2'})
print(df)
col1 col2
0 bb-cc dd
1 bb cc
2 bb cc
3 bb-cc dd
Upvotes: 1
Reputation: 164773
You can use a list comprehension:
df = pd.DataFrame([i.split('-', 1)[1].rsplit('-', 1) for i in df['col1']],
columns=['col1', 'col2'])
print(df)
col1 col2
0 bb-cc dd
1 bb cc
2 bb cc
3 bb-cc dd
Pandas str
methods exist primarily for convenience. For clean data, you may find the list comprehension more efficient for larger dataframes.
Upvotes: 3
Reputation: 1582
If I understand well your question, you need to get rid of the first block delimited by a '-', then split the last '-' block in col2. If that is what you need, you could consider this:
df= pd.DataFrame({'col1':['aa-bb-cc-dd', 'aa-bb-cc', 'aa-bb-cc', 'aa-bb-cc-dd']})
df['col2'] = df['col1'].apply(lambda x: x[x.rfind('-')+1:])
df['col1'] = df['col1'].apply(lambda x: x[x.find('-')+1:x.rfind('-')])
print (df)
Upvotes: 1