sharp

Reputation: 2158

Python splitting column value with special delimiters

I am trying to split a pandas column value without losing its delimiter. Here is the Stack Overflow answer that I am following. It works well when I pass a plain string, but it doesn't work when I want to split by '/m'. I tried different regexes, but none of them seem to work either. Any suggestions?

import pandas as pd 
ls = [
    {'ID': 'ABC',
     'LongString': '/m/04abc3 1 1 1 1 /m/04ccc32 3 3 3 3'},
    {'ID': 'CDE',
     'LongString': '/m/04abc4 2 2 2 2 /m/04ccc12 4 4 4 4'}
]

df = pd.DataFrame(ls)

df['LongString'] = df['LongString'].str.split(r'(?<=/m)\s')  # also tried removing `/` and using just `m` for testing; that did not do the trick

I am trying to get it to look like this. What am I doing wrong here?

pandas dataframe format: 
ID  | LongString
ABC | ['/m/04abc3 1 1 1 1', '/m/04ccc32 3 3 3 3']
CDE | ['/m/04abc4 2 2 2 2', '/m/04ccc12 4 4 4 4']

Upvotes: 1

Views: 38

Answers (1)

JuliettVictor

Reputation: 654

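It looks as if you want to split on whitespace that is followed by /m. Your pattern uses a lookbehind, so it only matches whitespace that comes directly after /m, and that never occurs in your data because /m is always followed by another /. In regex terms, you want a lookahead rather than a lookbehind.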

Proposed solution:

df['LongString'] = df['LongString'].str.split(r'\s(?=/m)')  # raw string so \s reaches the regex engine unchanged
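
To see the difference between the two patterns outside of pandas, here is a minimal sketch using the standard library's re.split on one of the sample strings from the question. The variable names are only for illustration. As far as I know, str.split in pandas treats a multi-character pattern as a regular expression by default, so the same patterns should behave the same way on the column.

```python
import re

# One of the sample values from the question.
s = '/m/04abc3 1 1 1 1 /m/04ccc32 3 3 3 3'

# Lookbehind: whitespace that comes right AFTER '/m'.
# In this data '/m' is always followed by '/', so nothing matches
# and the string comes back unsplit.
lookbehind = re.split(r'(?<=/m)\s', s)

# Lookahead: whitespace that comes right BEFORE '/m'.
# The lookahead does not consume '/m', so the delimiter stays
# at the start of the next piece.
lookahead = re.split(r'\s(?=/m)', s)

print(lookbehind)  # ['/m/04abc3 1 1 1 1 /m/04ccc32 3 3 3 3']
print(lookahead)   # ['/m/04abc3 1 1 1 1', '/m/04ccc32 3 3 3 3']
```

The lookahead result matches the list shown for ID ABC in the desired output.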

Upvotes: 3
