Reputation: 431
I have a dataframe in which I need to split a column on the character "Y" while keeping the delimiter. For example,
import pandas as pd
d1 = pd.DataFrame({'user': [1,2,3],'action': ['YNY','NN','NYYN']})
The output dataframe should look like this,
d2 = pd.DataFrame([{'action': 'Y, NY', 'user': 1},
{'action': 'NN', 'user': 2},
{'action': 'NY, Y, N', 'user': 3}])
in[1]: d1
out[1]:
action    user
YNY       1
NN        2
NYYN      3
in[2]: d2
out[2]:
action    user
Y, NY     1
NN        2
NY, Y, N  3
I have looked at a few related questions, such as "Python split() without removing the delimiter" and "Python splitting on regex without removing delimiters", but they are not exactly what I am looking for here.
Upvotes: 0
Views: 50
Reputation: 9081
Use:
d1['action'].str.split('Y').str.join('Y,').str.rstrip(',')
Output
0 Y,NY
1 NN
2 NY,Y,N
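To see why this chain works, here is a minimal step-by-step sketch of the same one-liner. The names parts, joined and d2 are only illustrative, and assigning the result back with assign is an assumption about how you want to build the output frame.

```python
import pandas as pd

d1 = pd.DataFrame({'user': [1, 2, 3], 'action': ['YNY', 'NN', 'NYYN']})

# Step 1: splitting on 'Y' drops the delimiter and leaves an empty string
# wherever a 'Y' starts the string, ends it, or directly follows another 'Y'.
parts = d1['action'].str.split('Y')

# Step 2: joining with 'Y,' puts a 'Y' back at every split point,
# each one followed by a comma.
joined = parts.str.join('Y,')

# Step 3: a string that ended in 'Y' now ends in 'Y,', so strip that comma.
d2 = d1.assign(action=joined.str.rstrip(','))

print(d2)
```

Note that this gives comma-separated pieces without a space after the comma; joining with 'Y, ' and stripping ', ' instead should give the spaced form if that is what you need.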
Upvotes: 1
Reputation: 323226
Sounds like you need
d1.action.str.split('([^Y]*Y)').map(lambda x : [z for z in x if z!= ''])
Out[234]:
0 [Y, NY]
1 [NN]
2 [NY, Y, N]
Name: action, dtype: object
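If you want strings rather than lists, a possible follow-up is to join each list again. This is only a sketch: it assumes the ', ' separator from d2 in the question is what you want, and the names pieces, d2 and alt are illustrative. The findall variant at the end is an alternative I am adding, not part of the line above.

```python
import pandas as pd

d1 = pd.DataFrame({'user': [1, 2, 3], 'action': ['YNY', 'NN', 'NYYN']})

# The capture group makes the split keep each "run ending in Y";
# the empty strings produced around the matches are filtered out.
pieces = d1['action'].str.split('([^Y]*Y)').map(lambda x: [z for z in x if z != ''])

# Join each list back into a single string, using ', ' as in the question's d2.
d2 = d1.assign(action=pieces.str.join(', '))

# Alternative: findall returns the pieces directly, so no filtering is needed.
# The second branch picks up a trailing run that contains no 'Y'.
alt = d1['action'].str.findall(r'[^Y]*Y|[^Y]+').str.join(', ')

print(d2)
```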
Upvotes: 1