Reputation: 181
Good afternoon, i am trying to split text in a column to a specfic format here is my table below
UserId Application
1 Grey Blue::Black Orange;White:Green
2 Yellow Purple::Orange Grey;Blue Pink::Red
I would like it to read the following:
UserId Application
1 Grey Blue
1 White Orange
2 Yellow Purple
2 Blue Pink
Basically, i would like to keep the first string of every :: instance for every string in a given cell.
So far my code is
def unnesting(df, explode):
idx=df.index.repeat(df[explode[0]].str.len())
df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)
df1.index=idx
return df1.join(df.drop(explode,1),how='left')
df['Application']=df.Role.str.split(';|::|').map(lambda x : x[0::2])
unnesting(df.drop('Role',1),['Application']
The following code reads
UserId Application
1 Grey Blue, White Orange
2 Yellow Purple, Blue Pink
Please Assist i dont know where i should be using pandas or numpy to solve this problem!!
Upvotes: 1
Views: 580
Reputation: 323226
Maybe you can try using extractall
yourdf=df.set_index('UserId').Application.str.extractall(r'(\w+):').reset_index(level=0)
# You can adding rename(columns={0:'Application'})at the end
Out[87]:
UserId 0
match
0 1 Grey
1 1 White
0 2 Yellow
1 2 Blue
Update look at the unnesting , after we split
and select the value we need from the string , we store them into a list
, when you have a list
type in you columns
, I recommend using unnesting
df['LIST']=df.Application.str.split(';|::|:').map(lambda x : x[0::2])
unnesting(df.drop('Application',1),['LIST'])
Out[111]:
LIST UserId
0 Grey Blue 1
0 White 1
1 Yellow Purple 2
1 Blue Pink 2
My own def-function
def unnesting(df, explode):
idx=df.index.repeat(df[explode[0]].str.len())
df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)
df1.index=idx
return df1.join(df.drop(explode,1),how='left')
Upvotes: 2