Reputation: 501
I know there have been many questions on this topic, but still:
My input: as dataframe
task m_label
0 S101-10061 [Cecum Landmark, ICV, Comment, Appendiceal ori...
1 S101-10069 [Rectum RF, ICV, Cecum Landmark, TI, Comment, ...
2 S101-10078 [Appendiceal orifice, ICV, Cecum Landmark, Com...
3 S101-10088 [Cecum Landmark, ICV, Comment, Appendiceal ori...
4 S101-10100 [Transverse, Appendiceal orifice, ICV, Cecum L...
5 S101-10102 [Rectum RF, ICV, Cecum Landmark, Comment, TI, ...
6 S101-10133 [Rectum RF, Transverse, ICV, Cecum Landmark, C...
7 S101YGBgZ2 [Comment]
I wan to split like df.m_label.str.split("",expand=True)
but it return NaN
Maybe problem with df? I get its from panda Series from: m_lab_task=data.groupby(['task'])['m_label'].unique()
. So maybe in previously step it possible correct?
Required output:
task m_label1 m_label2 m_label3 m_label4 m_label5 m_label6
0 S101-10061 Cecum Landmark ICV Comment Appendiceal orifice
1 S101-10069 Rectum RF ICV Cecum Landmark TI Comment Transverse
2 S101-10078 Appendiceal orifice ICV Cecum Landmark Comment Transverse
Rectum RF
Upvotes: 0
Views: 86
Reputation: 1139
From Mr.Brarath Narayan code, I hope you can short the code as follows without use of numpy
df = pd.concat([df['task'],df['m_label'].apply(pd.Series).add_prefix("m_label").fillna("")], axis = 1)
Output
task m_label0 m_label1 m_label2 m_label3 m_label4 m_label5 m_label6
0 S101-10061 Cecum Landmark ICV Comment Appendiceal orifice
1 S101-10069 Rectum RF ICV Cecum Landmark TI Comment Transverse
2 S101-10078 Appendiceal orifice ICV Cecum Landmark Comment Transverse Rectum RF
3 S101-10088 Cecum Landmark ICV Comment Appendiceal orifice
4 S101-10100 Transverse Appendiceal orifice ICV Cecum Landmark Comment
5 S101-10102 Rectum RF ICV Cecum Landmark Comment TI Transverse Appendiceal orifice
6 S101-10133 Rectum RF Transverse ICV Cecum Landmark Comment
7 S101YGBgZ2 Comment
Upvotes: 0
Reputation: 130
when you convert the list into dataframe string data without separation will combine as single data to overcome this you have insert the comma before convert into dataframe like this,
import pandas as pd
data={"task":["S101-10061","S101-10069","S101-10078","S101-10088","S101-10100","S101-10102","S101-10133","S101YGBgZ2"],
"m_label":[['Cecum Landmark','ICV' ,'Comment' ,'Appendiceal orifice'],['Rectum RF','ICV','Cecum Landmark','TI','Comment','Transverse']
,['Appendiceal orifice' ,'ICV' ,'Cecum Landmark', 'Comment', 'Transverse','Rectum RF'],['Cecum Landmark', 'ICV', 'Comment', 'Appendiceal orifice'],
['Transverse' ,'Appendiceal orifice', 'ICV', 'Cecum Landmark', 'Comment'],['Rectum RF' ,'ICV' ,'Cecum Landmark', 'Comment' ,'TI' ,'Transverse','Appendiceal orifice'],
['Rectum RF', 'Transverse' ,'ICV' ,'Cecum Landmark', 'Comment'],['Comment']]}
data=pd.DataFrame(data)
dataframe should like this
task m_label
0 S101-10061 [Cecum Landmark, ICV, Comment, Appendiceal ori...
1 S101-10069 [Rectum RF, ICV, Cecum Landmark, TI, Comment, ...
2 S101-10078 [Appendiceal orifice, ICV, Cecum Landmark, Com...
3 S101-10088 [Cecum Landmark, ICV, Comment, Appendiceal ori...
4 S101-10100 [Transverse, Appendiceal orifice, ICV, Cecum L...
5 S101-10102 [Rectum RF, ICV, Cecum Landmark, Comment, TI, ...
6 S101-10133 [Rectum RF, Transverse, ICV, Cecum Landmark, C...
7 S101YGBgZ2 [Comment]
output code
import numpy as np
data=pd.concat([data["task"],data["m_label"].apply(lambda x:pd.Series(x).add_prefix("m_label"))],axis=1).replace(np.nan," ")
task m_label0 m_label1 m_label2 m_label3 m_label4 m_label5 m_label6
0 S101-10061 Cecum Landmark ICV Comment Appendiceal orifice
1 S101-10069 Rectum RF ICV Cecum Landmark TI Comment Transverse
2 S101-10078 Appendiceal orifice ICV Cecum Landmark Comment Transverse Rectum RF
3 S101-10088 Cecum Landmark ICV Comment Appendiceal orifice
4 S101-10100 Transverse Appendiceal orifice ICV Cecum Landmark Comment
5 S101-10102 Rectum RF ICV Cecum Landmark Comment TI Transverse Appendiceal orifice
6 S101-10133 Rectum RF Transverse ICV Cecum Landmark Comment
7 S101YGBgZ2 Comment
Upvotes: 1
Reputation: 18416
Use str.findall
and pass the regex to capture everything enclosed by single quite ''
, then apply pd.Series
to convert them to columns
df=df.set_index('task')['m_label'].str.findall('\'(.*?)\'').apply(pd.Series)
df.columns = [f'm_label{i+1}' for i in df]
OUTPUT:
m_label1 m_label2 m_label3 m_label4 m_label5 m_label6 m_label7
task
S101-10061 Cecum Landmark ICV Comment Appendiceal orifice NaN NaN NaN
S101-10069 Rectum RF ICV Cecum Landmark TI Comment Transverse NaN
S101-10078 Appendiceal orifice ICV Cecum Landmark Comment Transverse Rectum RF NaN
S101-10088 Cecum Landmark ICV Comment Appendiceal orifice NaN NaN NaN
S101-10100 Transverse Appendiceal orifice ICV Cecum Landmark Comment NaN NaN
S101-10102 Rectum RF ICV Cecum Landmark Comment TI Transverse Appendiceal orifice
S101-10133 Rectum RF Transverse ICV Cecum Landmark Comment NaN NaN
S101YGBgZ2 Comment NaN NaN NaN NaN NaN NaN
If needed, you can reset the index later, and fillna('')
.
Upvotes: 2
Reputation: 1858
Just to add something to ThePyGuy's answer,if you want to rename columns "on the fly", you can use add_prefix()
.
df.set_index('task')['m_label'].str.findall('\'(.*?)\'').apply(pd.Series).add_prefix('m_label')
output:
Out[27]:
m_label0 m_label1 ... m_label4 m_label5
task ...
S101-10061 Cecum Landmark ICV ... NaN NaN
S101-10069 Rectum RF ICV ... Comment Transverse
Upvotes: 0