Reputation: 2764
I have the pandas DataFrame below and I am trying to split col1 into multiple columns based on the split_format string.
Inputs:
import pandas as pd

split_format = 'id-id1_id2|id3'
data = {'col1': ['a-a1_a2|a3', 'b-b1_b2|b3', 'c-c1_c2|c3', 'd-d1_d2|d3'],
        'col2': [20, 21, 19, 18]}
df = pd.DataFrame(data)
# Display without the index (use .hide_index() on pandas < 1.4)
df.style.hide(axis="index")
col1        col2
a-a1_a2|a3  20
b-b1_b2|b3  21
c-c1_c2|c3  19
d-d1_d2|d3  18
Expected Output:
id  id1  id2  id3  col2
a   a1   a2   a3   20
b   b1   b2   b3   21
c   c1   c2   c3   19
d   d1   d2   d3   18
Note: the special characters and column names in split_format can change.
Upvotes: 0
Views: 476
Reputation: 4243
I extract the delimiter symbols from split_format, then recursively split each string on them: at every step I split on the next symbol, flatten the resulting list of lists, and recurse until all the symbols have been consumed.
import pandas as pd

split_format = 'id-id1_id2|id3'
data = {'col1': ['a-a1_a2|a3', 'b-b1_b2|b3', 'c-c1_c2|c3', 'd-d1_d2|d3'],
        'col2': [20, 21, 19, 18]}
df = pd.DataFrame(data)

# Collect the delimiter characters, in order, from the format string.
symbols = []
for x in split_format:
    if not x.isalnum():
        symbols.append(x)

def parseTree(stringlist, symbols, result):
    # Base case: no delimiters left, so every string is a final token.
    if len(symbols) == 0:
        result.extend(stringlist)
        return
    token = symbols.pop(0)
    elements = []
    for item in stringlist:
        elements.append(item.split(token))
    # Flatten the list of lists before splitting on the next delimiter.
    flat_list = [item for sublist in elements for item in sublist]
    parseTree(flat_list, symbols, result)

columns = ["id", "id1", "id2", "id3"]
rows = []
for key, item in df.iterrows():
    result = []
    # Pass a copy because parseTree consumes the symbol list.
    parseTree([item['col1']], symbols.copy(), result)
    rows.append(result)

# DataFrame.append was removed in pandas 2.0, so build the frame in one step.
df2 = pd.DataFrame(rows, columns=columns)
df2['col2'] = df['col2']
print(df2)
Output:
  id id1 id2 id3  col2
0  a  a1  a2  a3    20
1  b  b1  b2  b3    21
2  c  c1  c2  c3    19
3  d  d1  d2  d3    18
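As a variation, the recursion can be unrolled into a loop, and the column names can be taken from split_format itself instead of being hard-coded. This is only a minimal sketch: `split_by_symbols` is a hypothetical helper name, and it assumes every delimiter is a single character that does not appear inside the data values.

```python
def split_by_symbols(value, symbols):
    """Split `value` on each delimiter in `symbols`, one after another."""
    parts = [value]
    for sym in symbols:
        # Split every current piece on this delimiter and flatten the result.
        parts = [piece for part in parts for piece in part.split(sym)]
    return parts

split_format = 'id-id1_id2|id3'
# Delimiters are the non-alphanumeric characters of the format string.
symbols = [ch for ch in split_format if not ch.isalnum()]

# Applying the same split to the format string yields the column names.
columns = split_by_symbols(split_format, symbols)
rows = [split_by_symbols(v, symbols) for v in ['a-a1_a2|a3', 'b-b1_b2|b3']]

print(columns)
print(rows)
```

The resulting `rows` and `columns` can then be passed to `pd.DataFrame(rows, columns=columns)`, after which col2 is attached as in the code above.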
Upvotes: 1
Reputation: 2764
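I think I figured it out: split both split_format and col1 on runs of non-alphanumeric characters.
import re

# Column names are the alphanumeric pieces of the format string.
col_name = re.split('[^0-9a-zA-Z]+', split_format)
df[col_name] = df['col1'].str.split('[^0-9a-zA-Z]+', expand=True)
del df['col1']
df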
   col2 id id1 id2 id3
0    20  a  a1  a2  a3
1    21  b  b1  b2  b3
2    19  c  c1  c2  c3
3    18  d  d1  d2  d3
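One caveat: the pattern above treats any run of non-alphanumeric characters as a delimiter, so a value containing, say, a dot or a space would be split further. If the delimiters must match split_format exactly, a stricter sketch is to turn the format into a regex with named groups and use `Series.str.extract`. `format_to_pattern` is a hypothetical helper, and it assumes the column names in split_format are valid Python identifiers (required for named groups).

```python
import re
import pandas as pd

def format_to_pattern(split_format):
    """Build an anchored regex with one named group per column name."""
    # The capture group in re.split keeps the delimiters in the result,
    # so names sit at even positions and delimiters at odd positions.
    tokens = re.split(r'([^0-9a-zA-Z]+)', split_format)
    parts = []
    for i, tok in enumerate(tokens):
        if i % 2 == 0:
            parts.append(f'(?P<{tok}>.*?)')  # column name becomes a named group
        else:
            parts.append(re.escape(tok))     # delimiter is matched literally
    return '^' + ''.join(parts) + '$'

split_format = 'id-id1_id2|id3'
df = pd.DataFrame({'col1': ['a-a1_a2|a3', 'b-b1_b2|b3', 'c-c1_c2|c3', 'd-d1_d2|d3'],
                   'col2': [20, 21, 19, 18]})

# str.extract returns one column per named group.
extracted = df['col1'].str.extract(format_to_pattern(split_format))
# Put the new columns first so col2 ends up last, as in the expected output.
out = pd.concat([extracted, df.drop(columns='col1')], axis=1)
print(out)
```

Rows that do not match the format come back as NaN in the extracted columns rather than being split incorrectly.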
Upvotes: 2