Reputation: 81
I am sorry if this question has already been answered, but I could not find it. I want to split long strings and convert them into multiple strings. I have a DataFrame df:
no strings
1. A_12_234 gef|re1234|gef|re0943
2. O_257363 tef|fe4545|tef|fe3333|tef|9995
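To reproduce the problem, the sample frame can be rebuilt as below. This is a minimal sketch: the index labels 1 and 2 are assumed from the "1." / "2." row labels in the table above.

```python
import pandas as pd

# Sample data from the question; index labels 1 and 2 are assumed
# from the "1." / "2." row labels shown in the table.
df = pd.DataFrame(
    {
        "no": ["A_12_234", "O_257363"],
        "strings": ["gef|re1234|gef|re0943", "tef|fe4545|tef|fe3333|tef|9995"],
    },
    index=[1, 2],
)
print(df)
```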
I want to build an individual string for each db|value pair and store the result in a new column.
Output I am getting:
no strings new_col
1. A_12_234 gef|re1234|gef|re0943 <thekeys db="gef" value="re1234"/>\n<thekeys db="gef" value="re0943"/>
2. O_257363 tef|fe4545|tef|fe3333|tef|9995 <thekeys db="tef" value="fe4545"/>\n<thekeys db="tef" value="fe3333"/>
Desired output:
no strings new_col
1. A_12_234 gef|re1234|gef|re0943 <thekeys db="gef" value="re1234"/>\n<thekeys db="gef" value="re0943"/>
2. O_257363 tef|fe4545|tef|fe3333|tef|9995 <thekeys db="tef" value="fe4545"/>\n<thekeys db="tef" value="fe3333"/>\n<thekeys db="tef" value="9995"/>
I don't know where I am making a mistake, since it is skipping some pairs.
Here's my code:
def createxm(x):
    try:
        parsedlist = x['strings'].split('|')
        print(parsedlist)
        cnt = len(parsedlist)/2
        print(cnt)
        xm_list = []
        for i in range(0, int(cnt), 2):
            xm_list.append('<thekeys db="{}" value="{}"/>'.format(parsedlist[i], parsedlist[i+1]))
        xm_string = '\n'.join(xml_list)
        return xm_string
    except:
        return None
Thank you
Upvotes: 1
Views: 59
Reputation: 1614
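You were almost there. The problem is the line cnt = len(parsedlist)/2: range(0, cnt, 2) already walks the list two fields at a time, so halving the length as well makes the loop stop halfway and skip pairs. Use the full length instead. (The join line also refers to xml_list instead of xm_list, which raises a NameError that the bare except turns into None.)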
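A quick check of the loop indices for the second row (six fields, i.e. three pairs) shows the effect of halving the length before stepping by 2:

```python
# Which start indices does the loop visit for a row with 6 fields (3 pairs)?
parsedlist = "tef|fe4545|tef|fe3333|tef|9995".split("|")

halved = list(range(0, int(len(parsedlist) / 2), 2))  # original: cnt = len/2
full = list(range(0, len(parsedlist), 2))              # fixed: cnt = len

print(halved)  # [0, 2]    -> only the first two pairs
print(full)    # [0, 2, 4] -> all three pairs
```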
Corrected code:
def createxm(x):
    try:
        parsedlist = x['strings'].split('|')
        # Use the full length: range() below already steps by 2, one pair at a time
        cnt = len(parsedlist)
        xm_list = []
        for i in range(0, cnt, 2):
            # parsedlist[i] is the db name, parsedlist[i+1] is its value
            xm_list.append('<thekeys db="{}" value="{}"/>'.format(parsedlist[i], parsedlist[i+1]))
        xm_string = '\n'.join(xm_list)
        return xm_string
    except Exception:
        # e.g. a missing (NaN) cell or an odd number of '|'-separated fields
        return None

df['new_col'] = df.apply(createxm, axis=1)
Checking the second row:
df.new_col.iloc[1]
'<thekeys db="tef" value="fe4545"/>\n<thekeys db="tef" value="fe3333"/>\n<thekeys db="tef" value="9995"/>'
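As a variation (not the answerer's code), the pairs can be formed by slicing the split list into even and odd positions and zipping them, which avoids index arithmetic entirely. A sketch under the same sample data; the helper name to_thekeys is hypothetical, and returning None for an odd field count simply mirrors createxm.

```python
import pandas as pd


def to_thekeys(s):
    """Hypothetical helper: 'db|value|db|value|...' -> newline-joined <thekeys .../> tags."""
    parts = s.split("|")
    if len(parts) % 2:
        # Odd number of fields: one db has no value; mirror createxm and give up.
        return None
    # parts[0::2] are the db names, parts[1::2] the matching values
    return "\n".join(
        '<thekeys db="{}" value="{}"/>'.format(db, value)
        for db, value in zip(parts[0::2], parts[1::2])
    )


df = pd.DataFrame(
    {
        "no": ["A_12_234", "O_257363"],
        "strings": ["gef|re1234|gef|re0943", "tef|fe4545|tef|fe3333|tef|9995"],
    },
    index=[1, 2],
)
df["new_col"] = df["strings"].apply(to_thekeys)
print(df["new_col"].iloc[1])
```

Because only the strings column is needed, Series.apply on that column is enough here; df.apply(..., axis=1) would pass whole rows.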
Upvotes: 1
Reputation: 18406
Just split the values on | and then use the first four values to build the required string with str.format():
fString = '<thekeys db="{}" value="{}"/>\n<thekeys db="{}" value="{}"/>'
df['strings'].str.split('|').apply(lambda x: fString.format(x[0], x[1], x[2], x[3]))
OUTPUT:
1.0 <thekeys db="gef" value="re1234"/>\n<thekeys d...
2.0 <thekeys db="tef" value="fe4545"/>\n<thekeys d...
Name: strings, dtype: object
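The format string above has placeholders for exactly two pairs, so the third pair of the second row (tef|9995) is dropped. One way to handle any number of pairs without a Python loop is a regex with Series.str.extractall, which returns one row per db|value match, followed by a groupby to join the tags back per original row. This is a swapped-in technique, sketched under the assumption that db names and values never contain |; a trailing unpaired field is silently ignored.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "no": ["A_12_234", "O_257363"],
        "strings": ["gef|re1234|gef|re0943", "tef|fe4545|tef|fe3333|tef|9995"],
    },
    index=[1, 2],
)

# Each match captures one "db|value" pair; extractall returns one row per match
# with a MultiIndex of (original index, match number).
pairs = df["strings"].str.extractall(r"(?P<db>[^|]+)\|(?P<value>[^|]+)")

tags = '<thekeys db="' + pairs["db"] + '" value="' + pairs["value"] + '"/>'

# Join the tags back together per original row (level 0 of the MultiIndex);
# assignment aligns on the original index labels.
df["new_col"] = tags.groupby(level=0).agg("\n".join)
print(df["new_col"].iloc[1])
```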
Upvotes: 0