rebel095

Reputation: 81

Splitting strings from pandas column into multiple strings

I am sorry if this question has already been answered, but I could not find an existing answer. I want to split long strings into multiple strings and convert them. I have a dataframe df:

       no         strings
1.  A_12_234   gef|re1234|gef|re0943
2.  O_257363   tef|fe4545|tef|fe3333|tef|9995

I want to split these into individual strings and put the result in a new column.

Output I am getting:

       no         strings                          new_col
1.  A_12_234   gef|re1234|gef|re0943                <thekeys db="gef" value="re1234"/>\n<thekeys db="gef" value="re0943"/>

2.  O_257363   tef|fe4545|tef|fe3333|tef|9995       <thekeys db="tef" value="fe4545"/>\n<thekeys db="tef" value="fe3333"/>

Desired output:

         no         strings                          new_col
1.  A_12_234   gef|re1234|gef|re0943                <thekeys db="gef" value="re1234"/>\n<thekeys db="gef" value="re0943"/>

2.  O_257363   tef|fe4545|tef|fe3333|tef|9995       <thekeys db="tef" value="fe4545"/>\n<thekeys db="tef" value="fe3333"/>\n<thekeys db="tef" value="9995"/>

I don't know where I am making a mistake; some pairs are being skipped.

Here's the code:

def createxm(x):
    try:
        parsedlist = x['strings'].split('|')
        print(parsedlist)
        cnt = len(parsedlist)/2
        print(cnt)
        xm_list = []
        for i in range(0, int(cnt), 2):
            xm_list.append('<thekeys db="{}" value="{}"/>'.format(parsedlist[i], parsedlist[i+1]))
            xm_string = '\n'.join(xm_list)
        return xm_string
    except:
        return None

Thank you

Upvotes: 1

Views: 59

Answers (2)

sharathnatraj

Reputation: 1614

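You were almost there. The problem is the line cnt = len(parsedlist)/2: the loop already steps through the list two items at a time, so halving the count makes range() stop before it reaches the later pairs. Use the full length instead.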
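To see the effect, here is a small sketch that compares the indices each range visits for the six-token string from the question's second row. The variable names halved and full are only for illustration.

```python
# Tokens from the second row of the question's dataframe
parsedlist = "tef|fe4545|tef|fe3333|tef|9995".split("|")

# Original logic: the count is halved, then the loop also steps by 2
halved = list(range(0, int(len(parsedlist) / 2), 2))

# Corrected logic: step by 2 over the full length
full = list(range(0, len(parsedlist), 2))

print(halved)  # [0, 2]     -> the pair starting at index 4 is never visited
print(full)    # [0, 2, 4]  -> every pair is visited
```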

Corrected code:

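def createxm(x):
    try:
        # Tokens alternate: db, value, db, value, ...
        parsedlist = x['strings'].split('|')
        print(parsedlist)
        cnt = len(parsedlist)  # full length, not half
        print(cnt)
        xm_list = []
        # Step by 2 so that i always points at a db token
        for i in range(0, cnt, 2):
            xm_list.append('<thekeys db="{}" value="{}"/>'.format(parsedlist[i], parsedlist[i+1]))
        # Join once, after all pairs have been collected
        return '\n'.join(xm_list)
    except Exception:
        # e.g. an odd number of tokens or a non-string cell
        return None
df['new_col'] = df.apply(createxm, axis=1)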

Prints:

df.new_col.iloc[1]
'<thekeys db="tef" value="fe4545"/>\n<thekeys db="tef" value="fe3333"/>\n<thekeys db="tef" value="9995"/>'
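As a variation on the loop above, here is a minimal sketch that pairs the tokens with slicing and zip instead of index arithmetic. The function name pairs_to_xml is hypothetical, and returning None for an odd number of tokens is an assumption chosen to match what the try/except version does in that case.

```python
def pairs_to_xml(raw):
    """Turn 'db|value|db|value|...' into newline-separated <thekeys/> tags."""
    tokens = raw.split("|")
    if len(tokens) % 2:
        # Odd token count: a db without a value (assumed to mean "no result")
        return None
    # Even positions are db names, odd positions are their values
    pairs = zip(tokens[::2], tokens[1::2])
    return "\n".join(
        '<thekeys db="{}" value="{}"/>'.format(db, value) for db, value in pairs
    )

print(pairs_to_xml("tef|fe4545|tef|fe3333|tef|9995"))
```

Because the helper takes a plain string, it could be applied to the column directly with df['strings'].apply(pairs_to_xml) rather than row by row with axis=1.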

Upvotes: 1

ThePyGuy

Reputation: 18406

Just split the values on | and then use the first four values (the first two pairs) to build the required string with str.format():

fString = '<thekeys db="{}" value="{}"/>\n<thekeys db="{}" value="{}"/>'
df['strings'].str.split('|').apply(lambda x: fString.format(x[0], x[1], x[2], x[3]))

OUTPUT:

1.0    <thekeys db="gef" value="re1234"/>\n<thekeys d...
2.0    <thekeys db="tef" value="fe4545"/>\n<thekeys d...
Name: strings, dtype: object
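A fixed four-slot format string drops everything after the second pair, so the question's second row would still lose its last pair. One way to keep the str.format() idea while handling any number of pairs is to repeat a single-pair template once per pair. This is only a sketch; TEMPLATE and format_all_pairs are hypothetical names.

```python
TEMPLATE = '<thekeys db="{}" value="{}"/>'

def format_all_pairs(tokens):
    """Format a flat [db, value, db, value, ...] list, one tag per pair."""
    n_pairs = len(tokens) // 2
    # Build one template per pair, then fill all slots in a single format call.
    # str.format ignores surplus positional arguments, so a trailing unpaired
    # token is silently dropped.
    return "\n".join([TEMPLATE] * n_pairs).format(*tokens)

print(format_all_pairs("tef|fe4545|tef|fe3333|tef|9995".split("|")))
```

On the dataframe this would slot into the same chain as before: df['strings'].str.split('|').apply(format_all_pairs).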

Upvotes: 0
