tarik
tarik

Reputation: 49

How to split a dataframe string column into multiple columns?

I have a pandas dataframe. This dataframe consists of a single column. I want to parse this column according to the '&' sign and add the data to the right of the "=" sign as a new column. Examples are below.

The dataframe I have;

                               tags
0                 letter1=A&letter2=B&letter3=C
1                 letter1=D&letter2=E&letter3=F
2                 letter1=G&letter2=H&letter3=I
3                 letter1=J&letter2=K&letter3=L
4                 letter1=M&letter2=N&letter3=O
5                 letter1=P&letter2=R&letter3=S
.                  .
.                  .

dataframe that I want to convert;

     letter1    letter2   letter3
0      A           B         C
1      D           E         F
2      G           H         I
3      J           K         L
4      M           N         O
.
.

I tried to do something with this code snippet.

columnname= df["tags"][0].split("&")[i].split("=")[0]
value =df["tags"][0].split("&")[i].split("=")[1]

But I'm not sure how I can do it for the whole dataframe. I am looking for a faster and stable way.

Thanks in advance,

Upvotes: 2

Views: 315

Answers (4)

sammywemmy
sammywemmy

Reputation: 28644

Split into separate columns, via str.split, using & :

step1 = df.tags.str.split("&", expand=True)

Get the new columns from the first row of step1:

new_columns = step1.loc[0, :].str[:-2].array

Get rid of the letter1= prefix in each column, set the new_columns as the header:

step1.set_axis(new_columns, axis='columns').transform(lambda col: col.str[-1])

  letter1 letter2 letter3
0       A       B       C
1       D       E       F
2       G       H       I
3       J       K       L
4       M       N       O
5       P       R       S

Upvotes: 1

rpb
rpb

Reputation: 3299

Using regex

import pandas as pd
import re
tags = [
    "letter1=A&letter2=B&letter3=C",
    "letter1=D&letter2=E&letter3=F",
    "letter1=G&letter2=H&letter3=I",
    "letter1=J&letter2=K&letter3=L",
    "letter1=M&letter2=N&letter3=O",
    "letter1=P&letter2=R&letter3=S"
]

df = pd.DataFrame({"tags": tags})
pattern=re.compile("\=(\w+)") # Look for pattern
df['letter1'], df['letter3'],df["letter2"] = zip(*df["tags"].apply(lambda x: pattern.findall(x)))

Output

                            tags letter1 letter2 letter3
0  letter1=A&letter2=B&letter3=C       A       B       C
1  letter1=D&letter2=E&letter3=F       D       E       F
2  letter1=G&letter2=H&letter3=I       G       H       I
3  letter1=J&letter2=K&letter3=L       J       K       L
4  letter1=M&letter2=N&letter3=O       M       N       O
5  letter1=P&letter2=R&letter3=S       P       R       S

Upvotes: 0

Joseph Konka
Joseph Konka

Reputation: 111

do this..

import pandas as pd

tags = [
    "letter1=A&letter2=B&letter3=C",
    "letter1=D&letter2=E&letter3=F",
    "letter1=G&letter2=H&letter3=I",
    "letter1=J&letter2=K&letter3=L",
    "letter1=M&letter2=N&letter3=O",
    "letter1=P&letter2=R&letter3=S"
]
df = pd.DataFrame({"tags": tags})

df["letter1"] = df["tags"].apply(lambda x: x.split("&")[0].split("=")[-1])
df["letter2"] = df["tags"].apply(lambda x: x.split("&")[1].split("=")[-1])
df["letter3"] = df["tags"].apply(lambda x: x.split("&")[2].split("=")[-1])
df = df[["letter1", "letter2", "letter3"]]
df

enter image description here

Upvotes: 3

Tanmay Shrivastava
Tanmay Shrivastava

Reputation: 579

d=list(df["tags"])
r={}
for i in d:                                            
    for ele in i.split("&"):                           
        if ele.split("=")[0] in r.keys():              
            r[ele.split("=")[0]].append(ele.split("=")[1])
        else:
            r[ele.split("=")[0]]=[]
            r[ele.split("=")[0]].append(ele.split("=")[1])

df = pd.DataFrame({i:pd.Series(r[i]) for i in r})
print (df)

Upvotes: 0

Related Questions