How to use multiple separators in a pandas Series and split into multiple rows

Question

I have a dataframe like this.

df = pd.DataFrame({
    "Name" : ["ABC LLC Ram corp", "IJK Inc"],
    "id" : [101, 102]
 })

    Name                id
0 ABC LLC Ram corp      101
1 IJK Inc               102

I am trying to split the Name series into multiple rows based on my separator. I am able to split but unable to retain the separators too.

separators = ["inc","corp","llc"]

My expected output is,

Name       id
ABC LLC    101
RAM corp   101
IJK Inc    102

Please help, thanks.

Shubham Sharma · Accepted Answer

You can use str.findall to find all the occurrence of matching regex pattern in column Name, then assign these matching occurrences to the column Name and explode the dataframe on Name:

pat = fr"(?i)(.*?(?:{'|'.join(separators)}))"
df.assign(Name=df['Name'].str.findall(pat)).explode('Name')

Regex details:

(?i) : Case insensitive flag
( : Start of capturing group
.*? : Matches any character except line terminators between zero and unlimited times, as few times as possible (lazy match).
(?: : start of a non capturing group
{'|'.join(separators)}: f-string expression which evaluates to inc|corp|llc
) : End of non-capturing group
) : End of capturing group

        Name   id
0    ABC LLC  101
0   Ram corp  101
1    IJK Inc  102

How to use multiple separators in a pandas Series and split into multiple rows

Answers (2)

Related Questions