Reputation: 549
I'm trying to split a column that has a specific delimiter like: '|'.
My data looks like this, I have ONLY ONE COLUMN named "ID" that contains those strings that I want to split based on delimiter " | "
ID accountsummary | Name: Report Suite Totals
ID activity | Name: Activity
I've tried with 2 different approaches:
dataframe_elements_int[['ID', 'Name']] = \
dataframe_elements_int['ID'].str.rsplit('|', expand=True, n=1)
Which gives me the following error: ValueError: Columns must be same length as key
dataframe_final[['Id','Name']] = \
dataframe_elements_int['ID'].str.extract('(\w*)\|(\w*)', expand=True)
Which gives me the following error: ValueError: pattern contains no capture groups
Upvotes: 2
Views: 3716
Reputation: 3421
You can try
df=dataframe_elements_int
#split the column
df['new_ID'], df['Name'] = df['ID'].str.split('|').str
#filtering the Name and ID
df['Name']=df['Name'].str.extract(r'((?<=Name:).*$)', expand=True)
df['new_ID']=df['new_ID'].str.extract(r'((?<=ID).*$)',expand=True)
Upvotes: 1
Reputation: 957
You could use the following Regex:
ID\s+(\w+)\s+|\s+Name:\s+(.*)$
If you want to use extract, do the following:
import pandas as pd
df = pd.DataFrame(data=["ID accountsummary | Name: Report Suite Totals",
"ID activity | Name: Activity"], columns=["ID"])
pattern = r"ID\s+(?P<IDnew>\w+)"
df["NewId"] = df["ID"].str.extract(pattern)
pattern = r"Name:\s+(?P<Name>.*)$"
df["Name"] = df["ID"].str.extract(pattern)
df.drop(["ID"], axis=1, inplace=True)
df.rename({"NewId": "ID"})
Upvotes: 1