bc_plus

Reputation: 3

Python - Pandas error when splitting text using a function

I am trying to write a function that splits the text in one column of a dataframe and puts each part of the split into its own new column. I want to split the text immediately after a specific phrase (the "search_text" argument of the function "create_var") and then keep only a given number of characters of what follows (the "left_trim_number" argument). The function works in some cases but not in others.

Here is the basic structure of my dataframe, where "lst" is my list of text items and "cols" holds the names of the dataframe's two columns:

import pandas as pd
cols = ['page', 'text_i']
df1 = pd.DataFrame(lst, columns=cols)

Here is my function:

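def create_var(varname, search_text, left_trim_number):
    # Split each row of 'text_i' at search_text; the pieces go into 'a' and varname
    df1[['a', varname]] = df1['text_i'].str.split(search_text, expand=True)
    # Keep only the first left_trim_number characters after the phrase
    df1[varname] = df1[varname].str[:left_trim_number]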

create_var('var1','I am looking for the text that follows this ',3)

In the cases where it doesn't work, I get this error (which I assume comes from pandas):

"ValueError: Columns must be same length as key"
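The error most likely comes from the number of columns that `str.split(..., expand=True)` returns: it creates one column per piece of the longest split, so you get a single column when no row contains the phrase, and three or more when some row contains it twice. Assigning that result to the two keys `['a', varname]` then raises the ValueError. The sketch below reproduces both cases on made-up rows; `split_into` is a hypothetical stand-in for `create_var` that takes the dataframe as an argument instead of using the global `df1`.

```python
import pandas as pd

def split_into(df, varname, search_text):
    # Same pattern as create_var in the question: split on the phrase and
    # assign the pieces to exactly two columns.
    df[["a", varname]] = df["text_i"].str.split(search_text, expand=True)

# Hypothetical sample rows, made up for illustration.
ok = pd.DataFrame({"text_i": ["x KEY abc", "no match here"]})
missing = pd.DataFrame({"text_i": ["no match", "still none"]})
twice = pd.DataFrame({"text_i": ["x KEY abc KEY def", "y KEY ghi"]})

results = {}
for name, frame in [("ok", ok), ("missing", missing), ("twice", twice)]:
    # How many columns the split produces for this frame
    n_cols = frame["text_i"].str.split("KEY ", expand=True).shape[1]
    try:
        split_into(frame, "var1", "KEY ")
        results[name] = (n_cols, "assigned")
    except ValueError as exc:
        results[name] = (n_cols, str(exc))

for name, (n_cols, outcome) in results.items():
    print(f"{name}: split produced {n_cols} column(s) -> {outcome}")
```

A row without the phrase is harmless as long as at least one other row contains it exactly once, which would explain why the function works on some inputs and not others.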

Is there a better way of doing this?

Upvotes: 0

Views: 70

Answers (1)

Dan

Reputation: 1587

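You could try `str.partition` instead. It splits each string only at the first occurrence of the separator and always returns exactly three columns (the text before the separator, the separator itself, and the text after it), using empty strings when the separator is not found. Unlike `str.split(..., expand=True)`, whose column count depends on how many times the phrase appears, the shape of the result never changes: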

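import pandas as pd

df = pd.DataFrame({"text": ["hello world", "a", "again hello world"]})
search_text = "hello "

# Columns 0, 1, 2: text before the first match, the match, text after it
parts = df['text'].str.partition(search_text)
df['a'] = parts[0] + parts[1]    # everything up to and including the phrase
df['var1'] = parts[2]            # everything after the phrase
df['var1'] = df['var1'].str[:3]  # keep only the first 3 characters

print(df)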

Output:

                text             a var1
0        hello world        hello   wor
1                  a             a     
2  again hello world  again hello   wor
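Applied to the question's `create_var`, the same idea might look like the sketch below. It assumes the dataframe is passed in as a parameter (the original relies on the global `df1`) and keeps the question's column names `text_i` and `a`; the rows in `lst` are hypothetical stand-ins for the real data.

```python
import pandas as pd

def create_var(df, varname, search_text, left_trim_number):
    # partition always yields three columns: text before the first match,
    # the match itself, and everything after it ("" when there is no match).
    parts = df["text_i"].str.partition(search_text)
    df["a"] = parts[0] + parts[1]
    # Keep only the first left_trim_number characters after the phrase.
    df[varname] = parts[2].str[:left_trim_number]
    return df

# Hypothetical rows standing in for the question's lst.
lst = [
    (1, "I am looking for the text that follows this ABCDEF"),
    (2, "no phrase on this page"),
    (3, "I am looking for the text that follows this XYZ and "
        "I am looking for the text that follows this QQQ"),
]
df1 = pd.DataFrame(lst, columns=["page", "text_i"])
create_var(df1, "var1", "I am looking for the text that follows this ", 3)
print(df1[["page", "var1"]])
```

Rows without the phrase get an empty string in the new column, and rows where the phrase repeats are split only at the first occurrence, so neither case can change the number of columns being assigned.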

Upvotes: 1
