Reputation: 3
I am trying to create a function that splits text in a column of a dataframe and puts each half of the split into a different new column. I want to split the text right after a specific phrase (defined as "search_text" in the function "create_var") and then trim that text to a specified number of characters (defined as left_trim_number in the function). My function has worked in some cases but does not work in others.
Here is the basic structure of my dataframe, where "lst" is my list of text items and "cols" are the two columns of the original dataframe:
import pandas as pd
cols = ['page', 'text_i']
df1 = pd.DataFrame(lst, columns=cols)
Here is my function:
def create_var(varname, search_text, left_trim_number):
    df1[['a', varname]] = df1['text_i'].str.split(search_text, expand=True)
    df1[varname] = df1[varname].str[:left_trim_number]

create_var('var1', 'I am looking for the text that follows this ', 3)
In the cases where it doesn't work, I get this error (which I assume comes from pandas):
"ValueError: Columns must be same length as key"
Is there a better way of doing this?
Upvotes: 0
Views: 70
Reputation: 1587
You could try this:
import pandas as pd
df = pd.DataFrame({"text":["hello world", "a", "again hello world"]})
search_text = "hello "
parts = df['text'].str.partition(search_text)
df['a'] = parts[0] + parts[1]
df['var1'] = parts[2]
df['var1'] = df['var1'].str[:3]
print(df)
Output:
                text             a  var1
0        hello world        hello    wor
1                  a             a
2  again hello world  again hello    wor
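If you want to keep the `create_var` shape from the question, here is a minimal sketch of the same idea. The error most likely comes from `str.split(..., expand=True)` returning as many columns as the largest number of pieces found in any row: one column if the phrase never occurs, three or more if it occurs twice in some row. Assigning that to exactly two columns then fails. `str.partition` splits at the first occurrence only and always returns three columns, so the assignment cannot go out of step. The sample rows and the extra `df` parameter are my own assumptions, not taken from your data.

```python
import pandas as pd

def create_var(df, varname, search_text, left_trim_number):
    # partition always yields three columns: before, separator, after
    parts = df['text_i'].str.partition(search_text)
    # keep only the first left_trim_number characters after the phrase;
    # rows without the phrase get an empty string
    df[varname] = parts[2].str[:left_trim_number]
    return df

# Hypothetical sample data: one match, no match, two matches
df1 = pd.DataFrame({
    'page': [1, 2, 3],
    'text_i': ['x hello world', 'no match here', 'hello one hello two'],
})

# split gives 3 columns here, which is why a two-column assignment breaks
n_split_cols = df1['text_i'].str.split('hello ', expand=True).shape[1]

create_var(df1, 'var1', 'hello ', 3)
print(n_split_cols)
print(df1)
```

Passing the dataframe in as an argument rather than relying on a global `df1` is just a design preference; it makes the function easier to reuse on other frames.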
Upvotes: 1