Reputation: 1094
I looked for similar questions, but the suggested solutions didn't work for me.
I have a dataframe with two columns: template (str) and content (str).
I also have a separate function, split_template_name, that takes a string and returns a tuple of 5 values, e.g.
split_template_name(some_string)
will return a tuple of 5 strings: ('str1', 'str2', 'str3', 'str4', 'str5')
I'm trying to process df['template'] with this function, so that the dataframe gets 5 more columns holding the 5 outputs.
I tried
df['template'].apply(split_template_name)
but it returns the full tuple as a single column, which is not what I need.
Some Stack Overflow answers suggest adding result_type='expand', so I tried
df['template'].apply(split_template_name, axis=1, result_type='expand')
but that gives errors:
split_template_name() got an unexpected keyword argument 'axis'
or
split_template_name() got an unexpected keyword argument 'result_type'
Basically, the goal is to start with a dataframe whose columns are ['template', 'content']
and to end with a dataframe whose columns are ['template', 'content', 'str1', 'str2', 'str3', 'str4', 'str5'].
Upvotes: 1
Views: 733
Reputation: 1
This worked for me.
import pandas as pd
df_dict = {"template": ["A B C D E", "A B C D E", "A B C D E", "A B C D E", "A B C D E"],
           "content": ["text1", "text2", "text3", "text4", "text5"]}
df = pd.DataFrame(df_dict)
print(df)
    template content
0  A B C D E   text1
1  A B C D E   text2
2  A B C D E   text3
3  A B C D E   text4
4  A B C D E   text5
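def split_template_name(name):
    # name is a single string from the template column
    return name.split()
# Build a DataFrame from the lists so that each element lands in its own column.
# Assigning the Series of lists directly fills the columns transposed.
df[['A','B','C','D','E']] = pd.DataFrame(
    df['template'].apply(split_template_name).tolist(), index=df.index)
print(df)
    template content  A  B  C  D  E
0  A B C D E   text1  A  B  C  D  E
1  A B C D E   text2  A  B  C  D  E
2  A B C D E   text3  A  B  C  D  E
3  A B C D E   text4  A  B  C  D  E
4  A B C D E   text5  A  B  C  D  E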
Upvotes: 0
Reputation: 1094
This seems to work:
df[['str1', 'str2', 'str3', 'str4', 'str5']] = pd.DataFrame(
    df['template'].apply(split_template_name).tolist(), index=df.index)
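For reference, here is a minimal self-contained sketch of this approach, next to the result_type='expand' route from the question. The body of split_template_name and the sample data are made up for illustration. The errors quoted in the question come from the fact that axis and result_type are parameters of DataFrame.apply; Series.apply forwards unknown keyword arguments to the function itself.

```python
import pandas as pd

# Hypothetical stand-in for the real split_template_name: returns a 5-tuple.
def split_template_name(name):
    parts = name.split("_")
    return tuple(parts[:5])

df = pd.DataFrame({
    "template": ["a_b_c_d_e", "v_w_x_y_z", "k_l_m_n_o"],
    "content": ["text1", "text2", "text3"],
})
new_cols = ["str1", "str2", "str3", "str4", "str5"]

# Option 1: turn the Series of tuples into a DataFrame aligned on df.index.
expanded = pd.DataFrame(
    df["template"].apply(split_template_name).tolist(),
    index=df.index,
    columns=new_cols,
)
out1 = df.join(expanded)

# Option 2: call apply on the DataFrame (row-wise), which does accept
# axis and result_type, then name the expanded columns.
expanded2 = df.apply(
    lambda row: split_template_name(row["template"]),
    axis=1,
    result_type="expand",
)
expanded2.columns = new_cols
out2 = df.join(expanded2)
```

Both options should give the same seven columns; Option 1 avoids the per-row overhead of a row-wise DataFrame.apply.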
Upvotes: 1
Reputation: 745
If it's possible to split the column with a regular expression, you could use:
df.template.str.extract()
See this example:
import pandas as pd
df = pd.DataFrame({'sentences': ['how_are_you', 'hello_world_good']})
This is what the dataframe looks like:
          sentences
0       how_are_you
1  hello_world_good
Using Series.str.extract():
df['sentences'].str.extract(r'(?P<first>\w+)_(?P<second>\w+)_(?P<third>\w+)')
Output:
   first second third
0    how    are   you
1  hello  world  good
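To end up with the new columns next to the original one, as the question asks, the extracted frame can be joined back onto the dataframe. This is a sketch under assumptions: the names pattern, parts, and result are illustrative, and [^_]+ is used instead of \w+ only because \w also matches underscores, which makes the original pattern rely on backtracking.

```python
import pandas as pd

df = pd.DataFrame({"sentences": ["how_are_you", "hello_world_good"]})

# Named groups become the column names of the extracted frame.
pattern = r"(?P<first>[^_]+)_(?P<second>[^_]+)_(?P<third>[^_]+)"
parts = df["sentences"].str.extract(pattern)

# Attach the new columns to the original frame; rows align on the index.
result = df.join(parts)
```

Rows that do not match the pattern get NaN in the extracted columns, so it is worth checking for missing values afterwards.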
Upvotes: 0