yulGM

Reputation: 1094

Splitting custom function output in pandas into multiple columns

I looked for similar answers, but the solutions didn't work for me.

I have a dataframe with two columns: template (str) and content (str).

I also have a separate function, split_template_name, that takes a string and returns a tuple of 5 values, e.g.:

split_template_name(some_string) returns a tuple of 5 strings: ('str1', 'str2', 'str3', 'str4', 'str5')

I'm trying to process df['template'] with this function so that the dataframe gets 5 more columns holding the 5 outputs.

I tried df['template'].apply(split_template_name), but it returns the full tuple as a single column, which is not what I need.

Some Stack Overflow answers suggest adding result_type='expand', so I tried:

df['template'].apply(split_template_name, axis=1, result_type='expand')

but that gives the errors split_template_name() got an unexpected keyword argument 'axis' or split_template_name() got an unexpected keyword argument 'result_type'.

Basically, the goal is to start with a dataframe that has the columns ['template', 'content'] and end with a dataframe that has the columns ['template', 'content', 'str1', 'str2', 'str3', 'str4', 'str5'].
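The errors above happen because axis and result_type are parameters of DataFrame.apply, not Series.apply; Series.apply passes any keyword arguments it does not recognise on to the function itself, so split_template_name receives them. A minimal sketch of the DataFrame.apply variant, assuming (hypothetically) that a template is five fields joined by underscores; the split_template_name below is a stand-in, not the asker's real function:

```python
import pandas as pd


# Hypothetical stand-in for the asker's split_template_name:
# assumes a template is five fields joined by underscores.
def split_template_name(s):
    return tuple(s.split("_"))


df = pd.DataFrame({
    "template": ["a_b_c_d_e", "f_g_h_i_j"],
    "content": ["text1", "text2"],
})

# axis and result_type belong to DataFrame.apply. With axis=1 each row is
# passed to the lambda; result_type="expand" turns each returned tuple
# into separate columns, labelled 0..4.
parts = df.apply(lambda row: split_template_name(row["template"]),
                 axis=1, result_type="expand")
parts.columns = ["str1", "str2", "str3", "str4", "str5"]

# parts shares df's index, so concat lines the rows up.
df = pd.concat([df, parts], axis=1)
print(df)
```

This still calls the function once per row in Python, so it is not faster than the other approaches; it simply puts result_type='expand' on the method that accepts it.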

Upvotes: 1

Views: 733

Answers (3)

Avinash Reddy

Reputation: 1

This worked for me.


import pandas as pd

df_dict = {"template": ["A B C D E", "A B C D E", "A B C D E", "A B C D E", "A B C D E"],
           "content": ["text1", "text2", "text3", "text4", "text5"]}

df = pd.DataFrame(df_dict)

print(df)

    template    content
0   A B C D E   text1
1   A B C D E   text2
2   A B C D E   text3
3   A B C D E   text4
4   A B C D E   text5


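def split_template_name(template):
    # str.split() with no argument splits on whitespace and returns a list
    return template.split()

# .tolist() gives a list of lists, which pandas spreads across the five columns row by row
df[['A', 'B', 'C', 'D', 'E']] = df['template'].apply(split_template_name).tolist()

print(df)


    template  content  A  B  C  D  E
0  A B C D E    text1  A  B  C  D  E
1  A B C D E    text2  A  B  C  D  E
2  A B C D E    text3  A  B  C  D  E
3  A B C D E    text4  A  B  C  D  E
4  A B C D E    text5  A  B  C  D  E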
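When the template really is whitespace-separated, as in this example, pandas can do the split without a custom function: Series.str.split with expand=True returns a DataFrame with one column per piece. A sketch on the same sample data; note this swaps in str.split, so it only fits templates that split on a delimiter:

```python
import pandas as pd

df = pd.DataFrame({
    "template": ["A B C D E"] * 5,
    "content": ["text1", "text2", "text3", "text4", "text5"],
})

# str.split() with no pattern splits on runs of whitespace; expand=True
# returns a DataFrame with one column per piece (labelled 0..4).
parts = df["template"].str.split(expand=True)
parts.columns = ["A", "B", "C", "D", "E"]

# parts keeps df's index, so join lines the rows up.
df = df.join(parts)
print(df)
```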




Upvotes: 0

yulGM

Reputation: 1094

This seems to work:

df[['str1', 'str2', 'str3', 'str4', 'str5']] = pd.DataFrame(
    df['template'].apply(split_template_name).tolist(), index = df.index)
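A self-contained version of this approach, for reference. The split_template_name here is a hypothetical stand-in (assumed to split five underscore-separated fields), and the frame is given a non-default index to show why index=df.index matters: without it the new DataFrame gets a 0..n-1 index, and the assignment aligns on index labels, so rows whose labels don't match would come out as NaN.

```python
import pandas as pd


# Hypothetical stand-in: assumes five underscore-separated fields.
def split_template_name(s):
    return tuple(s.split("_"))


# A non-default index (e.g. after filtering) makes index alignment matter.
df = pd.DataFrame(
    {"template": ["a_b_c_d_e", "f_g_h_i_j"], "content": ["text1", "text2"]},
    index=[10, 20],
)

cols = ["str1", "str2", "str3", "str4", "str5"]

# apply returns a Series of tuples; tolist() turns it into a list of tuples,
# which pd.DataFrame spreads across five columns. index=df.index keeps the
# new rows aligned with the original ones when they are assigned back.
df[cols] = pd.DataFrame(
    df["template"].apply(split_template_name).tolist(), index=df.index)
print(df)
```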

Upvotes: 1

Qdr

Reputation: 745

If the column can be split with a regular expression, you could use:

df.template.str.extract()

See this example:

import pandas as pd

df = pd.DataFrame({'sentences': ['how_are_you', 'hello_world_good']})

This is how the dataframe looks:

          sentences
0       how_are_you
1  hello_world_good

Using Series.str.extract():

df['sentences'].str.extract(r'(?P<first>\w+)_(?P<second>\w+)_(?P<third>\w+)')

Output:

   first second third
0    how    are   you
1  hello  world  good
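Because the named groups become the column names, the extracted frame can be joined straight back onto the original one. A sketch applied to a template column like the question's; the underscore-separated format and the group names str1..str5 are assumptions, and [^_]+ is used instead of \w+ because \w also matches the underscore itself:

```python
import pandas as pd

df = pd.DataFrame({
    "template": ["a_b_c_d_e", "f_g_h_i_j"],
    "content": ["text1", "text2"],
})

# Assumed format: five fields separated by underscores. Each named group
# becomes a column of the DataFrame that str.extract returns.
pattern = (r"(?P<str1>[^_]+)_(?P<str2>[^_]+)_(?P<str3>[^_]+)"
           r"_(?P<str4>[^_]+)_(?P<str5>[^_]+)")

# extract returns a DataFrame indexed like df, so join aligns row by row.
df = df.join(df["template"].str.extract(pattern))
print(df)
```

Rows whose template does not match the pattern get NaN in all five new columns.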

Upvotes: 0
