Fred
Fred

Reputation: 7

dataframe new column(list) with values from multiple columns. fastest way

I have a dataframe like this:

Name Food Sport
Tom Paella Tennis,Basketball
Nick Chicken Basketball
Tony Chicken Football
Maria Chicken Basketball

I want to create a new column(a list that contain the values of current columns) like this:

Name Food Sport listcolumn
Tom Paella Tennis,Basketball [Tom, Paella, Tennis, Basketball]
Nick Chicken Basketball [Nick, Chicken, Basketball]
Tony Chicken Football [Tony, Chicken, Football]
Maria Chicken Basketball [Maria, Chicken, Basketball]

This is the way I am currently calculating/adding a new column:


data = {'Name':['Tom', 'nick', 'krish', 'jack'], 
        'Food':['Paella', 'Chicken', 'Chicken', 'Chicken'], 
       'Sport':['Tennis, Basketball','Basketball','Football','Tennis']}

df = pd.DataFrame(data)  

def df_prepare(data):

   
    return (data.fillna('0')
                    .rename(columns={'Sport': 'Courses'})
                    .assign(listcolumn = lambda df:df['Name'].str.split(",") +
                                            df['Food'].str.split(",") +
                                            df['Courses'].str.split(",")))

dataframe_done = df_prepare(df)

Is there an alternative to create the new column in a faster way? This is just an example dataframe. The real dataframe has thousands of rows

Upvotes: 0

Views: 73

Answers (3)

PaulS
PaulS

Reputation: 25383

Another possible solution:

df['listcolumn'] = df.apply(lambda x: ', '.join(x), axis=1).str.split(',')

Output:

    Name     Food               Sport                            listcolumn
0    Tom   Paella  Tennis, Basketball  [Tom,  Paella,  Tennis,  Basketball]
1   nick  Chicken          Basketball         [nick,  Chicken,  Basketball]
2  krish  Chicken            Football          [krish,  Chicken,  Football]
3   jack  Chicken              Tennis             [jack,  Chicken,  Tennis]

Upvotes: 1

jezrael
jezrael

Reputation: 863291

For improve performance convert values to numpy array and use list comprehension with join and split:

df['listcolumn'] = [','.join(x).split(',') for x in df.to_numpy()]
print (df)
   dName     Food              Sport                         listcolumn
0    Tom   Paella  Tennis,Basketball  [Tom, Paella, Tennis, Basketball]
1   Nick  Chicken         Basketball        [Nick, Chicken, Basketball]
2   Tony  Chicken           Football          [Tony, Chicken, Football]
3  Maria  Chicken         Basketball       [Maria, Chicken, Basketball]

Upvotes: 1

Corralien
Corralien

Reputation: 120499

Use a comprehension:

df['listcolumn'] = [','.join(row).split(',') for idx, row in df.iterrows()]
print(df)

# Output
    Name     Food              Sport                         listcolumn
0    Tom   Paella  Tennis,Basketball  [Tom, Paella, Tennis, Basketball]
1   Nick  Chicken         Basketball        [Nick, Chicken, Basketball]
2   Tony  Chicken           Football          [Tony, Chicken, Football]
3  Maria  Chicken         Basketball       [Maria, Chicken, Basketball]

If you have few columns, you can simply:

df['listcolumn'] = (df['Name'] + ',' + df['Food'] + ',' + df['Sport']).str.split(',')
print(df)

# Output
    Name     Food              Sport                         listcolumn
0    Tom   Paella  Tennis,Basketball  [Tom, Paella, Tennis, Basketball]
1   Nick  Chicken         Basketball        [Nick, Chicken, Basketball]
2   Tony  Chicken           Football          [Tony, Chicken, Football]
3  Maria  Chicken         Basketball       [Maria, Chicken, Basketball]

Upvotes: 1

Related Questions