Reputation: 7
I have a dataframe like this:
Name | Food | Sport |
---|---|---|
Tom | Paella | Tennis,Basketball |
Nick | Chicken | Basketball |
Tony | Chicken | Football |
Maria | Chicken | Basketball |
I want to create a new column(a list that contain the values of current columns) like this:
Name | Food | Sport | listcolumn |
---|---|---|---|
Tom | Paella | Tennis,Basketball | [Tom, Paella, Tennis, Basketball] |
Nick | Chicken | Basketball | [Nick, Chicken, Basketball] |
Tony | Chicken | Football | [Tony, Chicken, Football] |
Maria | Chicken | Basketball | [Maria, Chicken, Basketball] |
This is the way I am currently calculating/adding a new column:
data = {'Name':['Tom', 'nick', 'krish', 'jack'],
'Food':['Paella', 'Chicken', 'Chicken', 'Chicken'],
'Sport':['Tennis, Basketball','Basketball','Football','Tennis']}
df = pd.DataFrame(data)
def df_prepare(data):
return (data.fillna('0')
.rename(columns={'Sport': 'Courses'})
.assign(listcolumn = lambda df:df['Name'].str.split(",") +
df['Food'].str.split(",") +
df['Courses'].str.split(",")))
dataframe_done = df_prepare(df)
Is there an alternative to create the new column in a faster way? This is just an example dataframe. The real dataframe has thousands of rows
Upvotes: 0
Views: 73
Reputation: 25383
Another possible solution:
df['listcolumn'] = df.apply(lambda x: ', '.join(x), axis=1).str.split(',')
Output:
Name Food Sport listcolumn
0 Tom Paella Tennis, Basketball [Tom, Paella, Tennis, Basketball]
1 nick Chicken Basketball [nick, Chicken, Basketball]
2 krish Chicken Football [krish, Chicken, Football]
3 jack Chicken Tennis [jack, Chicken, Tennis]
Upvotes: 1
Reputation: 863291
For improve performance convert values to numpy array and use list comprehension with join
and split
:
df['listcolumn'] = [','.join(x).split(',') for x in df.to_numpy()]
print (df)
dName Food Sport listcolumn
0 Tom Paella Tennis,Basketball [Tom, Paella, Tennis, Basketball]
1 Nick Chicken Basketball [Nick, Chicken, Basketball]
2 Tony Chicken Football [Tony, Chicken, Football]
3 Maria Chicken Basketball [Maria, Chicken, Basketball]
Upvotes: 1
Reputation: 120499
Use a comprehension:
df['listcolumn'] = [','.join(row).split(',') for idx, row in df.iterrows()]
print(df)
# Output
Name Food Sport listcolumn
0 Tom Paella Tennis,Basketball [Tom, Paella, Tennis, Basketball]
1 Nick Chicken Basketball [Nick, Chicken, Basketball]
2 Tony Chicken Football [Tony, Chicken, Football]
3 Maria Chicken Basketball [Maria, Chicken, Basketball]
If you have few columns, you can simply:
df['listcolumn'] = (df['Name'] + ',' + df['Food'] + ',' + df['Sport']).str.split(',')
print(df)
# Output
Name Food Sport listcolumn
0 Tom Paella Tennis,Basketball [Tom, Paella, Tennis, Basketball]
1 Nick Chicken Basketball [Nick, Chicken, Basketball]
2 Tony Chicken Football [Tony, Chicken, Football]
3 Maria Chicken Basketball [Maria, Chicken, Basketball]
Upvotes: 1