k..

Reputation: 401

Concatenate 2 Pandas list-filled columns into 1 big list?

I have a Pandas DataFrame that looks like this:

      NAME       total      total_temp
ID                                    
1      CVS      [abc1]  [cba, xyzzy01]
2   Costco  [bcd2, 22]  [dcb, xyzzy02]
3    Apple      [cde3]  [edc, xyzzy03]

I want to create a new column, total_temp_2, that concatenates the two lists in each row, so that the data looks like this:

      NAME       total      total_temp              total_temp_2
ID                                                              
1      CVS      [abc1]  [cba, xyzzy01]      [abc1, cba, xyzzy01]
2   Costco  [bcd2, 22]  [dcb, xyzzy02]  [bcd2, 22, dcb, xyzzy02]
3    Apple      [cde3]  [edc, xyzzy03]      [cde3, edc, xyzzy03]

I could probably guess my way through some really inefficient ways to concatenate the lists, but I suspect I'm missing something about Pandas.

How can I achieve this using pandas?

Upvotes: 2

Views: 282

Answers (2)

cs95

Reputation: 402473

When dealing with mixed types (here, object columns holding Python lists), I usually recommend a list comprehension, which has minimal memory and performance overhead.

df['total_temp_2'] = [x + y for x, y in zip(df['total'], df['total_temp'])]
df

      NAME       total      total_temp              total_temp_2
ID                                                              
1      CVS      [abc1]  [cba, xyzzy01]      [abc1, cba, xyzzy01]
2   Costco  [bcd2, 22]  [dcb, xyzzy02]  [bcd2, 22, dcb, xyzzy02]
3    Apple      [cde3]  [edc, xyzzy03]      [cde3, edc, xyzzy03]
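
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The way df is built is an assumption reconstructed from the printed output; in particular, 22 is assumed to be a string, which the display does not settle.

```python
import pandas as pd

# Assumed reconstruction of the question's frame (not taken from the asker's code).
df = pd.DataFrame(
    {
        "NAME": ["CVS", "Costco", "Apple"],
        "total": [["abc1"], ["bcd2", "22"], ["cde3"]],
        "total_temp": [["cba", "xyzzy01"], ["dcb", "xyzzy02"], ["edc", "xyzzy03"]],
    },
    index=pd.Index([1, 2, 3], name="ID"),
)

# x + y builds a new list for each row, so the source columns are left untouched.
df["total_temp_2"] = [x + y for x, y in zip(df["total"], df["total_temp"])]

print(df["total_temp_2"].tolist())
```

Because list + list creates a new object, later in-place changes to total_temp_2 should not leak back into total or total_temp.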

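If the columns actually hold string representations of lists (for example, after a round trip through CSV), parse them first with ast.literal_eval:

import ast

# Parse every object-dtype column into Python objects.
# Each selected cell must be a valid Python literal, so narrow c if a column such as NAME holds plain text.
# Note: DataFrame.applymap is deprecated since pandas 2.1 in favour of DataFrame.map.
c = df.select_dtypes(include=[object]).columns
df[c] = df[c].applymap(ast.literal_eval)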
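
As a rough sketch of that parsing step on its own: the frame raw and the explicit list_cols selection below are hypothetical, chosen so that the plain-text NAME column is never passed to ast.literal_eval. It uses Series.map column by column, which sidesteps the applymap/map naming difference between pandas versions.

```python
import ast

import pandas as pd

# Hypothetical input: the list columns arrived as strings.
raw = pd.DataFrame(
    {
        "NAME": ["CVS", "Costco"],
        "total": ["['abc1']", "['bcd2', '22']"],
        "total_temp": ["['cba', 'xyzzy01']", "['dcb', 'xyzzy02']"],
    }
)

# Only parse the columns known to contain list literals.
list_cols = ["total", "total_temp"]
for col in list_cols:
    raw[col] = raw[col].map(ast.literal_eval)

# The cells are real lists now, so the comprehension concatenates them as expected.
raw["total_temp_2"] = [x + y for x, y in zip(raw["total"], raw["total_temp"])]
```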

If the solution above raises ValueError: malformed node or string, try the yaml package instead:

import yaml

# safe_load parses plain YAML only; yaml.load without an explicit Loader fails on PyYAML 6+.
df = df.applymap(yaml.safe_load)

Funnily enough, simple addition works for me on pandas 0.24.

df['total'] + df['total_temp']

ID
1        [abc1, cba, xyzzy01]
2    [bcd2, 22, dcb, xyzzy02]
3        [cde3, edc, xyzzy03]
dtype: object

These also work:

df['total'].add(df['total_temp'])

ID
1        [abc1, cba, xyzzy01]
2    [bcd2, 22, dcb, xyzzy02]
3        [cde3, edc, xyzzy03]
dtype: object

df['total_temp'].radd(df['total'])

ID
1        [abc1, cba, xyzzy01]
2    [bcd2, 22, dcb, xyzzy02]
3        [cde3, edc, xyzzy03]
dtype: object

These are great in terms of simplicity, but they are inherently loopy, since operations on object columns are hard to vectorize and pandas falls back to calling Python's + on each pair of elements.

Upvotes: 4

Taipan

Reputation: 116

In a situation like this (wanting to apply a function to each row of a DataFrame), I usually reach for .apply(). So I would run this:

df['total_temp_2'] = df.apply(lambda x: x['total'] + x['total_temp'], axis=1)

This keeps the transformation inside the pandas API, although a row-wise apply is still a Python-level loop and is generally not faster than a list comprehension.
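
A quick sanity check, sketched on an assumed reconstruction of the question's data (with 22 taken to be a string), suggests that the row-wise apply and a zip-based comprehension produce the same lists:

```python
import pandas as pd

# Assumed reconstruction of the two list columns from the question.
df = pd.DataFrame(
    {
        "total": [["abc1"], ["bcd2", "22"], ["cde3"]],
        "total_temp": [["cba", "xyzzy01"], ["dcb", "xyzzy02"], ["edc", "xyzzy03"]],
    },
    index=pd.Index([1, 2, 3], name="ID"),
)

# Row-wise apply: each row is passed to the lambda as a Series.
via_apply = df.apply(lambda row: row["total"] + row["total_temp"], axis=1)

# Comprehension over the two columns, for comparison.
via_zip = [x + y for x, y in zip(df["total"], df["total_temp"])]

same = via_apply.tolist() == via_zip
print(same)
```

Either result can be assigned to df['total_temp_2']; the apply version returns a Series that keeps the ID index, while the comprehension returns a plain list.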

Upvotes: 1
