Reputation: 401
I have a Pandas DataFrame that looks like this:
NAME total total_temp
ID
1 CVS [abc1] [cba, xyzzy01]
2 Costco [bcd2, 22] [dcb, xyzzy02]
3 Apple [cde3] [edc, xyzzy03]
I want to create a new column, total_temp_2, so that the data looks like this:
NAME total total_temp total_temp_2
ID
1 CVS [abc1] [cba, xyzzy01] [abc1, cba, xyzzy01]
2 Costco [bcd2, 22] [dcb, xyzzy02] [bcd2, 22, dcb, xyzzy02]
3 Apple [cde3] [edc, xyzzy03] [cde3, edc, xyzzy03]
I feel like I could guess my way through some really inefficient ways to concatenate the lists, but I suspect I'm missing something about Pandas.
How can I achieve this using Pandas?
Upvotes: 2
Views: 282
Reputation: 402473
When dealing with mixed types, I usually recommend a list comprehension, which has minimal memory and performance overhead.
df['total_temp_2'] = [x + y for x, y in zip(df['total'], df['total_temp'])]
df
NAME total total_temp total_temp_2
ID
1 CVS [abc1] [cba, xyzzy01] [abc1, cba, xyzzy01]
2 Costco [bcd2, 22] [dcb, xyzzy02] [bcd2, 22, dcb, xyzzy02]
3 Apple [cde3] [edc, xyzzy03] [cde3, edc, xyzzy03]
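For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the cells hold real Python lists and that 22 is stored as a string; the question does not say either way, so treat the frame construction as a guess at the original data.

```python
import pandas as pd

# Rebuild the example frame from the question.
# Assumption: cells are Python lists and "22" is a string.
df = pd.DataFrame(
    {
        "NAME": ["CVS", "Costco", "Apple"],
        "total": [["abc1"], ["bcd2", "22"], ["cde3"]],
        "total_temp": [["cba", "xyzzy01"], ["dcb", "xyzzy02"], ["edc", "xyzzy03"]],
    },
    index=pd.Index([1, 2, 3], name="ID"),
)

# x + y builds a new list for each row, so the source columns stay untouched.
df["total_temp_2"] = [x + y for x, y in zip(df["total"], df["total_temp"])]
print(df["total_temp_2"].tolist())
```

Because list addition returns a new list, later changes to total_temp_2 do not leak back into total or total_temp.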
If these are columns of strings, you can use ast.literal_eval to parse them:
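import ast
# Pick the object-dtype columns, which hold the stringified lists.
c = df.select_dtypes(include=[object]).columns
# On pandas 2.1+, DataFrame.map replaces the deprecated applymap.
df[c] = df[c].applymap(ast.literal_eval)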
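As a rough illustration of when this parsing works, here is a sketch on a hypothetical frame whose list columns were stored as text (the column values are made up for the example). It uses Series.map per column rather than applymap, which is just one way to spell it. Note that literal_eval only accepts valid Python literals, so the list elements must be quoted.

```python
import ast
import pandas as pd

# Hypothetical frame: lists stored as text, e.g. after a CSV round trip.
df = pd.DataFrame(
    {
        "total": ["['abc1']", "['bcd2', '22']"],
        "total_temp": ["['cba', 'xyzzy01']", "['dcb', 'xyzzy02']"],
    }
)

# Parse each string cell into a real Python list.
for col in ["total", "total_temp"]:
    df[col] = df[col].map(ast.literal_eval)

# Unquoted elements are names, not literals, so literal_eval rejects them.
try:
    ast.literal_eval("[abc1]")
    unquoted_ok = True
except ValueError:
    unquoted_ok = False

print(df["total"].tolist(), unquoted_ok)
```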
If the solution above throws "ValueError: malformed node or string:", try using the yaml package instead.
import yaml
# yaml.load requires an explicit Loader in PyYAML 6+; safe_load parses plain data only.
df = df.applymap(yaml.safe_load)
Funnily enough, simple addition works for me on pandas 0.24.
df['total'] + df['total_temp']
ID
1 [abc1, cba, xyzzy01]
2 [bcd2, 22, dcb, xyzzy02]
3 [cde3, edc, xyzzy03]
dtype: object
These also work:
df['total'].add(df['total_temp'])
ID
1 [abc1, cba, xyzzy01]
2 [bcd2, 22, dcb, xyzzy02]
3 [cde3, edc, xyzzy03]
dtype: object
df['total_temp'].radd(df['total'])
ID
1 [abc1, cba, xyzzy01]
2 [bcd2, 22, dcb, xyzzy02]
3 [cde3, edc, xyzzy03]
dtype: object
These are great in terms of simplicity, but they are inherently loopy: operations on object columns of mixed types are hard to vectorize, so pandas still adds the lists one pair at a time.
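A quick sketch to check that the three spellings agree with the list comprehension. The Series below are rebuilt by hand from the question's data, again assuming the cells are Python lists of strings.

```python
import pandas as pd

# Assumed data: the two list columns from the question, indexed by ID.
total = pd.Series([["abc1"], ["bcd2", "22"], ["cde3"]], index=[1, 2, 3])
total_temp = pd.Series(
    [["cba", "xyzzy01"], ["dcb", "xyzzy02"], ["edc", "xyzzy03"]],
    index=[1, 2, 3],
)

# Reference result: plain Python list concatenation, row by row.
expected = [x + y for x, y in zip(total, total_temp)]

via_operator = (total + total_temp).tolist()
via_add = total.add(total_temp).tolist()
# radd swaps the operands, so this is still total + total_temp.
via_radd = total_temp.radd(total).tolist()

print(via_operator == via_add == via_radd == expected)
```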
Upvotes: 4
Reputation: 116
In a situation like this (wanting to apply a function to a DataFrame), I usually reach for .apply(). So I would run this:
df['total_temp_2'] = df.apply(lambda x: x['total'] + x['total_temp'], axis=1)
Using built-in pandas functionality keeps this type of transformation concise, although apply with axis=1 still calls the function once per row in Python.
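Here is a self-contained sketch of the same call, with the frame rebuilt under the assumption that the cells are Python lists of strings. It also checks the result against a plain list comprehension.

```python
import pandas as pd

# Assumed reconstruction of the question's frame.
df = pd.DataFrame(
    {
        "NAME": ["CVS", "Costco", "Apple"],
        "total": [["abc1"], ["bcd2", "22"], ["cde3"]],
        "total_temp": [["cba", "xyzzy01"], ["dcb", "xyzzy02"], ["edc", "xyzzy03"]],
    },
    index=pd.Index([1, 2, 3], name="ID"),
)

# axis=1 passes each row to the lambda as a Series keyed by column name.
df["total_temp_2"] = df.apply(lambda row: row["total"] + row["total_temp"], axis=1)

# Same result as pairing the two columns positionally.
reference = [x + y for x, y in zip(df["total"], df["total_temp"])]
print(df["total_temp_2"].tolist() == reference)
```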
Upvotes: 1