Reputation: 1142
I'm trying to conditionally concat two columns in a Pandas DataFrame.
I found one related answer, which I adapted below - but it seems like there should be a more concise way to do this. In R with dplyr or data.table, this is a relatively simple line of code.
import pandas as pd
import numpy as np
data = {"Product": ["Shorts", "T-Shirt", "Jacket", "Cap"],
"Color": ["Red", "Blue", "White", "Green"],
"Size": ["S", "M", None, "S"]}
df = pd.DataFrame(data)
df
# if size = 'S' then concatenate Product and Color, else just Put in the value from Color column
for index, row in df.iterrows():
if row['Size'] == 'S':
df.loc[index, 'Output'] = str(row['Product']) + " (" + str(row['Color']) + ')'
else:
df.loc[index, 'Output'] = str(row['Color'])
df
Upvotes: 0
Views: 815
Reputation: 3825
It's also just a one-liner using datar
:
>>> from datar.all import f, tibble, mutate, if_else, paste0
>>> data = {"Product": ["Shorts", "T-Shirt", "Jacket", "Cap"],
... "Color": ["Red", "Blue", "White", "Green"],
... "Size": ["S", "M", None, "S"]}
>>> df = tibble(**data)
>>> df >> mutate(Output=if_else(f.Size == "S", paste0(f.Product, " (", f.Color, ")"), f.Color))
Product Color Size Output
<object> <object> <object> <object>
0 Shorts Red S Shorts (Red)
1 T-Shirt Blue M Blue
2 Jacket White None White
3 Cap Green S Cap (Green)
I am the author of datar, which is a python package backed by pandas to implement dplyr
-like syntax in R.
Upvotes: 1
Reputation: 35626
np.where
is used to generate column values conditionally. First parameter is the condition, then the True
values then the False
values. In this case this can be something like:
df['Output'] = np.where(
df['Size'].eq('S'), # Condition
df['Product'].astype(str) + df['Color'].map(' ({})'.format), # Where True
df['Color'] # Where False
)
df
:
Product Color Size Output
0 Shorts Red S Shorts (Red)
1 T-Shirt Blue M Blue
2 Jacket White None White
3 Cap Green S Cap (Green)
Note: map
was used with a format string because this can be faster as it produces less copies than multiple string concatenation, but it can also be done like:
df['Output'] = np.where(
df['Size'] == 'S',
df['Product'].astype(str) + ' (' + df['Color'].astype(str) + ')',
df['Color']
)
Upvotes: 0