James Holland
James Holland

Reputation: 1142

Best/Concise Way to Conditionally Concat two Columns in Pandas DataFrame

I'm trying to conditionally concat two columns in a Pandas DataFrame.

I found one related answer, which I adapted below - but it seems like there should be a more concise way to do this. In R with dplyr or data.table, this is a relatively simple line of code.

import pandas as pd
import numpy as np

data = {"Product": ["Shorts", "T-Shirt", "Jacket", "Cap"],
        "Color": ["Red", "Blue", "White", "Green"],
        "Size": ["S", "M", None, "S"]}

df = pd.DataFrame(data)
df

# if size = 'S' then concatenate Product and Color, else just Put in the value from Color column


for index, row in df.iterrows():
    if row['Size'] == 'S':
        df.loc[index, 'Output'] = str(row['Product']) + " (" + str(row['Color']) + ')'
    else:
        df.loc[index, 'Output'] = str(row['Color'])
        
        
df

Upvotes: 0

Views: 815

Answers (2)

Panwen Wang
Panwen Wang

Reputation: 3825

It's also just a one-liner using datar:

>>> from datar.all import f, tibble, mutate, if_else, paste0
>>> data = {"Product": ["Shorts", "T-Shirt", "Jacket", "Cap"],
...         "Color": ["Red", "Blue", "White", "Green"],
...         "Size": ["S", "M", None, "S"]}
>>> df = tibble(**data)
>>> df >> mutate(Output=if_else(f.Size == "S", paste0(f.Product, " (", f.Color, ")"), f.Color))
   Product    Color     Size        Output
  <object> <object> <object>      <object>
0   Shorts      Red        S  Shorts (Red)
1  T-Shirt     Blue        M          Blue
2   Jacket    White     None         White
3      Cap    Green        S   Cap (Green)

I am the author of datar, which is a python package backed by pandas to implement dplyr-like syntax in R.

Upvotes: 1

Henry Ecker
Henry Ecker

Reputation: 35626

np.where is used to generate column values conditionally. First parameter is the condition, then the True values then the False values. In this case this can be something like:

df['Output'] = np.where(
    df['Size'].eq('S'),  # Condition
    df['Product'].astype(str) + df['Color'].map(' ({})'.format),  # Where True
    df['Color']  # Where False
)

df:

   Product  Color  Size        Output
0   Shorts    Red     S  Shorts (Red)
1  T-Shirt   Blue     M          Blue
2   Jacket  White  None         White
3      Cap  Green     S   Cap (Green)

Note: map was used with a format string because this can be faster as it produces less copies than multiple string concatenation, but it can also be done like:

df['Output'] = np.where(
    df['Size'] == 'S',
    df['Product'].astype(str) + ' (' + df['Color'].astype(str) + ')',
    df['Color']
)

Upvotes: 0

Related Questions