KRDavis
KRDavis

Reputation: 145

pandas merge columns to create new column with comma separated values

My dataframe has four columns with colors. I want to combine them into one column called "Colors" and use commas to separate the values.

For example, I'm trying to combine into a Colors column like this :

ID  Black Red  Blue  Green  Colors   
120 NaN   red  NaN   green  red, green  
121 black Nan  blue  NaN    black, blue

My code is:

df['Colors'] = df[['Black, 'Red', 'Blue', 'Green']].apply(lambda x: ', '.join(x), axis=1)

But the output for ID 120 is:

, red, , green

And the output for ID 121 is:

black, , blue,

FOUND MY PROBLEM! Earlier in my code, I replaced "None" with " " instead of NaN. Upon making the change, plus incorporating feedback to insert [x.notnull()], it works!

df['Black'].replace('None', np.nan, inplace=True)
df['Colors'] = df[['Black, 'Red', 'Blue', 'Green']].apply(lambda x: ', '.join(x[x.notnull()]), axis=1)

Upvotes: 14

Views: 23437

Answers (3)

3UqU57GnaX
3UqU57GnaX

Reputation: 399

.apply(...., axis=1) should be avoided. It can be slow because the function is applied on each row separately. The effect is especially noticeable on large DataFrames.

Using transform with built-in functions is preferred or otherwise create a custom function. It should return a DataFrame.

For your case:

def join_columns(X: pd.DataFrame):
    separator = ','
    col_name = 'Colors'
    for i, c in enumerate(X.columns):
        if i == 0:
            X_out = X[c].copy()
        else:
            X_out += separator + X[c]
    return X_out.to_frame(name=col_name)

(You may need to convert columns to strings)

Upvotes: 0

BENY
BENY

Reputation: 323326

Using dot

s=df.iloc[:,1:]
s.notnull()
   Black   Red   Blue  Green
0  False  True  False   True
1   True  True   True  False
s.notnull().dot(s.columns+',').str[:-1]
0         Red,Green
1    Black,Red,Blue
dtype: object

df['color']=s.notnull().dot(s.columns+',').str[:-1]

Upvotes: 3

Vaishali
Vaishali

Reputation: 38415

You just need to handle NaNs

df['Colors'] = df[['Black', 'Red', 'Blue', 'Green']].apply(lambda x: ', '.join(x[x.notnull()]), axis = 1)

    ID      Black   Red Blue    Green   Colors
0   120     NaN     red NaN     green   red, green
1   121     black   NaN blue    NaN     black, blue

Upvotes: 17

Related Questions