Reputation: 145
My dataframe has four columns with colors. I want to combine them into one column called "Colors" and use commas to separate the values.
For example, I'm trying to combine into a Colors column like this :
ID Black Red Blue Green Colors
120 NaN red NaN green red, green
121 black Nan blue NaN black, blue
My code is:
df['Colors'] = df[['Black, 'Red', 'Blue', 'Green']].apply(lambda x: ', '.join(x), axis=1)
But the output for ID 120 is:
, red, , green
And the output for ID 121 is:
black, , blue,
FOUND MY PROBLEM!
Earlier in my code, I replaced "None"
with " "
instead of NaN
. Upon making the change, plus incorporating feedback to insert [x.notnull()]
, it works!
df['Black'].replace('None', np.nan, inplace=True)
df['Colors'] = df[['Black, 'Red', 'Blue', 'Green']].apply(lambda x: ', '.join(x[x.notnull()]), axis=1)
Upvotes: 14
Views: 23437
Reputation: 399
.apply(...., axis=1)
should be avoided. It can be slow because the function is applied on each row separately. The effect is especially noticeable on large DataFrames.
Using transform with built-in functions is preferred or otherwise create a custom function. It should return a DataFrame.
For your case:
def join_columns(X: pd.DataFrame):
separator = ','
col_name = 'Colors'
for i, c in enumerate(X.columns):
if i == 0:
X_out = X[c].copy()
else:
X_out += separator + X[c]
return X_out.to_frame(name=col_name)
(You may need to convert columns to strings)
Upvotes: 0
Reputation: 323326
Using dot
s=df.iloc[:,1:]
s.notnull()
Black Red Blue Green
0 False True False True
1 True True True False
s.notnull().dot(s.columns+',').str[:-1]
0 Red,Green
1 Black,Red,Blue
dtype: object
df['color']=s.notnull().dot(s.columns+',').str[:-1]
Upvotes: 3
Reputation: 38415
You just need to handle NaNs
df['Colors'] = df[['Black', 'Red', 'Blue', 'Green']].apply(lambda x: ', '.join(x[x.notnull()]), axis = 1)
ID Black Red Blue Green Colors
0 120 NaN red NaN green red, green
1 121 black NaN blue NaN black, blue
Upvotes: 17