Concatenate column names by using the binary values in the columns

Question

Currently, I have a dataframe as follows:

date          A    B   C
02/19/2020    0    0   0 
02/20/2020    0    0   0
02/21/2020    1    1   1
02/22/2020    0    1   0
02/23/2020    0    1   1
02/24/2020    0    0   1
02/25/2020    1    0   1
02/26/2020    1    0   0

The binary columns contain integers. The "date" column is a DateTime object. I want to create a new categorical column that is based on the binary columns as follows

date          A    B   C   new
02/19/2020    0    0   0   "None"
02/20/2020    0    0   0   "None"
02/21/2020    1    1   1   A+B+C
02/22/2020    0    1   0   B
02/23/2020    0    1   1   B+C
02/24/2020    0    0   1   C
02/25/2020    1    0   1   A+C
02/26/2020    1    0   0   A

How can I achieve this?

jezrael · Accepted Answer

Use DataFrame.dot for matrix multiplication with columns names with omit first column by position in DataFrame.iloc, add separator to columns names without first and last remove separator by indexing str[:-1]:

df['new'] = df.iloc[:, 1:].dot(df.columns[1:] + '+').str[:-1]
#set empty string to None
df.loc[df['new'].eq(''), 'new'] = None
print (df)
         date  A  B  C    new
0  02/19/2020  0  0  0   None
1  02/20/2020  0  0  0   None
2  02/21/2020  1  1  1  A+B+C
3  02/22/2020  0  1  0      B
4  02/23/2020  0  1  1    B+C
5  02/24/2020  0  0  1      C
6  02/25/2020  1  0  1    A+C
7  02/26/2020  1  0  0      A

If possible use NaNs instead Nones:

df['new'] = df.iloc[:, 1:].dot(df.columns[1:] + '+').str[:-1].replace('', np.nan)

print (df)
         date  A  B  C    new
0  02/19/2020  0  0  0    NaN
1  02/20/2020  0  0  0    NaN
2  02/21/2020  1  1  1  A+B+C
3  02/22/2020  0  1  0      B
4  02/23/2020  0  1  1    B+C
5  02/24/2020  0  0  1      C
6  02/25/2020  1  0  1    A+C
7  02/26/2020  1  0  0      A

Or if possible set first column to DatetimeIndex use:

df1 = df.set_index('date')
df1['new'] = df1.dot(df1.columns + '+').str[:-1]
df1.loc[df1['new'].eq(''), 'new'] = None

Concatenate column names by using the binary values in the columns

Answers (2)

Related Questions