Reputation: 13
Currently, I have a dataframe as follows:
date A B C
02/19/2020 0 0 0
02/20/2020 0 0 0
02/21/2020 1 1 1
02/22/2020 0 1 0
02/23/2020 0 1 1
02/24/2020 0 0 1
02/25/2020 1 0 1
02/26/2020 1 0 0
The binary columns contain integers. The "date" column is a DateTime object. I want to create a new categorical column that is based on the binary columns as follows
date A B C new
02/19/2020 0 0 0 "None"
02/20/2020 0 0 0 "None"
02/21/2020 1 1 1 A+B+C
02/22/2020 0 1 0 B
02/23/2020 0 1 1 B+C
02/24/2020 0 0 1 C
02/25/2020 1 0 1 A+C
02/26/2020 1 0 0 A
How can I achieve this?
Upvotes: 1
Views: 132
Reputation: 19
You can iterate over the Dataframe to calculate the new columns values and then add it.
This is a basic example
new_column = []
for i, row in df.iterrows():
row_val = None
if row["A"]:
if row_val:
row_val += "+A"
else:
row_val = "A"
if row["B"]:
if row_val:
row_val += "+B"
else:
row_val = "B"
if row["C"]:
if row_val:
row_val += "+C"
else:
row_val = "C"
if row_val is None:
row_val = "None"
new_column.append(row_val)
df["new_column_name"] = new_column
Upvotes: 0
Reputation: 863291
Use DataFrame.dot
for matrix multiplication with columns names with omit first column by position in DataFrame.iloc
, add separator to columns names without first and last remove separator by indexing str[:-1]
:
df['new'] = df.iloc[:, 1:].dot(df.columns[1:] + '+').str[:-1]
#set empty string to None
df.loc[df['new'].eq(''), 'new'] = None
print (df)
date A B C new
0 02/19/2020 0 0 0 None
1 02/20/2020 0 0 0 None
2 02/21/2020 1 1 1 A+B+C
3 02/22/2020 0 1 0 B
4 02/23/2020 0 1 1 B+C
5 02/24/2020 0 0 1 C
6 02/25/2020 1 0 1 A+C
7 02/26/2020 1 0 0 A
If possible use NaN
s instead None
s:
df['new'] = df.iloc[:, 1:].dot(df.columns[1:] + '+').str[:-1].replace('', np.nan)
print (df)
date A B C new
0 02/19/2020 0 0 0 NaN
1 02/20/2020 0 0 0 NaN
2 02/21/2020 1 1 1 A+B+C
3 02/22/2020 0 1 0 B
4 02/23/2020 0 1 1 B+C
5 02/24/2020 0 0 1 C
6 02/25/2020 1 0 1 A+C
7 02/26/2020 1 0 0 A
Or if possible set first column to DatetimeIndex
use:
df1 = df.set_index('date')
df1['new'] = df1.dot(df1.columns + '+').str[:-1]
df1.loc[df1['new'].eq(''), 'new'] = None
Upvotes: 3