Reputation: 1190
I have a dataframe with cars and its colors:
colors_df = pd.DataFrame({'id' : [1,2,3], 'Color_Red':[0,1,0],'Color_Blue':[1,0,0],'Color_Green':[1,1,1]})
| id | Color_Red | Color_Blue | Color_Green |
| 1 |. 0. | 1 | 1 |
| 2 |. 1. | 0 | 1 |
| 3 |. 0. | 0 | 1 |
I want to create an extra column for every color named Color_XXX_Only and be set to value 1 if an id has only the one 1 in it's colors, and additionally a MultiColor column if it has more than one color set to 1. Expected output:
| id | Color_Red | Color_Blue | Color_Green | Only_Red | Only_Blue | Only_Green | MultiColor|
| 1 |. 0. | 1 | 1 |. 0. | 0 | 0 | 1. |
| 2 |. 1. | 0 | 1 |. 0. | 0 | 0 | 1. |
| 3 |. 0. | 0 | 1 |. 0. | 0 | 1 | 0 |
1 is multi color because it is blue and green.
2 is multi color because its red and green.
3 is Onlye_blue and multi color because its only green.
Upvotes: 0
Views: 194
Reputation: 63
This approach uses masks and loc
#build masks
m_red=(colors_df.Color_Red==1)
m_blue=(colors_df.Color_Blue==1)
m_green=(colors_df.Color_Green==1)
m_multi=(colors_df.Color_Red+colors_df.Color_Blue+colors_df.Color_Green>1)
#build new columns using loc
colors_df.loc[m_red,'Only_Red']=1
colors_df.loc[m_blue,'Only_Blue']=1
colors_df.loc[m_green,'Only_Green']=1
colors_df.loc[m_multi,'Multi']=1
#use fillna to replace NaN with 0
colors_df.fillna(0)
Upvotes: 0
Reputation: 35626
One approach (assuming each id
is contained in a single row and does not span multiple rows) would be to calculate the number of 1's per row:
filter
to get only the Color_
columnssum
across the rows to get the total number of 1
s then compare to gt
1 (intuition is that rows with more than a single 1 value are Multicolor).colors_cols = colors_df.filter(like='Color_')
multi_color = colors_cols.sum(axis=1).gt(1)
multi_color
:
0 True
1 True
2 False
dtype: bool
Mask out the Multicolor rows and update the column names:
mask
rows that are MultiColor and set the values to 0
.str.replace
replacing the word Color_
with Only_
colors_cols = colors_cols.mask(multi_color, 0)
colors_cols.columns = (
colors_cols.columns.str.replace(r'Color_(.*)', r'Only_\1', regex=True)
)
Only_Red Only_Blue Only_Green
0 0 0 0
1 0 0 0
2 0 0 1
Then join
back and add Multicolor column:
join
the above dataframe back to colors_df
to add the 3 new columns.multi_color
which indicates Multicolor rows from bool
to int
with astype
colors_df = colors_df.join(colors_cols)
colors_df['MultiColor'] = multi_color.astype(int)
colors_df
:
id Color_Red Color_Blue Color_Green Only_Red Only_Blue Only_Green MultiColor
0 1 0 1 1 0 0 0 1
1 2 1 0 1 0 0 0 1
2 3 0 0 1 0 0 1 0
Complete Working Example:
import pandas as pd
colors_df = pd.DataFrame({
'id': [1, 2, 3],
'Color_Red': [0, 1, 0],
'Color_Blue': [1, 0, 0],
'Color_Green': [1, 1, 1]
})
# Create Necessary Boolean Information
colors_cols = colors_df.filter(like='Color_')
multi_color = colors_cols.sum(axis=1).gt(1)
# Build Only_ columns
colors_cols = colors_cols.mask(multi_color, 0)
colors_cols.columns = (
colors_cols.columns.str.replace(r'Color_(.*)', r'Only_\1', regex=True)
)
# Add Columns to `colors_df`
colors_df = colors_df.join(colors_cols)
colors_df['MultiColor'] = multi_color.astype(int)
# Display
print(colors_df.to_string())
Upvotes: 3