Reputation: 1601
In pandas, I'm creating a new column which conditionally comes directly from another column based on a regex match.
It might be easier to just an example:
HAVE:
+-------+------------+------------+-------------+-----------+
| Color | value_Gold | value_Blue | value_Green | value_Red |
+-------+------------+------------+-------------+-----------+
| Gold | 12 | 9 | 11 | 8 |
| Gold | 2 | 7 | 12 | 11 |
| Blue | 1 | 5 | 7 | 1 |
| Red | 6 | 12 | 3 | 10 |
| Green | 11 | 10 | 11 | 2 |
+-------+------------+------------+-------------+-----------+
Code so that you can recreate this dataframe:
df = pd.DataFrame(np.array([['Gold',12,9,11,8],['Gold',2,7,12,11],['Blue',1,5,7,1],['Red',6,12,3,10],['Green',11,10,11,2]]),
columns = ['Color','value_Gold','value_Blue','value_Green','value_Red'])
WANT:
+-------+------------+------------+-------------+-----------+------+
| Color | value_Gold | value_Blue | value_Green | value_Red | WANT |
+-------+------------+------------+-------------+-----------+------+
| Gold | 12 | 9 | 11 | 8 | 12 |
| Gold | 2 | 7 | 12 | 11 | 2 |
| Blue | 1 | 5 | 7 | 1 | 5 |
| Red | 6 | 12 | 3 | 10 | 10 |
| Green | 11 | 10 | 11 | 2 | 11 |
+-------+------------+------------+-------------+-----------+------+
At first, I was thinking I should focus on eliminating the values I don't want in the dataframe first, and then doing some sort of coalesce. But I'm really not sure what the most elegant way to do this is.
Ideas?
Edit: Another thought I had, was to transpose the value_ columns and use it for a lookup, but the truth is that my real dataset has hundreds of thousands of columns and so I didn't think this route was best. In reality, think 200K rows and 20 unique colors.
Upvotes: 2
Views: 361
Reputation: 25259
Use rename
to split column names and call lookup
df['WANT'] = (df.rename(lambda x: x.split('_')[-1], axis=1)
.lookup(df.index, df.Color))
Out[1365]:
Color value_Gold value_Blue value_Green value_Red WANT
0 Gold 12 9 11 8 12
1 Gold 2 7 12 11 2
2 Blue 1 5 7 1 5
3 Red 6 12 3 10 10
4 Green 11 10 11 2 11
Upvotes: 0
Reputation: 150785
You can, for example, use mask
and bfill
:
mask = df['Color'].values == df.columns[1:].str.extract('_(.*)$').values
df['WANT'] = df.iloc[:,1:].where(mask.T).bfill(1).iloc[:,0]
Upvotes: 4