Reputation: 337
I am trying to create a new column with values that are extracted from other columns based on matching column name. For example, I have the dataframe below:
x = [1, 0.212, 0.111]
y = [0, 0.233, 0.122]
z = [1, 0.222, 0.123]
test_df = pd.DataFrame([x,y,z], columns=["Cluster","0","1"])
print(test_df)
Cluster 0 1
0 1 0.212 0.111
1 0 0.233 0.122
2 1 0.222 0.123
I want to have a new column, let's say "z" which has values from either column "0" or "1" based on the value of column "Cluster". So my expected output should be
Cluster 0 1 z
0 1 0.212 0.111 0.111
1 0 0.233 0.122 0.233
2 1 0.222 0.123 0.123
As you can see, the first row of column "z" has the same value as the first row of column "1" because the value of the first row of column "Cluster" is 1. And so on for the rest of the dataframe.
I have tried pd.lookup, for example
test_df["z"] = test_df.lookup(test_df.index,test_df["Cluster"])
But I get an error.
Please help. Thank you so much in advanced!
Regards,
Tommy
Upvotes: 4
Views: 2357
Reputation: 71707
You can still use DataFrame.lookup
, you are getting error because the columns in your dataframe are labels with string
type while the values in the Cluster
column are of int
type, so using lookup in such case will throw a KeyError
. So in order to resolve this problem, change the dtype
of Cluster
column to str
while using lookup:
df['z'] = df.lookup(df.index, df['Cluster'].astype(str))
# print(df)
Cluster 0 1 z
0 1 0.212 0.111 0.111
1 0 0.233 0.122 0.233
2 1 0.222 0.123 0.123
Upvotes: 2
Reputation: 23099
Avoiding apply
using stack
s = test_df.set_index('Cluster',append=True).stack().reset_index(level=[1,2])
test_df['z'] = s[s['Cluster'].eq(s['level_2'].astype(int))][0]
print(test_df)
Cluster 0 1 z
0 1 0.212 0.111 0.111
1 0 0.233 0.122 0.233
2 1 0.222 0.123 0.123
s
creates a vertical data-frame with your Cluster and level_2 (your_columns) as the index.
print(s)
Cluster level_2 0
0 1 0 0.212
0 1 1 0.111
1 0 0 0.233
1 0 1 0.122
2 1 0 0.222
2 1 1 0.123
as we've kept your original index we can filter out the non matches and re-assign back to your original dataframe
print(s[s['Cluster'].eq(s['level_2'].astype(int))][0])
0 0.111
1 0.233
2 0.123
Name: 0, dtype: float64
Upvotes: 1
Reputation: 82805
This is one approach using apply
.
Ex:
x = [1, 0.212, 0.111]
y = [0, 0.233, 0.122]
z = [1, 0.222, 0.123]
df = pd.DataFrame([x,y,z], columns=["Cluster","0","1"])
df['Cluster'] = df['Cluster'].astype(int).astype(str)
df['Z'] = df.apply(lambda x: x[x['Cluster']], axis = 1)
print(df)
Output:
Cluster 0 1 Z
0 1 0.212 0.111 0.111
1 0 0.233 0.122 0.233
2 1 0.222 0.123 0.123
Upvotes: 1