Yippee
Yippee

Reputation: 337

Extract row value based on matching column name and column value

I am trying to create a new column with values that are extracted from other columns based on matching column name. For example, I have the dataframe below:

x = [1, 0.212, 0.111]
y = [0, 0.233, 0.122]
z = [1, 0.222, 0.123]
test_df = pd.DataFrame([x,y,z], columns=["Cluster","0","1"])
print(test_df)

   Cluster      0      1
0        1  0.212  0.111
1        0  0.233  0.122
2        1  0.222  0.123

I want to have a new column, let's say "z" which has values from either column "0" or "1" based on the value of column "Cluster". So my expected output should be

   Cluster      0      1      z
0        1  0.212  0.111  0.111
1        0  0.233  0.122  0.233
2        1  0.222  0.123  0.123

As you can see, the first row of column "z" has the same value as the first row of column "1" because the value of the first row of column "Cluster" is 1. And so on for the rest of the dataframe.

I have tried pd.lookup, for example

test_df["z"] = test_df.lookup(test_df.index,test_df["Cluster"])

But I get an error.

Please help. Thank you so much in advanced!

Regards,

Tommy

Upvotes: 4

Views: 2357

Answers (3)

Shubham Sharma
Shubham Sharma

Reputation: 71707

You can still use DataFrame.lookup, you are getting error because the columns in your dataframe are labels with string type while the values in the Cluster column are of int type, so using lookup in such case will throw a KeyError. So in order to resolve this problem, change the dtype of Cluster column to str while using lookup:

df['z'] = df.lookup(df.index, df['Cluster'].astype(str))

# print(df)
   Cluster      0      1      z
0        1  0.212  0.111  0.111
1        0  0.233  0.122  0.233
2        1  0.222  0.123  0.123

Upvotes: 2

Umar.H
Umar.H

Reputation: 23099

Avoiding apply using stack

s = test_df.set_index('Cluster',append=True).stack().reset_index(level=[1,2])

test_df['z'] = s[s['Cluster'].eq(s['level_2'].astype(int))][0]

print(test_df)


   Cluster      0      1      z
0        1  0.212  0.111  0.111
1        0  0.233  0.122  0.233
2        1  0.222  0.123  0.123

details

s creates a vertical data-frame with your Cluster and level_2 (your_columns) as the index.

print(s)
   Cluster level_2      0
0        1       0  0.212
0        1       1  0.111
1        0       0  0.233
1        0       1  0.122
2        1       0  0.222
2        1       1  0.123

as we've kept your original index we can filter out the non matches and re-assign back to your original dataframe

print(s[s['Cluster'].eq(s['level_2'].astype(int))][0])

0    0.111
1    0.233
2    0.123
Name: 0, dtype: float64

Upvotes: 1

Rakesh
Rakesh

Reputation: 82805

This is one approach using apply.

Ex:

x = [1, 0.212, 0.111]
y = [0, 0.233, 0.122]
z = [1, 0.222, 0.123]
df = pd.DataFrame([x,y,z], columns=["Cluster","0","1"])

df['Cluster'] = df['Cluster'].astype(int).astype(str)
df['Z'] = df.apply(lambda x:  x[x['Cluster']], axis = 1)

print(df)

Output:

  Cluster      0      1      Z
0       1  0.212  0.111  0.111
1       0  0.233  0.122  0.233
2       1  0.222  0.123  0.123

Upvotes: 1

Related Questions