bhylynch98

Reputation: 49

Reference a pandas column name by using an existing variable

I have a messy data frame of marathon winners, including their country, the year they won the marathon, and their country's GDP per capita for every year since 1970. I would like to create a gdp variable holding specifically the GDP for the year they won the race.

data example:

YEAR    Winner_Name    Winner_Country   Time    Gender  Marathon_City   Country 1970    1971     
1977    Dan Cloeter    USA              2:17:52 M       Chicago         USA     5247.0  5687.0  
1978    Mark Stanforth USA              2:19:20 M       Chicago         USA     5247.0  5687.0

As shown, 1970 is the name of a column holding the GDP of the winner's country in that year, but it is also a possible value of the YEAR column. I would like to create a column gdp that uses each row's YEAR value to select the matching year column, so that gdp holds the GDP of the winner's country in the year they won.

What I initially tried is below. I suspect it fails because it does not iterate over every observation.

YEAR = df_gdp['YEAR']
df_gdp['gdp'] = df_gdp[YEAR]

This results in the following error:

KeyError: "None of [Int64Index([1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986,\n ...\n 2009, 2010, 2011, 2013, 2014, 2015, 2016, 2017, 2018, 2019],\n dtype='int64', length=258)] are in the [columns]"

example

Take this example data set

letter a b c d
a      1 3 4 2  
b      4 3 2 1 
c      2 1 4 3
d      3 4 2 1

desired results

letter a b c d  correct answer
a      1 3 4 2  1  
b      4 3 2 1  3 
c      2 1 4 3  4
d      3 4 2 1  1

Upvotes: 0

Views: 410

Answers (1)

N.Moudgil

Reputation: 879

You can loop over the rows and, for each one, read the value from the column named by that row's letter:

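# For each row, look up the value in the column named by that row's "letter".
correct_answer = []
for idx, col in zip(df.index, df['letter']):
    # df.at[row_label, column_label] reads a single cell by label
    correct_answer.append(df.at[idx, col])

df['correct_answer'] = correct_answer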
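
If the frame is large, the same lookup can probably be done without a Python loop. The following is only a sketch: the helper name lookup_by_row is made up for illustration, and it assumes every value in the key column exactly matches a column label, including its type.

```python
import pandas as pd


def lookup_by_row(df: pd.DataFrame, key_col: str):
    """For each row, return the value from the column named by df[key_col].

    lookup_by_row is a hypothetical helper, not part of pandas.
    """
    keys = df[key_col]
    # Only the columns that are actually referenced by some row.
    value_cols = keys.unique().tolist()
    # Raises KeyError if a key has no matching column label.
    values = df[value_cols].to_numpy()
    # Position of each row's key within value_cols.
    col_pos = pd.Index(value_cols).get_indexer(keys)
    # Pair row i with column col_pos[i] (NumPy integer array indexing).
    return values[range(len(df)), col_pos]


# The example data set from the question.
letters = pd.DataFrame({
    "letter": list("abcd"),
    "a": [1, 4, 2, 3],
    "b": [3, 3, 1, 4],
    "c": [4, 2, 4, 2],
    "d": [2, 1, 3, 1],
})
letters["correct_answer"] = lookup_by_row(letters, "letter")
print(letters)
```

For the marathon data this would be df_gdp['gdp'] = lookup_by_row(df_gdp, 'YEAR'), provided the year columns are labelled with the same type as the YEAR values. If the column labels are strings such as '1970' while YEAR holds integers, convert one side first, for example by passing a key column built with df_gdp['YEAR'].astype(str).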

Upvotes: 0
