I have a messy data frame of marathon winners, including their country, the year they won the marathon, and their country's GDP per capita for every year since 1970. I would like to create a gdp variable holding the GDP of the specific year in which each runner won the race.
Data example:
YEAR  Winner_Name     Winner_Country  Time     Gender  Marathon_City  Country  1970    1971
1977  Dan Cloeter     USA             2:17:52  M       Chicago        USA      5247.0  5687.0
1978  Mark Stanforth  USA             2:19:20  M       Chicago        USA      5247.0  5687.0
As seen, 1970 is the name of a column holding the GDP of the winner's country in that year, but it is also a possible value of the YEAR variable. I would like to create a gdp variable that uses each row's YEAR value to select the GDP of the winner's country in that year.
What I initially tried (I suspect it is not iterating over every observation):
YEAR = df_gdp['YEAR']
df_gdp['gdp'] = df_gdp[YEAR]
This results in the following error:
KeyError: "None of [Int64Index([1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986,\n ...\n 2009, 2010, 2011, 2013, 2014, 2015, 2016, 2017, 2018, 2019],\n dtype='int64', length=258)] are in the [columns]"
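A likely explanation: `df_gdp[YEAR]` passes the whole Series of years as a list of column labels to select, rather than doing a per-row lookup. The message "None of [...] are in the [columns]" also suggests the year columns are labelled with strings such as `'1970'` (which is what `read_csv` produces from a header row) while YEAR holds integers. Below is a minimal sketch of a per-row lookup, assuming the year columns are exactly the ones whose labels are all digits. The two rows and runner names are hypothetical; the GDP numbers are the 1970/1971 values from the data example.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the wide layout: YEAR is int, year columns are strings.
df_gdp = pd.DataFrame({
    "YEAR": [1971, 1970],
    "Winner_Name": ["Runner A", "Runner B"],  # hypothetical names
    "Country": ["USA", "USA"],
    "1970": [5247.0, 5247.0],
    "1971": [5687.0, 5687.0],
})

# Assumption: the GDP columns are the ones whose labels consist only of digits.
year_cols = [c for c in df_gdp.columns if str(c).isdigit()]
gdp_values = df_gdp[year_cols].to_numpy(dtype=float)

# Position of each row's YEAR among the year columns (-1 if there is no such column).
pos = pd.Index([str(c) for c in year_cols]).get_indexer(df_gdp["YEAR"].astype(str))

# Fancy indexing: take cell (i, pos[i]) for every row i.
gdp = gdp_values[np.arange(len(df_gdp)), pos]
gdp[pos == -1] = np.nan  # -1 would otherwise silently pick the last column
df_gdp["gdp"] = gdp
```

Comparing labels as strings makes the sketch work whether the year columns are labelled `'1970'` or `1970`.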
A simpler example
Take this example data set:
letter a b c d
a 1 3 4 2
b 4 3 2 1
c 2 1 4 3
d 3 4 2 1
Desired result:
letter a b c d correct_answer
a 1 3 4 2 1
b 4 3 2 1 3
c 2 1 4 3 4
d 3 4 2 1 1
You can try this:
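# df is the example frame from the question: a "letter" column plus columns a-d
letters = df.letter.values
correct_answer = []
for index, l in enumerate(letters):
    # value in the column named by this row's letter, at row position `index`
    correct_answer.append(df[l].iloc[index])
df['correct_answer'] = correct_answer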
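The loop can also be replaced with a vectorized lookup. This is a sketch, assuming NumPy is available and that every value in `letter` names one of the value columns: `Index.get_indexer` gives the target column position for each row, and NumPy integer-array indexing then picks one cell per row. (`DataFrame.lookup` used to do this, but it was deprecated in pandas 1.2 and removed in 2.0.)

```python
import numpy as np
import pandas as pd

# The example data set from the question.
df = pd.DataFrame({
    "letter": ["a", "b", "c", "d"],
    "a": [1, 4, 2, 3],
    "b": [3, 3, 1, 4],
    "c": [4, 2, 4, 2],
    "d": [2, 1, 3, 1],
})

# Keep only the value columns so the NumPy array stays numeric.
values = df.drop(columns="letter")

# For each row, the position of the column named in that row's "letter".
col_pos = values.columns.get_indexer(df["letter"])

# Pick element (i, col_pos[i]) for every row i in one step.
df["correct_answer"] = values.to_numpy()[np.arange(len(df)), col_pos]
```

This yields 1, 3, 4, 1, matching the desired result.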