Reputation: 391
My goal is to compare each value from the column "year" against the appropriate column year (i.e. 1999, 2000). I then want to return the corresponding value from the corresponding column. For example, for Afghanistan (first row), year 2004, I want to find the column named "2004" and return the value from the row that contains afghanistan.
Here is the table. For reference this table is the result of a sql join between educational attainment in a single defined year and a table for gdp per country for years 1999 - 2010. My ultimate goal is to return the gdp from the year that the educational data is from.
country year men_ed_yrs women_ed_yrs total_ed_yrs 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
0 Afghanistan 2004 11 5 8 NaN NaN 2461666315 4128818042 4583648922 5285461999 6.275076e+09 7.057598e+09 9.843842e+09 1.019053e+10 1.248694e+10 1.593680e+10
1 Albania 2004 11 11 11 3414760915 3632043908 4060758804 4435078648 5746945913 7314865176 8.158549e+09 8.992642e+09 1.070101e+10 1.288135e+10 1.204421e+10 1.192695e+10
2 Algeria 2005 13 13 13 48640611686 54790060513 54744714110 56760288396 67863829705 85324998959 1.030000e+11 1.170000e+11 1.350000e+11 1.710000e+11 1.370000e+11 1.610000e+11
3 Andorra 2008 11 12 11 1239840270 1401694156 1484004617 1717563533 2373836214 2916913449 3.248135e+09 3.536452e+09 4.010785e+09 4.001349e+09 3.649863e+09 3.346317e+09
4 Anguilla 2008 11 11 11 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
My approach so far is:
for value in df_combined_column_named['year']: #loops through each year in year column
if value in df_combined_column_named.columns
any thoughts?
Upvotes: 3
Views: 2433
Reputation: 879471
Use df.loc
:
In [62]: df.loc[df['country']=='Afghanistan', '2004'].item()
Out[62]: 5285461999.0
df.loc[rows, columns]
can accept a boolean Series (such as df['country']=='Afghanistan'
) for rows
and a column label (such as '2004'
) for columns
. It will return the values for rows where the boolean Series is True and in the specified column.
In general this can be more than one value, so a Series is returned. However, in this case, there is only one value in the Series. So to obtain just the value, call the item
method.
Note it is unclear from the posted string representation of df
whether the numeric column labels are strings are integers. If the numeric column labels are integers, then you would need to use
df.loc[df['country']=='Afghanistan', 2004].item()
(with no quotation marks around 2004
).
If you are going to make a lot of "queries" of this form, you make wish to set the country
column as the index:
df = df.set_index('country')
Then you could access the value in the cell whose row label is 'Afghanistan'
and whose column label is '2004'
using get_value
:
In [65]: df.get_value('Afghanistan', '2004')
Out[65]: 5285461999.0
Upvotes: 4