Aschharwood
Aschharwood

Reputation: 391

Return value in dataframe based on row index, column reference

My goal is to compare each value from the column "year" against the appropriate column year (i.e. 1999, 2000). I then want to return the corresponding value from the corresponding column. For example, for Afghanistan (first row), year 2004, I want to find the column named "2004" and return the value from the row that contains afghanistan.

Here is the table. For reference this table is the result of a sql join between educational attainment in a single defined year and a table for gdp per country for years 1999 - 2010. My ultimate goal is to return the gdp from the year that the educational data is from.

country year    men_ed_yrs  women_ed_yrs    total_ed_yrs    1999    2000    2001    2002    2003    2004    2005    2006    2007    2008    2009    2010
0   Afghanistan 2004    11  5   8   NaN NaN 2461666315  4128818042  4583648922  5285461999  6.275076e+09    7.057598e+09    9.843842e+09    1.019053e+10    1.248694e+10    1.593680e+10
1   Albania 2004    11  11  11  3414760915  3632043908  4060758804  4435078648  5746945913  7314865176  8.158549e+09    8.992642e+09    1.070101e+10    1.288135e+10    1.204421e+10    1.192695e+10
2   Algeria 2005    13  13  13  48640611686 54790060513 54744714110 56760288396 67863829705 85324998959 1.030000e+11    1.170000e+11    1.350000e+11    1.710000e+11    1.370000e+11    1.610000e+11
3   Andorra 2008    11  12  11  1239840270  1401694156  1484004617  1717563533  2373836214  2916913449  3.248135e+09    3.536452e+09    4.010785e+09    4.001349e+09    3.649863e+09    3.346317e+09
4   Anguilla    2008    11  11  11  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

My approach so far is:

for value in df_combined_column_named['year']: #loops through each year in year column
    if value in df_combined_column_named.columns

any thoughts?

Upvotes: 3

Views: 2433

Answers (1)

unutbu
unutbu

Reputation: 879471

Use df.loc:

In [62]: df.loc[df['country']=='Afghanistan', '2004'].item()
Out[62]: 5285461999.0

df.loc[rows, columns] can accept a boolean Series (such as df['country']=='Afghanistan') for rows and a column label (such as '2004') for columns. It will return the values for rows where the boolean Series is True and in the specified column.

In general this can be more than one value, so a Series is returned. However, in this case, there is only one value in the Series. So to obtain just the value, call the item method.


Note it is unclear from the posted string representation of df whether the numeric column labels are strings are integers. If the numeric column labels are integers, then you would need to use

df.loc[df['country']=='Afghanistan', 2004].item()

(with no quotation marks around 2004).


If you are going to make a lot of "queries" of this form, you make wish to set the country column as the index:

df = df.set_index('country')

Then you could access the value in the cell whose row label is 'Afghanistan' and whose column label is '2004' using get_value:

In [65]: df.get_value('Afghanistan', '2004')
Out[65]: 5285461999.0

Upvotes: 4

Related Questions