Loop only takes last value

Question

I have a dataFrame with country-specific population for each year and a pandas Series with the world population for each year. This is the Series I am using:

pop_tot = df3.groupby('Year')['population'].sum()
Year     
1990    4.575442e+09
1991    4.659075e+09
1992    4.699921e+09
1993    4.795129e+09
1994    4.862547e+09
1995    4.949902e+09
...     ...
2017    6.837429e+09

and this is the DataFrame I am using

        Country      Year   HDI     population
0       Afghanistan 1990    NaN     1.22491e+07
1       Albania     1990    0.645   3.28654e+06
2       Algeria     1990    0.577   2.59124e+07
3       Andorra     1990    NaN     54509
4       Angola      1990    NaN     1.21714e+07
...     ...         ...     ...     ...
4096    Uzbekistan  2017    0.71    3.23872e+07 
4097    Vanuatu     2017    0.603   276244  
4098    Zambia      2017    0.588   1.70941e+07 
4099    Zimbabwe    2017    0.535   1.65299e+07

I want to calculate the proportion of the world's population that the population of that country represents for each year, so I loop over the Series and the DataFrame as follows:

j = 0
for i in range(len(df3)):
    if df3.iloc[i,1]==pop_tot.index[j]:
        df3['pop_tot']=pop_tot[j] #Sanity check
        df3['weighted']=df3['population']/pop_tot[j]
        *df3.iloc[i,2]
    else:
        j=j+1

However, the DataFrame that I get in return is not the expected one. I end up dividing all the values by the total population of 2017, thus giving me proportions which are not the correct ones for that year (i.e. for this first rows, pop_tot should be 4.575442e+09 as it corresponds to 1990 according to the Series above and not 6.837429e+09 which corresponds to 2017).

     Country   Year HDI   population  pop_tot      weighted
  0  Albania   1990 0.645 3.28654e+06 6.837429e+09 0.000257158
  1  Algeria   1990 0.577 2.59124e+07 6.837429e+09 0.00202753
  2  Argentina 1990 0.704 3.27297e+07 6.837429e+09 0.00256096

I can't see however what's the mistake in the loop. Thanks in advance.

Ben.T · Accepted Answer

You don't need loop, you can use groupby.transform to create the column pop_tot in df3 directly. then for the column weighted just do column operation, such as:

df3['pop_tot'] = df3.groupby('Year')['population'].transform(sum)
df3['weighted'] = df3['population']/df3['pop_tot']

As @roganjosh pointed out, the problem with your method is that you replace the whole columns pop_tot and weighted everytime your condition if is met, so at the last iteration where this condition is met, the year being probably 2017, you define the value of the column pop_tot being the one of 2017 and calculate the weithed with this value as well.

Loop only takes last value

Answers (2)

Related Questions