R - get sum from one column based on categories in another column

Question

I am new to R and trying to learn on my own. I have data in csv format with 1,048,575 rows and 73 columns. I am looking at three columns - year, country, aid_amount. I want to get the sum of aid_amount by country for i) all years, and ii) for years 1991-2010. I tried the following to get for all years BUT the result I get is different from when I sort/sum in Excel. What is wrong here. Also, what change should I make for ii) years 1991-2010. Thanks.

aiddata <- read.csv("aiddata_research.csv")
sum_by_country <- tapply(aiddata$aid_amount, aiddata$country, sum, na.rm=TRUE) # There are missing data on aid_amount
write.csv(sum_by_country, "sum_by_country.csv")

I have also tried:

sum_by_country <- aggregate(aid_amount ~ country, data = aiddata, sum) instead of tapply.

The first few rows for a few columns look like this:

aiddata_id  year    country                  aid_amount
23229017    2004    Bangladesh               685899.2666
14582630    2000    Bilateral, unspecified   15772.77174
28085216    2006    Bilateral, unspecified   38926.82898
28702455    2006    Bilateral, unspecified   12633.85659
29928104    2006    Cambodia                 955412.9884
27783934    2006    Cambodia                 11773.77268
37418683    2008    Guatemala                40150.7331
94726192    2010    Guatemala                151206.3096

akrun · Accepted Answer

You could use data.table for the big dataset. If you want to get the sum of aid_amount for each country by year

library(data.table)
setkey(setDT(aiddata), country,year)[, 
         list(aid_amount=sum(aid_amount)), by=list(country, year)]

To get the sum of aid_amount for each country

setkey(setDT(aiddata), country)[, 
          list(aid_amount=sum(aid_amount)), by=list(country)]

R - get sum from one column based on categories in another column

Answers (2)

Related Questions