Reputation: 33
I am new to R and trying to learn on my own. I have data in csv format with 1,048,575 rows and 73 columns. I am looking at three columns - year, country, aid_amount. I want to get the sum of aid_amount by country for i) all years, and ii) for years 1991-2010. I tried the following to get for all years BUT the result I get is different from when I sort/sum in Excel. What is wrong here. Also, what change should I make for ii) years 1991-2010. Thanks.
aiddata <- read.csv("aiddata_research.csv")
sum_by_country <- tapply(aiddata$aid_amount, aiddata$country, sum, na.rm=TRUE) # There are missing data on aid_amount
write.csv(sum_by_country, "sum_by_country.csv")
I have also tried:
sum_by_country <- aggregate(aid_amount ~ country, data = aiddata, sum) instead of tapply.
The first few rows for a few columns look like this:
aiddata_id year country aid_amount
23229017 2004 Bangladesh 685899.2666
14582630 2000 Bilateral, unspecified 15772.77174
28085216 2006 Bilateral, unspecified 38926.82898
28702455 2006 Bilateral, unspecified 12633.85659
29928104 2006 Cambodia 955412.9884
27783934 2006 Cambodia 11773.77268
37418683 2008 Guatemala 40150.7331
94726192 2010 Guatemala 151206.3096
Upvotes: 3
Views: 2953
Reputation: 1
yy=aggregate(df$Column1,by=list(df$Column2),FUN=mean)
Column 2- Categories
on which you want to sum.
If you want to know the maximum value(sum)
among all categories? Use the below code:
which.max(yy$x)
Upvotes: 0
Reputation: 887138
You could use data.table
for the big dataset. If you want to get the sum of aid_amount
for each country
by year
library(data.table)
setkey(setDT(aiddata), country,year)[,
list(aid_amount=sum(aid_amount)), by=list(country, year)]
To get the sum of aid_amount
for each country
setkey(setDT(aiddata), country)[,
list(aid_amount=sum(aid_amount)), by=list(country)]
Upvotes: 5