Reputation: 889
I am trying to run a loop and fill a matrix. Here is some sample data:
#generate sample data
reg<-rep(c("a","b","c","d"),each=3)
year<-rep(c(2005:2008),each=3)
sea<-rep(c("Winter","Summer","Autumn"),4)
set.seed(1)
area<-runif(12)
prod<-runif(12)
yld<-runif(12)
dat<-data.frame(reg,year,sea,area,prod,yld)
dat$reg<-as.character(dat$reg)
dat$sea<-as.character(dat$sea)
str(dat)
#create an empty matrix to store my results
results.mat <- matrix(0, ncol = 6, nrow = NROW(unique(dat$reg)))
#create a loop
for (j in unique(sort(dat$reg))){
reg<-dat[dat$reg==j,]
for (k in unique(sort(reg$year))){
year<-reg[reg$year==k,]
results.mat<-year[year$area==max(year$area),]
}}
results.mat
What I am trying to do is, for each reg and each year, extract the row where area is maximum. This implies that for a, the row with Autumn should be selected, since its area is the largest of the three values of area. Similarly, for b, the row with Winter should be selected, and for d, the row with Summer.
Therefore the final matrix (or data frame) should have one row each for a, b, c and d. However, when I run the loop above, it only gives me the row for d and not the other three. I think this has to do with the last line of the loop, where I assign to results.mat and overwrite the previous selection. But I am not sure how to fill a matrix row by row.
Thanks
Upvotes: 0
Views: 244
Reputation: 38520
A solution using the data.table package is as follows:
library(data.table)
setDT(dat)
# subset data according to max area by reg-year
dat[, .SD[which.max(area),], by=c("reg", "year")]
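For comparison, the loop in the question can also be repaired in base R. The sketch below is only one possible fix, not the approach used above: it assumes the sample data from the question (rebuilt here as a plain data frame, since setDT converts dat in place), and the names rows, reg_j, grp and results are introduced purely for illustration. Instead of assigning to results.mat on every pass, which overwrites the previous selection, it collects each selected row in a list and binds them once at the end.

```r
# Rebuild the question's sample data as a plain data frame
set.seed(1)
area <- runif(12)
prod <- runif(12)
yld  <- runif(12)
dat <- data.frame(
  reg  = rep(c("a", "b", "c", "d"), each = 3),
  year = rep(2005:2008, each = 3),
  sea  = rep(c("Winter", "Summer", "Autumn"), 4),
  area, prod, yld,
  stringsAsFactors = FALSE
)

# Collect one row per reg-year group, then bind them together once
rows <- list()
for (j in sort(unique(dat$reg))) {
  reg_j <- dat[dat$reg == j, ]
  for (k in sort(unique(reg_j$year))) {
    grp <- reg_j[reg_j$year == k, ]
    # Append the row with the largest area instead of overwriting the result
    rows[[length(rows) + 1]] <- grp[which.max(grp$area), ]
  }
}
results <- do.call(rbind, rows)
rownames(results) <- NULL
results
```

A data frame is used rather than a matrix because the selected rows mix character and numeric columns, which a matrix would coerce to a single type.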
Upvotes: 2
Reputation: 215117
If the results.mat you described is what you want, a more systematic way to get it is to use a data manipulation package such as dplyr, which lets you group the data and keep only the rows that satisfy a condition within each group. With dplyr, you can produce results.mat as follows.
library(dplyr);
dat %>% group_by(reg, year) %>% filter(area == max(area))
Source: local data frame [4 x 6]
Groups: reg, year [4]

    reg  year    sea      area      prod        yld
  (chr) (int)  (chr)     (dbl)     (dbl)      (dbl)
1     a  2005 Autumn 0.5728534 0.7698414 0.01339033
2     b  2006 Winter 0.9082078 0.4976992 0.38238796
3     c  2007 Winter 0.9446753 0.3800352 0.48208012
4     d  2008 Summer 0.2059746 0.6516738 0.82737332
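One thing to be aware of: filter(area == max(area)) keeps every row that ties for the group maximum, so a group can contribute more than one row. If exactly one row per group is required, slice(which.max(area)) is one way to get it. The sketch below uses a small hypothetical data frame (toy, with a deliberate tie in group a) that is not part of the question's data, just to show the difference.

```r
library(dplyr)

# Hypothetical toy data: group "a" has two rows tied for the maximum area
toy <- data.frame(
  reg  = c("a", "a", "b"),
  sea  = c("Winter", "Summer", "Winter"),
  area = c(0.9, 0.9, 0.4),
  stringsAsFactors = FALSE
)

# filter() keeps every row equal to the group maximum, including both tied rows
all_max <- toy %>%
  group_by(reg) %>%
  filter(area == max(area))

# slice(which.max()) keeps only the first maximum within each group
first_max <- toy %>%
  group_by(reg) %>%
  slice(which.max(area)) %>%
  ungroup()

nrow(all_max)
nrow(first_max)
```

With the sample data in the question there are no ties, so both forms select the same four rows.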
Upvotes: 1