user53020

Reputation: 889

Fill a matrix by row in a loop

I am trying to run a loop that fills a matrix. Here is some sample data:

#generate sample data
reg<-rep(c("a","b","c","d"),each=3)
year<-rep(c(2005:2008),each=3)
sea<-rep(c("Winter","Summer","Autumn"),4)
set.seed(1)
area<-runif(12)
prod<-runif(12)
yld<-runif(12)
dat<-data.frame(reg,year,sea,area,prod,yld)
dat$reg<-as.character(dat$reg)
dat$sea<-as.character(dat$sea)
str(dat)

#create an empty matrix to store my results
results.mat <- matrix(0, ncol = 6, nrow = NROW(unique(dat$reg)))

#create a loop
for (j in unique(sort(dat$reg))){
reg<-dat[dat$reg==j,]
for (k in unique(sort(reg$year))){
  year<-reg[reg$year==k,]
  results.mat<-year[year$area==max(year$area),]
}}
results.mat

For each reg and each year, I want to extract the row where area is largest. For a, the Autumn row should be selected, since its area is the largest of the three values. Similarly, the Winter row should be selected for b and the Summer row for d.

The final matrix (or data frame) should therefore have one row each for a, b, c, and d. However, when I run the loop above, it only gives me the row for d and not the other three. I think this is because the last line of the loop assigns to results.mat and overwrites the previous selection each time. But I am not sure how to fill a matrix row by row.

Thanks

Upvotes: 0

Views: 244

Answers (2)

lmo

Reputation: 38520

A solution using the data.table package is as follows:

library(data.table)
setDT(dat)

# subset data according to max area by reg-year
dat[, .SD[which.max(area),], by=c("reg", "year")]
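
One thing to keep in mind: which.max() returns only the first position of the maximum, so a group with tied areas yields a single row, whereas an `area == max(area)` comparison keeps every tied row. A small base R sketch of the difference, using a made-up toy data frame (toy, first_only, and all_ties are illustrative names, not part of the question's data):

```r
# Hypothetical toy data: group "a" contains two rows tied for the largest area
toy <- data.frame(g = c("a", "a", "b"), area = c(1, 1, 2))
grp_a <- toy[toy$g == "a", ]

# which.max() gives the index of the first maximum only
first_only <- grp_a[which.max(grp_a$area), ]

# comparing against max() keeps every row that reaches the maximum
all_ties <- grp_a[grp_a$area == max(grp_a$area), ]

nrow(first_only)
nrow(all_ties)
```

The sample data in the question is drawn with runif(), so ties are very unlikely there, but the distinction matters for rounded or integer areas.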

Upvotes: 2

akuiper

Reputation: 215117

If results.mat as you described it is what you want, a more systematic approach is to use a data manipulation package such as dplyr, which lets you group the data and keep only the rows that satisfy a condition within each group. With dplyr, you can get results.mat as follows.

library(dplyr)
# within each reg-year group, keep the row(s) whose area equals the group maximum
dat %>% group_by(reg, year) %>% filter(area == max(area))

Source: local data frame [4 x 6]
Groups: reg, year [4]

    reg  year    sea      area      prod        yld
  (chr) (int)  (chr)     (dbl)     (dbl)      (dbl)
1     a  2005 Autumn 0.5728534 0.7698414 0.01339033
2     b  2006 Winter 0.9082078 0.4976992 0.38238796
3     c  2007 Winter 0.9446753 0.3800352 0.48208012
4     d  2008 Summer 0.2059746 0.6516738 0.82737332
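
If you would rather keep an explicit loop, here is a minimal base R sketch of one way to repair it. The original loop assigns to results.mat on every pass, so only the last group (d) survives; the fix is to store each pass's row at its own index. A matrix holds a single type, so mixing the character columns with the numeric ones would coerce everything to character, which is why the sketch collects the rows in a list and binds them into a data frame instead. The names groups, rows, and results are placeholders of mine, not from the question.

```r
# Rebuild the sample data from the question
reg  <- rep(c("a", "b", "c", "d"), each = 3)
year <- rep(2005:2008, each = 3)
sea  <- rep(c("Winter", "Summer", "Autumn"), 4)
set.seed(1)
area <- runif(12)
prod <- runif(12)
yld  <- runif(12)
dat  <- data.frame(reg, year, sea, area, prod, yld, stringsAsFactors = FALSE)

# Every reg-year combination that actually occurs in the data
groups <- unique(dat[, c("reg", "year")])

# Preallocate one list slot per group so each pass writes to its own index
rows <- vector("list", nrow(groups))

for (i in seq_len(nrow(groups))) {
  # Rows belonging to the i-th reg-year group
  sub <- dat[dat$reg == groups$reg[i] & dat$year == groups$year[i], ]
  # Keep the row with the largest area (first one if there are ties)
  rows[[i]] <- sub[which.max(sub$area), ]
}

# Bind the collected one-row data frames into the final result
results <- do.call(rbind, rows)
rownames(results) <- NULL
results
```

Looping over the observed reg-year pairs, rather than nesting one loop per column, avoids visiting combinations that have no rows.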

Upvotes: 1
