dales

Reputation: 31

How to populate an empty matrix in R from the for loop results?

I am calculating the yearly average NDVI values for all 49681 sites over 25 years. I created a for loop, but I can't figure out how to populate an empty 49681 x 25 matrix. My code currently populates only the first column of the matrix. Any suggestions on how to fix this?

A sample of my data

yearly.avg <- matrix (nrow=49681, ncol=25)
for (i in 1:49681) {
yearly.avg[i] <- mean(as.numeric(veg.data[i, 4:603]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,4:27]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,28:51]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,52:75]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,76:99]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,100:123]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,124:147]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,148:171]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,172:195]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,196:219]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,220:243]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,244:267]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,268:291]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,292:315]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,316:339]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,340:363]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,364:387]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,388:411]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,412:435]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,436:459]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,460:483]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,484:507]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,508:531]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,532:555]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,556:579]))
yearly.avg[i] <- mean(as.numeric(veg.data[i,580:603]))
}
head(yearly.avg)

Upvotes: 0

Views: 1682

Answers (2)

Stewart Macdonald

Reputation: 2132

Based on this answer, you could do something like this:

# make some fake data and set up the structure
siteCount <- 49681
yearCount <- 25
monthCount <- 12
obsPerMonth <- 2
obsCount <- yearCount * monthCount * obsPerMonth
fakeData <- sample(100:500, size=siteCount * obsCount, replace=TRUE)
veg.data <- matrix(fakeData, nrow=siteCount, ncol=obsCount)

sitenum <- sprintf("N%05d", (1:49681)+9000)
lat <- seq(from=40.90952, by=0.08, length.out=length(sitenum))
long <- seq(from=2.755276, by=0.08, length.out=length(sitenum))

veg.data <- as.data.frame(cbind(sitenum, lat, long, veg.data), stringsAsFactors=FALSE)
v <- expand.grid(c('A', 'B'), sprintf("%02d", 1:monthCount), 1982:2006)
dataColNames <- paste('Y', v[, 3], '.', v[, 2], v[, 1], sep='')

colnames(veg.data) <- c('sitenum', 'x', 'y', dataColNames)

###
# we now have the sample data, we can calculate yearly means
###

# First, get a numeric matrix of just the veg data
veg.data2 <- as.matrix(veg.data[, 4:ncol(veg.data)])
storage.mode(veg.data2) <- "numeric"

# change the column headings to be just the year prefix (e.g. "Y1982"), so that we can average based on year
colnames(veg.data2) <- substring(colnames(veg.data2), 1, 5)

# now, calculate yearly averages
yearly.avg <- sapply(unique(colnames(veg.data2)), function(x) 
      rowMeans(veg.data2[,colnames(veg.data2)== x,drop=FALSE], na.rm=TRUE))

# have a look
head(yearly.avg)
         1982     1983     1984     1985     1986     1987     1988     1989     1990     1991     1992     1993
[1,] 325.2083 363.7500 283.6250 315.6667 289.7500 260.7917 297.0000 301.5833 285.9167 299.2083 264.9167 311.2083
[2,] 307.6250 287.7500 281.3750 296.5833 330.7083 268.2917 331.5417 309.6667 275.7917 300.5833 287.9583 291.2500
[3,] 272.5000 295.9167 302.1250 314.7083 270.6667 340.2917 287.1250 336.3333 309.2500 266.7500 273.5000 254.2917
[4,] 288.9167 280.7083 299.1667 279.5833 301.4583 283.7917 274.6667 295.6250 238.6250 324.7917 302.2083 283.1667
[5,] 280.8750 282.5833 294.7083 276.0417 303.2917 266.5000 324.9583 301.5417 266.2917 327.0417 295.7083 262.7917
[6,] 275.0833 321.5833 305.1250 308.5417 266.7917 304.2083 304.1250 290.1667 312.9167 266.5000 273.7500 314.2917
         1994     1995     1996     1997     1998     1999     2000     2001     2002     2003     2004     2005
[1,] 320.7083 339.9583 288.9167 329.3750 303.6667 290.0417 288.3333 299.0417 290.3333 315.2500 272.5833 303.1667
[2,] 336.7500 295.0000 301.7917 303.0000 294.7917 337.5417 328.1250 284.5417 301.3333 300.6667 302.7083 288.7917
[3,] 314.2083 313.7500 325.0417 290.2917 276.6250 262.7500 315.7500 267.9167 301.8750 312.3333 288.1667 308.5000
[4,] 283.1667 278.8750 300.3333 278.3333 291.7500 358.2500 326.5833 311.7500 248.8750 250.8333 316.5000 324.0417
[5,] 286.9167 290.7500 331.7500 330.2500 317.5417 326.0417 297.8750 307.4583 371.9583 323.9583 320.5833 290.3750
[6,] 290.5000 306.0833 238.0833 304.7083 300.0417 252.3333 261.1250 253.9167 274.2083 282.8750 326.8750 306.1250
         2006
[1,] 308.0000
[2,] 298.4167
[3,] 293.2083
[4,] 308.0417
[5,] 305.6250
[6,] 297.1667

# Manually calculate average for 1982 to check result
d <- as.matrix(veg.data[, 4:27])
storage.mode(d) <- "numeric"
head(rowMeans(d))
[1] 325.2083 307.6250 272.5000 288.9167 280.8750 275.0833

head(rowMeans(d)) == head(yearly.avg[, 'Y1982'])
[1] TRUE TRUE TRUE TRUE TRUE TRUE
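
If the column names did not encode the year, the same grouping could be done by position instead. This is only a minimal sketch, assuming the observations sit in a numeric matrix with 24 contiguous columns per year; the names m, year_index and yearly, and the small sizes, are made up for illustration and are not from the question.

```r
set.seed(42)
n_sites <- 4          # illustrative; the question has 49681 sites
n_years <- 3          # illustrative; the question has 25 years
obs_per_year <- 24    # 12 months x 2 observations, as in the fake data above

# hypothetical numeric matrix: one row per site, years stored side by side
m <- matrix(runif(n_sites * n_years * obs_per_year), nrow = n_sites)

# year_index[k] is the year that column k belongs to: 1, 1, ..., 1, 2, 2, ...
year_index <- rep(seq_len(n_years), each = obs_per_year)

# one rowMeans() call per year; sapply() binds the results into a
# n_sites x n_years matrix
yearly <- sapply(seq_len(n_years), function(j)
  rowMeans(m[, year_index == j, drop = FALSE], na.rm = TRUE))

# check the first year against a manual calculation; all.equal() tolerates
# tiny floating-point differences that == would not
stopifnot(isTRUE(all.equal(yearly[, 1], rowMeans(m[, 1:24]))))
dim(yearly)
```

Grouping by a positional index avoids depending on the naming scheme, but it silently gives wrong answers if any year has a different number of columns, so the name-based version above is probably safer when the names are reliable.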

Upvotes: 1

Phi

Reputation: 414

There are better ways to do this than a for loop, but for starters you're attempting to assign 26 sets of values into 25 columns. You're also telling R to write to the same element, yearly.avg[i], 26 times for each row, so every assignment overwrites the previous one and only the first column is ever filled. I'm also puzzled by the ranges you're using: each yearly range (e.g. 4:27) spans 24 columns, while the first, 4:603, spans all 600. All that aside, and to answer your question as asked, you would code a for loop like this:

for(i in 1:49681){
    yearly.avg[i,1] <- mean(as.numeric(veg.data[i,4:603]))
    yearly.avg[i,2] <- mean(as.numeric(veg.data[i,4:27]))
    ...
}

That said, I can pretty much assure you that there is a better way to accomplish what you're trying to do. A little more information on the data set you're pulling from, and on the exact format you want your results in, would help get you to the best possible method.
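
For what it's worth, here is a rough, self-contained sketch of that loop with the column ranges computed rather than typed out. It assumes three ID columns followed by 24 observation columns per year, as the question's ranges (4:27, 28:51, ...) suggest; the tiny fake veg.data and the names n_sites, n_years, obs_per_year and first_col are illustrative only.

```r
# Why the original code fills only column 1: a single index on a matrix is a
# linear (column-major) index, so m[2] is row 2 of column 1.
m <- matrix(NA_real_, nrow = 3, ncol = 2)
m[2] <- 10
stopifnot(m[2, 1] == 10, all(is.na(m[, 2])))

# Small stand-in for veg.data (the question has 49681 sites and 25 years)
set.seed(1)
n_sites <- 5
n_years <- 3
obs_per_year <- 24
obs <- matrix(sample(100:500, n_sites * n_years * obs_per_year, replace = TRUE),
              nrow = n_sites)
veg.data <- data.frame(sitenum = sprintf("N%05d", seq_len(n_sites)),
                       x = 0, y = 0, obs)

yearly.avg <- matrix(NA_real_, nrow = n_sites, ncol = n_years)
first_col <- 4  # first observation column in veg.data

for (i in seq_len(n_sites)) {
  for (j in seq_len(n_years)) {
    # columns for year j: 4:27, 28:51, 52:75, ...
    start <- first_col + (j - 1) * obs_per_year
    cols <- start:(start + obs_per_year - 1)
    # index by row AND column so each year lands in its own column
    yearly.avg[i, j] <- mean(as.numeric(veg.data[i, cols]))
  }
}

head(yearly.avg)
```

On the full data set this element-by-element loop is likely to be slow, because each veg.data[i, cols] extracts a one-row data frame; converting the observation columns to a numeric matrix first and using rowMeans() per year should be much faster.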

Upvotes: 1
