user3493038

Reputation: 25

How to do a PCA with netcdf data in R

I have the following netcdf file in R:

"file oceandata.nc has 2 dimensions:"
"lon   Size: 2160"
"lat   Size: 900"
"------------------------"
"file oceandata.nc has 14 variables:"
"float bio1[lon,lat]  Longname:bio1: Annual Mean Temp Missval:1e+30"
"float bio4[lon,lat]  Longname:bio4: Temp Seasonality (standard deviation * 100) Missval:1e+30"
"float bio8[lon,lat]  Longname:bio8: Mean Temp of Wettest Quarter Missval:1e+30"
"float bio9[lon,lat]  Longname:bio9: Mean Temp of Driest Quarter Missval:1e+30"
"float bio10[lon,lat]  Longname:bio10: Mean Temp of Warmest Quarter Missval:1e+30"
"float bio11[lon,lat]  Longname:bio11: Mean Temp of Coldest Quarter Missval:1e+30"
"float bio12[lon,lat]  Longname:bio12: Annual Precipitation Missval:1e+30"
"float bio13[lon,lat]  Longname:bio13: Precipitation of Wettest Month Missval:1e+30"
"float bio14[lon,lat]  Longname:bio14: Precipitation of Driest Month Missval:1e+30"
"float bio15[lon,lat]  Longname:bio15: Precipitation Seasonality (coefficient of variation) Missval:1e+30"
"float bio16[lon,lat]  Longname:bio16: Precipitation of Wettest Quarter Missval:1e+30"
"float bio17[lon,lat]  Longname:bio17: Precipitation of Driest Quarter Missval:1e+30"
"float bio18[lon,lat]  Longname:bio18: Precipitation of Warmest Quarter Missval:1e+30"
"float bio19[lon,lat]  Longname:bio19: Precipitation of Coldest Quarter Missval:1e+30"

I would like to perform a PCA on the 14 variables in the file, but I am unsure how to go about this, or whether the data needs to be converted to a different format first.

So far I have done (error message below):

ocean <- open.ncdf("oceandata.nc")

bio1 <- get.var.ncdf(nc=ocean, varid="bio1")

bio4 <- get.var.ncdf(nc=ocean, varid="bio4")

bio8 <- get.var.ncdf(nc=ocean, varid="bio8")

bio9 <- get.var.ncdf(nc=ocean, varid="bio9")

dim(bio1)

[1] 2160 900

class(bio1)

[1] "matrix"

oceanvars <- cbind(bio1,bio4, bio8, bio9)

colnames(oceanvars) <- c("bio1", "bio4", "bio8", "bio9")

Error in colnames<-(*tmp*, value = c("bio1", "bio4", "bio8", "bio9" : length of 'dimnames' [2] not equal to array extent

pairs(oceanvars)

Error in plot.new() : figure margins too large

pca1 <- princomp(oceanvars, scores=TRUE, cor=TRUE)

Error in princomp.default(oceanvars, scores = TRUE, cor = TRUE) : 'princomp' can only be used with more units than variables

Any suggestions would be much appreciated!

Upvotes: 0

Views: 471

Answers (1)

Beasterfield

Reputation: 7113

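Why do you assume that cbind-ing four matrices with 900 columns each results in a matrix with 4 columns, to which c("bio1", "bio4", "bio8", "bio9") could be assigned as column names? The result has 2160 rows and 3600 columns, which is also why princomp complains about having fewer units than variables.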

So, as far as I understand, for the four variables Annual Mean Temp, Temp Seasonality, Mean Temp of Wettest Quarter and Mean Temp of Driest Quarter you have 1,944,000 spatial objects in total (2160 x 900 grid cells), which you want to analyze with a PCA.

Unfortunately you do not provide a reproducible example, but creating oceanvars by

oceanvars <- cbind( c(bio1), c(bio4), c(bio8), c(bio9) )

should already do the trick. The reason is that c() flattens a matrix into a plain vector (in column-major order), so each variable becomes a single column with one row per grid cell.
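To make the difference in shape concrete, here is a minimal sketch on small made-up matrices standing in for the real 2160 x 900 grids. The data, the 6 x 5 size and the single NA cell are assumptions for illustration only; dropping incomplete rows with complete.cases() is one possible way to handle missing cells before princomp, not the only one.

```r
# Small random stand-ins for the real grids (hypothetical data).
set.seed(1)
nlon <- 6
nlat <- 5
bio1 <- matrix(rnorm(nlon * nlat), nrow = nlon, ncol = nlat)
bio4 <- matrix(rnorm(nlon * nlat), nrow = nlon, ncol = nlat)
bio8 <- matrix(rnorm(nlon * nlat), nrow = nlon, ncol = nlat)
bio9 <- matrix(rnorm(nlon * nlat), nrow = nlon, ncol = nlat)
bio1[2, 3] <- NA  # assumption: a missing cell shows up as NA

# cbind() on the matrices glues them side by side: 6 rows, 4 * 5 columns.
wide <- cbind(bio1, bio4, bio8, bio9)
dim(wide)       # 6 20

# c() flattens each matrix first, giving one row per cell, one column per variable.
oceanvars <- cbind(bio1 = c(bio1), bio4 = c(bio4), bio8 = c(bio8), bio9 = c(bio9))
dim(oceanvars)  # 30 4

# Keep only cells where all four variables are present, then run the PCA.
complete <- oceanvars[complete.cases(oceanvars), ]
pca1 <- princomp(complete, cor = TRUE, scores = TRUE)
summary(pca1)
```

Naming the arguments in cbind() sets the column names directly, so the separate colnames<- step is no longer needed.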

A more general and cleaner procedure would be to melt your matrices into 3-column data.frames (or, at this size, even better, data.tables), merge them on the combination of lon and lat, and then pass just the value columns as a matrix to princomp.
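A rough sketch of that melt-and-merge route in base R, again on small made-up grids. The helper melt_grid, the coordinate vectors and the data are all hypothetical; with the real data the lon and lat values would come from the file's dimensions, and data.table would probably scale better than merge() on data.frames.

```r
# Hypothetical helper: turn a lon x lat matrix into a 3-column data.frame.
melt_grid <- function(m, lon, lat, name) {
  # expand.grid varies lon fastest, which matches the column-major order of c(m).
  df <- expand.grid(lon = lon, lat = lat)
  df[[name]] <- c(m)
  df
}

# Made-up coordinates and values standing in for the real grids.
set.seed(2)
lon <- seq(0, 50, by = 10)    # 6 longitudes
lat <- seq(-20, 20, by = 10)  # 5 latitudes
grids <- list(
  bio1 = matrix(rnorm(30), nrow = 6, ncol = 5),
  bio4 = matrix(rnorm(30), nrow = 6, ncol = 5),
  bio8 = matrix(rnorm(30), nrow = 6, ncol = 5),
  bio9 = matrix(rnorm(30), nrow = 6, ncol = 5)
)

# Melt every grid, then merge the long tables on (lon, lat).
long <- Map(function(m, nm) melt_grid(m, lon, lat, nm), grids, names(grids))
merged <- Reduce(function(a, b) merge(a, b, by = c("lon", "lat")), long)

# Only the value columns go into the PCA.
values <- as.matrix(merged[, names(grids)])
pca2 <- princomp(values, cor = TRUE, scores = TRUE)
summary(pca2)
```

Keeping lon and lat in merged means the PCA scores can later be mapped back to their grid cells.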

Upvotes: 1
