Reputation: 329
I am a complete beginner in R and I am totally stuck on this problem. You can download the netCDF file from the link below to take a look.
https://drive.google.com/file/d/0ByY3OAw62EShbkF6VWNFUkRYMmM/view?usp=sharing
^This is my NetCDF atmospheric data file with 8 variables and 8 dimensions. My variables of interest are:
TMSID :: site ID numbers (it includes urban sites, rural sites, etc.)
urban :: urban site info [urban is a 3-row by 250-column matrix: row 1 is the urban site ID, row 2 is latitude, row 3 is longitude.]
TIME :: data were collected from 1 March 2012 to May 2012 [the encoding of TIME is YYYYMMDDHH]
PM10 :: hourly particulate matter concentration measured at every station of every site
I need to work with only these 4 variables from this large data set.
I have to extract the PM10 values at urban sites only, for "1 March 2012" only. (In other words, I need to find in the TMSID variable which sites are urban sites and match the corresponding PM10 values for those urban sites on 01 March 2012.)
For example, TMSID contains different types of sites (urban, rural, etc.) numbered 111121, 111122, 111123, 111124, but only some of them (say 111121, 111123, etc.) are urban sites. So I have to pick only the urban sites out of the TMSID data, match the corresponding PM10 value, time, latitude and longitude, and then finally build a new dataset.
The final table/dataset should be: column 1 - time (1 March 2012 only), column 2 - urban site number, columns 3 and 4 - latitude and longitude of the corresponding urban site, column 5 - hourly PM10 value at each urban site.
I have used the following commands to read the data from the NetCDF file, but I can't figure out what I should do next:
install.packages("ncdf", dependencies = TRUE)
library(ncdf)
nc <- open.ncdf("2012_03_05_PM10_surface.nc")
print(nc)
tmsid <- get.var.ncdf(nc, "TMSID")
tmsid
urban <- get.var.ncdf(nc, "urban")
urban
time <- get.var.ncdf(nc, "TIME")
pm10 <- get.var.ncdf(nc, "PM10")
As I am a beginner in R, I only know the basic commands and can't figure out which package I should learn to solve this problem. Please help me out. Thanks in advance for your valuable time; if you need any further information, please ask.
Upvotes: 0
Views: 480
Reputation: 18749
library(ncdf)
nc <- open.ncdf("2012_03_05_PM10_surface.nc")
tmsid <- get.var.ncdf(nc,"TMSID")
urban <- get.var.ncdf(nc,"urban")
time <- get.var.ncdf(nc,"TIME")
pm10 <- get.var.ncdf(nc,"PM10")
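(Aside: the ncdf package has since been retired from CRAN; the same reads can be done with its successor, ncdf4. A sketch, assuming the same file and variable names:)

```r
# Equivalent reads with the ncdf4 package (ncdf was removed from CRAN)
library(ncdf4)
nc    <- nc_open("2012_03_05_PM10_surface.nc")
tmsid <- ncvar_get(nc, "TMSID")
urban <- ncvar_get(nc, "urban")
time  <- ncvar_get(nc, "TIME")
pm10  <- ncvar_get(nc, "PM10")
nc_close(nc)
```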
First let's have a look at nc:
[1] "file ~/Downloads/2012_03_05_PM10_surface.nc has 8 dimensions:"
[1] "data_num Size: 683016"
[1] "ncl1 Size: 683016"
[1] "obsnum_urban Size: 250"
[1] "ID_LAT_LON Size: 3"
[1] "obsnum_road Size: 33"
[1] "obsnum_background Size: 5"
[1] "obsnum_rural Size: 16"
[1] "ncl7 Size: 683016"
[1] "------------------------"
[1] "file ~/Downloads/2012_03_05_PM10_surface.nc has 8 variables:"
[1] "int TMSID[data_num] Longname:TMSID Missval:NA"
[1] "int TIME[ncl1] Longname:TIME Missval:NA"
[1] "float PM10[data_num] Longname:PM10 Missval:1e+30"
[1] "float urban[ID_LAT_LON,obsnum_urban] Longname:urban Missval:1e+30"
[1] "float road[ID_LAT_LON,obsnum_road] Longname:road Missval:1e+30"
[1] "float background[ID_LAT_LON,obsnum_background] Longname:background Missval:1e+30"
[1] "float rural[ID_LAT_LON,obsnum_rural] Longname:rural Missval:1e+30"
[1] "int TMS_JULIAN[ncl7] Longname:TMS_JULIAN Missval:NA"
What it tells us is that urban's rows are ID, latitude and longitude. Then we have tmsid giving the vector of IDs, of the same size as the time vector: one per data_num, i.e. one ID-time pair per datapoint in PM10, meaning we will be able to subset pm10 by ID (which is given by the first row of urban) and by timestamp (from 2012030101 to 2012030124).
# First we need to make a dataframe out of urban, for convenience.
urban <- as.data.frame(t(urban))
colnames(urban) <- c("ID", "LAT", "LON")
# Then we do the subsetting using a lapply, so we can batch-subset:
res <- lapply(urban$ID,
              function(x) data.frame(ID   = x,
                                     pm   = pm10[tmsid %in% x & time %in% 2012030101:2012030124],
                                     time = 2012030101:2012030124))
# Which gives us a list of sub-dataframes that we want to compress back into a single dataframe:
res <- do.call(rbind,res)
# Finally we merge that with the original urban dataframe
# so that each entry has its own LAT and LON:
res <- merge(res, urban, by="ID")
res
# ID pm time LAT LON
#1 111121 42 2012030101 37.56464 126.9760
#2 111121 36 2012030102 37.56464 126.9760
#3 111121 46 2012030103 37.56464 126.9760
#4 111121 40 2012030104 37.56464 126.9760
#5 111121 36 2012030105 37.56464 126.9760
#...
#5995 831154 81 2012030119 37.52662 126.8064
#5996 831154 72 2012030120 37.52662 126.8064
#5997 831154 81 2012030121 37.52662 126.8064
#5998 831154 70 2012030122 37.52662 126.8064
#5999 831154 74 2012030123 37.52662 126.8064
#6000 831154 74 2012030124 37.52662 126.8064
250 urban sites × 24 hours = 6,000 datapoints, which is indeed what we get here.
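If you later need real timestamps instead of YYYYMMDDHH integers, base R can parse them, and the table can be saved with write.csv. A sketch operating on the res dataframe built above (the datetime column name and output filename are my own choices):

```r
# Parse YYYYMMDDHH integer codes into POSIXct timestamps. Note the file's
# hour-24 values would need special handling, since %H only accepts 00-23:
res$datetime <- as.POSIXct(as.character(res$time), format = "%Y%m%d%H", tz = "UTC")

# Save the final table for later use:
write.csv(res, "urban_pm10_2012-03-01.csv", row.names = FALSE)
```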
Upvotes: 1