Reputation: 9
I have a data frame containing monthly NDVI values from 2000-2012 for 26 stations. I have sorted my dataframe first according to year, then station and lastly ndvi.
My dataframe R looks something like this (sorry about the formatting):
t station year month ndvi altitude precipitation
8 a 2000 aug 0.7793 2143 592.9
9 a 2000 sept 0.7524 2143 135.3
10 a 2000 oct 0.6597 2143 77.5
4 a 2000 apr 0.6029 2143 72.6
7 a 2000 jul 0.6018 2143 606.1
11 a 2000 nov 0.5801 2143 4.4
12 a 2000 dec 0.5228 2143 0
6 a 2000 jun 0.4969 2143 505.9
5 a 2000 may 0.4756 2143 241.7
2 a 2000 feb 0.4396 2143 4
3 a 2000 mar 0.4393 2143 25.5
1 a 2000 jan 0.4138 2143 16
8 b 2000 aug 0.7523 122 832.3
9 b 2000 sept 0.7003 122 229.7
7 b 2000 jul 0.667 122 662
5 b 2000 may 0.6639 122 323.3
4 b 2000 apr 0.593 122 88.6
6 b 2000 jun 0.5508 122 752.1
I need to extract the top three ndvi rows for each station for each year and tried using this code:
top3 <- split(R, R$station)
subsetted.data <- lapply(top3, FUN = function(x) head(x, 3))
subsetted.data
flatten.data <- do.call("rbind", subsetted.data)
View(flatten.data)
However, I then only get a data frame with the top three ndvi rows of stations in 2000 and not the years after.
Does anyone know how I can fix this?
Thank you.
Upvotes: 0
Views: 123
Reputation: 99331
I plugged in some arbitrary "2001" years to show the separation. I prefer to order
the data by the column of interest first, then split
that. You can use do.call(rbind, ...)
on the result if you choose. The result is the top three "ndvi" by station by year.
> dat$year[c(8:12, 16:18)] <- 2001 ## add some 2001 years
> ord <- dat[order(-dat$ndvi), ]
> lapply(split(ord, list(ord$station, ord$year)), head, 3)
$a.2000
t station year month ndvi altitude precipitation
1 8 a 2000 aug 0.7793 2143 592.9
2 9 a 2000 sept 0.7524 2143 135.3
3 10 a 2000 oct 0.6597 2143 77.5
$b.2000
t station year month ndvi altitude precipitation
13 8 b 2000 aug 0.7523 122 832.3
14 9 b 2000 sept 0.7003 122 229.7
15 7 b 2000 jul 0.6670 122 662.0
$a.2001
t station year month ndvi altitude precipitation
8 6 a 2001 jun 0.4969 2143 505.9
9 5 a 2001 may 0.4756 2143 241.7
10 2 a 2001 feb 0.4396 2143 4.0
$b.2001
t station year month ndvi altitude precipitation
16 5 b 2001 may 0.6639 122 323.3
17 4 b 2001 apr 0.5930 122 88.6
18 6 b 2001 jun 0.5508 122 752.1
Upvotes: 0
Reputation: 52637
You need to split by the interaction of station AND year:
R <- R[order(R$ndvi, decreasing=T), ]
top3 <- split(R, interaction(R$station, R$year)) # <<<<<<<<<< this is the change
subsetted.data <- lapply(top3, FUN = function(x) head(x, 3))
subsetted.data
flatten.data <- do.call("rbind", subsetted.data)
This works (see my data at the end). That said, this type of thing is much easier handled with packages like data.table
:
library(data.table)
data.table(R)[order(ndvi, decreasing=T), head(.SD, 3), by=list(station, year)]
Note you can order data.table
s faster by using keys, but I'm omitting that for the sake of clarity here.
Data:
set.seed(1)
R <- expand.grid(year=2000:2010, station=letters[1:5], month=month.abb)
R$ndvi <- runif(nrow(R))
Upvotes: 1