Reputation: 4080
I am sure this could be simple, but I can't find a solution.. I have a series of buffers for multiple years. In each buffer, I record the extent of forest cover. The forest cover declines over years. In each buffer, I have also a type of management.
I want to add two new columns to my dataset: the one regarding the original forest extent from 2003 for management type 0 and 1. The second relating the rate of forest extent to the original one, from 2003, for two types of management.
Based on these answers, I would assume to use something like: R: fill new columns in data.frame based on row values by condition?
df$area2003<-with(df, area[year == 2003 & manag == 0][match(buff, buff[year== 2003 & manag == 0])])
but it does not work how expected.
My example:
# create data frame
buff<-c(c(rep(c(1:2), each = 2)),
c(rep(c(1:2), each = 2)))
area<-seq(800, 1, by = -100)
year<-rep(2003:2004, each = 4)
manag<-rep(0:1, 4)
# create data frame
df<-data.frame(buff, area, year, manag)
# create values for original forest extent - how to code this???
df$area2003<-with(df, area[year == 2003 & manag == 0][match(distance, distance[year== 2003 & manag == 0])])
df$area2003<-with(df, area[year == 2003 & manag ==1 ][match(distance, distance[year== 2003 & manag == 1])])
# calculate forest rate:
df$rate<-df$area * 100/ df$area2003
What I expect to obtain:
buff area year manag area2003 rate
1 1 800 2003 0 800 100.00
2 1 700 2003 1 700 100.00
3 2 600 2003 0 600 100.00
4 2 500 2003 1 500 100.00
5 1 400 2004 0 800 50.00
6 1 300 2004 1 700 42.86
7 2 200 2004 0 600 33.34
8 2 100 2004 1 500 20.00
Upvotes: 1
Views: 1258
Reputation: 61
Window functions of dplyr
(tidyverse
) are best suited to that kind of transformations. More examples in this vignette.
library(tidyverse)
data_frame(
buff = rep(1:2, 2, each = 2),
area = 8:1*100,
year = rep(2003:2004, each = 4),
manag = rep(0:1, 4)
) %>%
group_by(buff, manag) %>%
arrange(year) %>%
mutate(area2003 = first(area), rate = area/area2003*100)
Upvotes: 1
Reputation: 38500
In this instance, since you aren't trying to update existing variables, you probably would do better to use a good-old fashioned merge
.
dfNew <- merge(df, setNames(df[df$year == 2003, c("buff", "manag", "area")],
c("buff", "manag", "area2003")),
by=c("buff", "manag"))
# calculate change
dfNew$rate <- with(dfNew, 1 - abs(area - area2003) / area2003) * 100
which returns
dfNew
buff manag area year area2003 rate
1 1 0 800 2003 800 100.00000
2 1 0 400 2004 800 50.00000
3 1 1 700 2003 700 100.00000
4 1 1 300 2004 700 42.85714
5 2 0 600 2003 600 100.00000
6 2 0 200 2004 600 33.33333
7 2 1 500 2003 500 100.00000
8 2 1 100 2004 500 20.00000
In the first line, I combined a number of steps, subsetting the data frame to 2003 and the columns of interest andrenaming the variables and in this subset data.frame with setNames
.
To order the data.frame as you have it, use
dfNew <- dfNew[with(dfNew, order(year, buff, manag)), ]
The merge could be extended to additional lines. For example,
dfNew <- merge(df, df[df$year == 2003, c("buff", "manag", "area")],
by=c("buff", "manag"))
names(dfNew) <- c("buff", "manag", "area", "year", "area2003")
Upvotes: 1