maycca
maycca

Reputation: 4080

R: create new columns, base on rows and column combination

I am sure this could be simple, but I can't find a solution.. I have a series of buffers for multiple years. In each buffer, I record the extent of forest cover. The forest cover declines over years. In each buffer, I have also a type of management.

I want to add two new columns to my dataset: the one regarding the original forest extent from 2003 for management type 0 and 1. The second relating the rate of forest extent to the original one, from 2003, for two types of management.

Based on these answers, I would assume to use something like: R: fill new columns in data.frame based on row values by condition?

df$area2003<-with(df, area[year == 2003 & manag == 0][match(buff, buff[year== 2003 & manag == 0])])

but it does not work how expected.

My example:

# create data frame
buff<-c(c(rep(c(1:2), each = 2)), 
            c(rep(c(1:2), each = 2)))
area<-seq(800, 1, by = -100) 
year<-rep(2003:2004, each = 4)
manag<-rep(0:1, 4)

# create data frame
df<-data.frame(buff, area, year, manag)

# create values for original forest extent - how to code this???
df$area2003<-with(df, area[year == 2003 & manag == 0][match(distance, distance[year== 2003 & manag == 0])])
df$area2003<-with(df, area[year == 2003 & manag ==1 ][match(distance, distance[year== 2003 & manag == 1])])

# calculate forest rate:
df$rate<-df$area * 100/ df$area2003

What I expect to obtain:

  buff area year manag area2003   rate
1    1  800 2003     0      800 100.00
2    1  700 2003     1      700 100.00
3    2  600 2003     0      600 100.00
4    2  500 2003     1      500 100.00
5    1  400 2004     0      800  50.00
6    1  300 2004     1      700  42.86
7    2  200 2004     0      600  33.34
8    2  100 2004     1      500  20.00

Upvotes: 1

Views: 1258

Answers (2)

Jarosław Nirski
Jarosław Nirski

Reputation: 61

Window functions of dplyr (tidyverse) are best suited to that kind of transformations. More examples in this vignette.

library(tidyverse)

data_frame(
    buff = rep(1:2, 2, each = 2),
    area = 8:1*100,
    year = rep(2003:2004, each = 4),
    manag = rep(0:1, 4)
    ) %>%
    group_by(buff, manag) %>%
    arrange(year) %>%
    mutate(area2003 = first(area), rate = area/area2003*100)

Upvotes: 1

lmo
lmo

Reputation: 38500

In this instance, since you aren't trying to update existing variables, you probably would do better to use a good-old fashioned merge.

dfNew <- merge(df, setNames(df[df$year == 2003, c("buff",  "manag", "area")],
                            c("buff",  "manag", "area2003")),
               by=c("buff",  "manag"))

# calculate change
dfNew$rate <- with(dfNew, 1 - abs(area - area2003) / area2003) * 100

which returns

dfNew
  buff manag area year area2003      rate
1    1     0  800 2003      800 100.00000
2    1     0  400 2004      800  50.00000
3    1     1  700 2003      700 100.00000
4    1     1  300 2004      700  42.85714
5    2     0  600 2003      600 100.00000
6    2     0  200 2004      600  33.33333
7    2     1  500 2003      500 100.00000
8    2     1  100 2004      500  20.00000

In the first line, I combined a number of steps, subsetting the data frame to 2003 and the columns of interest andrenaming the variables and in this subset data.frame with setNames.

To order the data.frame as you have it, use

dfNew <- dfNew[with(dfNew, order(year, buff, manag)), ]

The merge could be extended to additional lines. For example,

dfNew <- merge(df, df[df$year == 2003, c("buff",  "manag", "area")],
               by=c("buff",  "manag"))

names(dfNew) <- c("buff",  "manag", "area", "year", "area2003")

Upvotes: 1

Related Questions