Reputation: 95
I have the following data frame:
> dados
  COUNTRY    Year  CO2 emissions  Pop. Growth(%)
  Argentina  1994  1.23           0.3
  Argentina  1995  1.26           0.2
  Argentina  1996  1.28           0.4
  Argentina  1997  1.24           0.2
  Brazil     1994  1.54           0.7
  Brazil     1995  1.59           0.6
  Brazil     1996  1.60           0.9
  Brazil     1997  1.58           1.3
I'd like to take the first difference of the variables CO2 emissions and Pop. Growth(%) separately for each country. I've already tried dados[,2:4] <- diff(dados[,2:4]), but it returned the error:
"Error in r[i1] - r[-length(r):-(length(r) - lag + 1L)] : non-numeric argument to binary operator"
Upvotes: 2
Views: 3374
Reputation: 18681
Here's a solution with dplyr:
library(dplyr)

# Within each country, subtract the previous row's value from the current one
df %>%
  group_by(COUNTRY) %>%
  mutate_at(vars(CO2_emissions:Pop_Growth), funs(. - lag(.)))
Edit: As of dplyr 0.8.0, funs() is soft-deprecated. For newer versions of dplyr, use the following instead:
df %>%
  group_by(COUNTRY) %>%
  mutate_at(vars(CO2_emissions:Pop_Growth), list(~ .x - lag(.x)))
Output:
# A tibble: 8 x 4
# Groups:   COUNTRY [2]
  COUNTRY    Year CO2_emissions Pop_Growth
  <fct>     <int>         <dbl>      <dbl>
1 Argentina  1994         NA        NA
2 Argentina  1995          0.03     -0.100
3 Argentina  1996          0.02      0.2
4 Argentina  1997         -0.04     -0.2
5 Brazil     1994         NA        NA
6 Brazil     1995          0.05     -0.100
7 Brazil     1996          0.01      0.3
8 Brazil     1997         -0.02      0.4
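For dplyr 1.0.0 and later, the scoped verbs such as mutate_at() are superseded by across(), so the same grouped first difference can probably be written as in the sketch below. The compact data.frame() rebuild of the data and the name res are my own additions for illustration. As with the versions above, this assumes the rows are already ordered by Year within each COUNTRY.

```r
library(dplyr)

# Same values as the example data, rebuilt compactly (assumed layout)
df <- data.frame(
  COUNTRY = rep(c("Argentina", "Brazil"), each = 4),
  Year = rep(1994:1997, times = 2),
  CO2_emissions = c(1.23, 1.26, 1.28, 1.24, 1.54, 1.59, 1.60, 1.58),
  Pop_Growth = c(0.3, 0.2, 0.4, 0.2, 0.7, 0.6, 0.9, 1.3)
)

# First difference of both columns, restarted for each country;
# the first row of every group has no predecessor and becomes NA
res <- df %>%
  group_by(COUNTRY) %>%
  mutate(across(CO2_emissions:Pop_Growth, ~ .x - lag(.x)))
```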
Data:
df = structure(list(COUNTRY = structure(c(1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L), .Label = c("Argentina", "Brazil"), class = "factor"),
Year = c(1994L, 1995L, 1996L, 1997L, 1994L, 1995L, 1996L,
1997L), CO2_emissions = c(1.23, 1.26, 1.28, 1.24, 1.54, 1.59,
1.6, 1.58), Pop_Growth = c(0.3, 0.2, 0.4, 0.2, 0.7, 0.6,
0.9, 1.3)), .Names = c("COUNTRY", "Year", "CO2_emissions",
"Pop_Growth"), class = "data.frame", row.names = c(NA, -8L))
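A base R alternative, in case dplyr is not an option. As far as I can tell, diff() has no data frame method, so calling it on dados[,2:4] falls through to the default method, which handles the data frame as a plain list and tries to subtract list elements; that would explain the "non-numeric argument to binary operator" error. The sketch below instead applies diff() to one numeric column at a time and uses ave() to restart the differencing for each country. The helper name first_diff, the names cols and res, and the compact data.frame() rebuild of the data are my own; the code assumes the rows are already sorted by Year within each COUNTRY.

```r
# Same values as the example data, rebuilt compactly (assumed layout)
df <- data.frame(
  COUNTRY = rep(c("Argentina", "Brazil"), each = 4),
  Year = rep(1994:1997, times = 2),
  CO2_emissions = c(1.23, 1.26, 1.28, 1.24, 1.54, 1.59, 1.60, 1.58),
  Pop_Growth = c(0.3, 0.2, 0.4, 0.2, 0.7, 0.6, 0.9, 1.3)
)

# Hypothetical helper: diff() drops one element, so pad with a leading NA
# to keep the result the same length as the input
first_diff <- function(x) c(NA, diff(x))

cols <- c("CO2_emissions", "Pop_Growth")
res <- df

# ave() applies first_diff separately within each COUNTRY group and
# returns the values in the original row order
res[cols] <- lapply(df[cols], function(col) {
  ave(col, df$COUNTRY, FUN = first_diff)
})
```

Year is deliberately left out of cols, since differencing it would only give a column of ones.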
Upvotes: 5