Reputation: 4055
I am wondering if there is any easy R commands or packages that will all allow me to easily add variables to data.frames which are the "difference" or change of over time of those variables.
If my data looked like this:
set.seed(1)
MyData <- data.frame(Day=0:9 %% 5+1,
Price=rpois(10,10),
Good=rep(c("apples","oranges"), each=5))
MyData
Day Price Good
1 1 8 apples
2 2 10 apples
3 3 7 apples
4 4 11 apples
5 5 14 apples
6 1 12 oranges
7 2 11 oranges
8 3 9 oranges
9 4 14 oranges
10 5 11 oranges
Then after "first differencing" the price variable, my data would look like this.
Day Price Good P1d
1 1 8 apples NA
2 2 10 apples 2
3 3 7 apples -3
4 4 11 apples 4
5 5 14 apples 3
6 1 12 oranges NA
7 2 11 oranges -1
8 3 9 oranges -2
9 4 14 oranges 5
10 5 11 oranges -3
Upvotes: 6
Views: 9123
Reputation: 43
If the data is unbalanced, examples such as askesis_rea's answer and G. Grothendieck's answer do not apply directly (Note: I did not test the other answers). But they apply after expanding the dataframe to a balanced panel with NA
values.
In this example, individuals are not observed across all time periods.
Let's extend the example to the unbalanced case: (I remove day 2 for apples)
set.seed(1)
mydata <- data.frame(Day=0:9 %% 5+1,
Price=rpois(10,10),
Good=rep(c("apples","oranges"), each=5))
mydata <- mydata[!(mydata$Good=="apples" & mydata$Day==2), ]# removing apples in day 2
mydata
Day Price Good
1 1 8 apples
3 3 7 apples
4 4 11 apples
5 5 14 apples
6 1 12 oranges
7 2 11 oranges
8 3 9 oranges
9 4 14 oranges
10 5 11 oranges
Running G. Grothendieck's dplyr
answer yields the wrong values:
mydata %>%
group_by(Good) %>%
mutate(P1d = Price - lag(Price)) %>%
ungroup
Indeed, for Day 3 and apples, the value should be 2 however it is -1. This is because the difference between prices in Day 3 and 1 was computed rather than the difference between Day 3 and 2.
# A tibble: 9 × 4
Day Price Good P1d
<dbl> <int> <chr> <int>
1 1 8 apples NA
2 3 7 apples -1
3 4 11 apples 4
4 5 14 apples 3
5 1 12 oranges NA
6 2 11 oranges -1
7 3 9 oranges -2
8 4 14 oranges 5
9 5 11 oranges -3
But if we expand first, and then apply first differencing, we get the right results:
library(tidyr)
expanded <- mydata %>% complete(nesting(Good), Day=full_seq(Day, 1))
expanded %>%
group_by(Good) %>%
mutate(P1d = Price - lag(Price)) %>%
ungroup
# A tibble: 10 × 4
Good Day Price P1d
<chr> <dbl> <int> <int>
1 apples 1 8 NA
2 apples 2 NA NA
3 apples 3 7 NA
4 apples 4 11 4
5 apples 5 14 3
6 oranges 1 12 NA
7 oranges 2 11 -1
8 oranges 3 9 -2
9 oranges 4 14 5
10 oranges 5 11 -3
Edit
In some other scenario of varying composition of individuals across time this method might not be appropriate. One good tool is fixest::d operator. It can be used in fixest model formula or with data.table.
Example with data.table
library(data.table)
# creating the data
set.seed(1)
mydata <- data.frame(Day=0:9 %% 5+1,
Price=rpois(10,10),
Good=rep(c("apples","oranges"), each=5))
mydata <- mydata[!(mydata$Good=="apples" & mydata$Day==2), ]# removing apples in day 2
mydata <- fixest::panel(as.data.table(mydata), panel.id=~Good + Day)
mydata[, P1D:=fixest::d(Price)] # Adding inplace first difference
as.data.frame(fixest::unpanel(mydata)) # viewing
Day Price Good P1D
1 1 8 apples NA
2 3 7 apples NA
3 4 11 apples 4
4 5 14 apples 3
5 1 12 oranges NA
6 2 11 oranges -1
7 3 9 oranges -2
8 4 14 oranges 5
9 5 11 oranges -3
set.seed(1)
MyData <- data.frame(Day=0:9 %% 5+1,
Price=rpois(10,10),
Good=rep(c("apples","oranges"), each=5))
MyData <- fixest::panel(as.data.table(MyData), panel.id=~Good + Day)
MyData[, P1D:=fixest::d(Price)]
as.data.frame(fixest::unpanel(MyData))
[1] TRUE
Day Price Good P1D
1 1 8 apples NA
2 2 10 apples 2
3 3 7 apples -3
4 4 11 apples 4
5 5 14 apples 3
6 1 12 oranges NA
7 2 11 oranges -1
8 3 9 oranges -2
9 4 14 oranges 5
10 5 11 oranges -3
Upvotes: 0
Reputation: 1369
collapse::fdiff
is the function you are looking for:
library(collapse)
# This means compute difference of Price lagged once, iterated once, by Good, ordered by Day
settransform(MyData, P1d = fdiff(Price, 1, 1, Good, Day))
Upvotes: 0
Reputation: 21
In my case, I had to generate first differences for a panel. In order to have the differenced vector with the same length I used diff with a NA.
library(dplyr)
mydata %>%
group_by(id) %>%
mutate(price_diff = c(NA, diff(price)))%>%
ungroup()
Upvotes: 1
Reputation: 269654
ave
transform(MyData, P1d = ave(Price, Good, FUN = function(x) c(NA, diff(x))))
ave/gsubfn
The last solution can be shorteneed slightly using fn$
in the gsubfn package:
library(gsubfn)
transform(MyData, P1d = fn$ave(Price, Good, FUN = ~ c(NA, diff(x))))
dplyr
library(dplyr)
MyData %>%
group_by(Good) %>%
mutate(P1d = Price - lag(Price)) %>%
ungroup
data.table
library(data.table)
dt <- data.table(MyData)
dt[, P1d := c(NA, diff(Price)), by = Good]
dplyr now uses %>%
instead of %.%
.
Upvotes: 12
Reputation: 1
I came up with this code following some dpylr tutorials online: My goal was to add new column that is 5 year growth rate of R & D (variable xrd) of firms (identifer GVKEY). RandD2015 is raw data file.
By using the piping function (%>%) you can combine multiple calls in dplyr. Hope this is useful (my 1st code contribution in Stack Overflow)
library(dplyr)
RandDtest<- RandDec2015 %>%
group_by(GVKEY) %>%
mutate(xrd5yr=xrd/lag(xrd,4)-1)
Upvotes: 0
Reputation: 2022
One can easily do it like this:
library(reshape2)
library(dplyr)
MyNewData <-
MyData %.%
melt(id = c("Good", "Day")) %.%
dcast(Day ~ Good) %.%
mutate(apples = apples - lag(apples),
oranges = oranges - lag(oranges)) %.%
melt(id = "Day", variable.name = "Good", value.name = "P1d") %.%
merge(MyData) %.%
arrange(Good, Day)
Regards
Upvotes: 2
Reputation: 4055
This is what I came up with. But it does not seem efficient in the least:
MyData$P1d <- c(NA, MyData$Price[-1]-MyData$Price[-nrow(MyData)])
MyData$P1d[c(F,MyData$Good[-1]!=MyData$Good[-nrow(MyData)])] <- NA
MyData
Day Price Good P1d
1 1 8 apples NA
2 2 10 apples 2
3 3 7 apples -3
4 4 11 apples 4
5 5 14 apples 3
6 1 12 oranges NA
7 2 11 oranges -1
8 3 9 oranges -2
9 4 14 oranges 5
10 5 11 oranges -3
Upvotes: 0