theamateurdataanalyst
theamateurdataanalyst

Reputation: 2834

Creating a monthly growth rate column for different factors in R

Suppose I have the following data

set.seed(123)

Company <-  c(rep("Company 1",5),rep("Company 2",10))

Dates <- c(seq(as.Date("2014-09-01"), as.Date("2015-01-01"), by="months"),
  seq(as.Date("2011-09-01"), as.Date("2012-06-01"), by="months"))

X.1 <- sample(c(0,0,5,10,15,20,25,30,40,50),size=15,replace=TRUE)
X.2 <- sample(c(0,0,5,10,15,20,25,30,40,50),size=15,replace=TRUE)

df <- data.frame(Dates,Company,X.1,X.2)



       Dates   Company X.1 X.2
1  2014-09-01 Company 1  50   0
2  2014-10-01 Company 1  50   5
3  2014-11-01 Company 1  25  15
4  2014-12-01 Company 1  30   5
5  2015-01-01 Company 1   0  40
6  2011-09-01 Company 2  15   0
7  2011-10-01 Company 2  30  15
8  2011-11-01 Company 2   5  30
9  2011-12-01 Company 2  10   0
10 2012-01-01 Company 2   5  20
11 2012-02-01 Company 2   0   5
12 2012-03-01 Company 2  15   0
13 2012-04-01 Company 2  15  30
14 2012-05-01 Company 2  10  40
15 2012-06-01 Company 2   0  10

What I am trying to do is find monthly growth rates for variables X.1 and X.2 within each company and bind those columns to the right side of the dataframe. The problem here is that the date ranges for each Company are different, which is why I am having trouble with this. Also, since I have 0s in the data Inf and NAs are okay.

Thanks for your help.

Upvotes: 1

Views: 971

Answers (1)

Metrics
Metrics

Reputation: 15458

#I computed the growth using log: growth=log(X(t)/X(t-1)). If you want to compute using (X(t)-X(t-1))/X(t-1), you can just use that. Also, for the first period, it will be NA.

#Assumption: the data are equally spaced for each company. You get Inf, if your last period value is 0 and -Inf if your current period value is 0 (because we used log). The growth will be 0 (when current period is zero) if you don't use log (see second method)

library(dplyr) 
    df%>%
      group_by(Company)%>%
      mutate(gx_1=log(X.1/lag(X.1,1)),gx_2=log(X.1/lag(X.2,1))
     )
    Source: local data frame [15 x 6]
    Groups: Company

            Dates   Company X.1 X.2       gx_1       gx_2
    1  2014-09-01 Company 1   5  40         NA         NA
    2  2014-10-01 Company 1  30   5  1.7917595 -0.2876821
    3  2014-11-01 Company 1  15   0 -0.6931472  1.0986123
    4  2014-12-01 Company 1  40  10  0.9808293        Inf
    5  2015-01-01 Company 1  50  50  0.2231436  1.6094379
    6  2011-09-01 Company 2   0  40         NA         NA
    7  2011-10-01 Company 2  20  25        Inf -0.6931472
    8  2011-11-01 Company 2  40  25  0.6931472  0.4700036
    9  2011-12-01 Company 2  20  50 -0.6931472 -0.2231436
    10 2012-01-01 Company 2  15  25 -0.2876821 -1.2039728
    11 2012-02-01 Company 2  50  30  1.2039728  0.6931472
    12 2012-03-01 Company 2  15  20 -1.2039728 -0.6931472
    13 2012-04-01 Company 2  25  20  0.5108256  0.2231436
    14 2012-05-01 Company 2  20   5 -0.2231436  0.0000000
    15 2012-06-01 Company 2   0   0       -Inf       -Inf

#without using log , i.e.
df%>%
     group_by(Company)%>%
     mutate(gx_1=((X.1-lag(X.1,1))/lag(X.1,1)),gx_2=((X.2-lag(X.2,1))/lag(X.2,1)))

Upvotes: 2

Related Questions