FredErik

Reputation: 11

Calculate difference after treatment in R

I have a question regarding panel data in R.

My data basically looks like this:

Year  Name       Variable    Treatment
2000  CompanyA   10          0
2001  CompanyA   10          0
2002  CompanyA   10          1
2003  CompanyA   10          0
2004  CompanyA   12          0
2005  CompanyA   12          0
1999  CompanyB    5          1
2000  CompanyB    5          1
2001  CompanyB    5          0
2002  CompanyB    5          0
2003  CompanyB    6          0
2004  CompanyB    5          0
2005  CompanyB    6          0
2006  CompanyB    6          0

Is there a way in R to calculate the difference in the dependent variable before and after the treatment, allowing for a certain time lag?

Unfortunately, my panel data are unbalanced. The purpose of the calculation is to build a dummy variable that shows whether the dependent variable has grown two years after treatment. I would then like to run a clogit regression on it.

Edit

I need to know whether the dependent variable has changed after a treatment or not. So I need some code that computes a dummy for every positive change in my variable.

The output should look something like this:

Year  Name       Variable    Treatment   Dummy
2000  CompanyA   10          0           0
2001  CompanyA   10          0           0
2002  CompanyA   10          1           0
2003  CompanyA   10          0           0
2004  CompanyA   12          0           1
2005  CompanyA   12          0           1
1999  CompanyB    5          1           0
2000  CompanyB    5          1           0
2001  CompanyB    5          0           0
2002  CompanyB    5          0           0
2003  CompanyB    6          0           1
2004  CompanyB    5          0           0
2005  CompanyB    6          0           0
2006  CompanyB    6          0           0

That way I can run a conditional logit regression on it and link the treatment (together with other variables) to the positive effect on my dependent variable after a certain time lag.

Upvotes: 1

Views: 1187

Answers (3)

Martin

Reputation: 594

I updated the answer according to the clarification in the comment: beyond the simple comparison (on/off treatment, Part A), I added an approach for the time course as requested (Part B).
Please note that the code needs to be adapted to the exact question at many points. What should happen with those who become treatment negative and then possibly even positive again? What is a meaningful duration over which to expect treatment effects after the start (or after the stop) of treatment? These questions may be more conceptual than R problems, but I tried to provide some starting points for implementing them.

#### sample data (added and changed some rows to demonstrate sorting by year ####
# and a positive Treatment at the first time point):

text <- "Year  Name       Variable    Treatment
2000  CompanyA   10          0
2001  CompanyA   10          0
2002  CompanyA   10          1
2003  CompanyA   10          0
2004  CompanyA   12          0
2010  CompanyA   15          1
2005  CompanyA   12          0
1999  CompanyB    5          0
2000  CompanyB    5          1
2001  CompanyB    5          0
2002  CompanyB    5          0
2003  CompanyB    6          0
2004  CompanyB    5          0
2005  CompanyB    6          0
2006  CompanyB    6          0
2001  CompanyC    5          1
2006  CompanyC    9          1"

df <- read.table(text=text, header=TRUE)
str(df)
head(df)

#### A) Simple way: just compare on/off treatment subject ####

mean(df[df$Treatment==1, "Variable"]) - mean(df[df$Treatment==0, "Variable"]) 


#### B) Compare within each company, take into consideration also the time course ####

# split to list according to company names, to analyse them separately
Name.u <- as.character(unique(df$Name))  # unique Company names
L <- sapply(Name.u, function(n) df[df$Name==n, ], simplify=FALSE)             
str(L)
L  # a list of dataframes, one dataframe for each company

## deal with special cases that may influence the concept of the analysis
# sort by year (assuming there are no ties)
L <- sapply(Name.u, function(n) L[[n]][order(L[[n]]$Year), ], simplify=FALSE) 
# possibly ignore those who were already treated at study entry
L.del <- sapply(Name.u, function(n) ifelse(L[[n]][1, "Treatment"]==1, TRUE, FALSE), simplify=TRUE) 
L[L.del] <- NULL
Name.u <- Name.u[!L.del]
str(L); L # note that CompanyC was deleted because of Treatment==1 at start

## display treatment duration etc.
LL <- function(L.n) {
  L.n$diff <- c(0, diff(L.n$Treatment))
  # stopifnot(sum(L.n$diff!=0) == 1)   # more than one status change - need clarification how this should be handled, see also lines below
  # ALL status change to "treated" (possibly more than one!)
  Rx.start <- which(L.n$diff==1) 
  # duration since FIRST documented treatment
  L.n$RxDurSinceFirst <- L.n$Year - min(L.n$Year[Rx.start])  
  # start from the duration since first treatment (full column name, no partial matching)
  L.n$RxDurReal <- L.n$RxDurSinceFirst
  # need to define what to do with those who are Treatment negative at THIS  time ...
  L.n$RxDurReal[L.n$Treatment==0] <- NA   
  # ... and those who became Treatment neg before or now
  L.n$RxDurReal[sapply(1:nrow(L.n), function(row.i) row.i >= min(which(L.n$diff==-1)))] <- NA  
  return(L.n)
}
str(LL)

# L2 is a new list of the same structure as L, but with more information 
# (more columns in each dataframe element)
L2 <- sapply(Name.u, function(n) LL(L[[n]]), simplify=FALSE)
str(L2)
L2

# for a company n one can then do (and of course further vectorize):
n <- Name.u[1]
str(L2[[n]])
L2[[n]]

# for a company n one can then compare RxDurSinceFirst, RxDurReal or 
# whatever you want (and of course further vectorize):
(Var.before <- L2[[n]]$Variable[ L2[[n]]$RxDurSinceFirst <  0 ] )
(Var.after  <- L2[[n]]$Variable[ L2[[n]]$RxDurSinceFirst >= 0 ] )
t.test(Var.before, Var.after)  # works of course only if enough observations

# or on/off Treatment within one group, and use the means of each group 
# for further paired t.test/ U-test etc.
(Var.OnRx  <- L2[[n]]$Variable[ L2[[n]]$Treatment ==  1 ] )
(Var.OffRx <- L2[[n]]$Variable[ L2[[n]]$Treatment ==  0 ] )

### End ###
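
As a minimal sketch of how a before/after comparison like the one above could be turned into the dummy from the question: the helper `growth_dummy` and its `lag` argument are hypothetical names, not part of the code above. The rule used here (Variable is above its value in the first treated year, at least `lag` years later) is only one possible reading of the question, and the data are the original table from the question.

```r
# Hypothetical helper: flag rows at least `lag` years after a company's
# first treated year whose Variable exceeds the value in that treated year.
growth_dummy <- function(d, lag = 2) {
  d <- d[order(d$Year), ]              # make sure years are in order
  treated <- which(d$Treatment == 1)
  d$Dummy <- 0L                        # default: no growth flagged
  if (length(treated) > 0) {
    ref_year  <- d$Year[treated[1]]      # first treated year
    ref_value <- d$Variable[treated[1]]  # Variable in that year
    d$Dummy <- as.integer(d$Year >= ref_year + lag & d$Variable > ref_value)
  }
  d
}

# original data from the question
panel <- data.frame(
  Year      = c(2000:2005, 1999:2006),
  Name      = rep(c("CompanyA", "CompanyB"), times = c(6, 8)),
  Variable  = c(10, 10, 10, 10, 12, 12, 5, 5, 5, 5, 6, 5, 6, 6),
  Treatment = c(0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0)
)

# apply the helper per company and stack the pieces back together
result <- do.call(rbind, lapply(split(panel, panel$Name), growth_dummy))
rownames(result) <- NULL
result
```

Under this rule CompanyB is also flagged in 2005 and 2006, which differs from the example output in the question; the condition would need an upper bound on the year if only a fixed window after treatment should count.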

Upvotes: 2

SprengMeister

Reputation: 580

Here is an answer that I think will get you very close. My code highlights any change in Variable from before the treatment. Note that this is not the most elegant code and more or less a draft version but I have to pack up and I think this may still be useful.

First, here is the dput output for your table. Just run this to load the table.

dfx <- structure(list(Year = c(2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 
1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L), Name = c("CompanyA", 
"CompanyA", "CompanyA", "CompanyA", "CompanyA", "CompanyA", "CompanyB", 
"CompanyB", "CompanyB", "CompanyB", "CompanyB", "CompanyB", "CompanyB", 
"CompanyB"), Variable = c(10L, 10L, 10L, 10L, 12L, 12L, 5L, 5L, 
5L, 5L, 6L, 5L, 6L, 6L), Treatment = c(0L, 0L, 1L, 0L, 0L, 0L, 
1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), Dummy = c(0L, 0L, 0L, 0L, 1L, 
1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L)), .Names = c("Year", "Name", 
"Variable", "Treatment", "Dummy"), class = "data.frame", row.names = c(NA, 
-14L))

I then created an auxiliary variable (has_treatment) that states whether a given year (row) comes after the treatment. The first two statements in the function below do this.

A simple conditional statement then tests whether a case has had treatment and whether Variable differs from its value at treatment.

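foo <- function(dfx){
      # row index of the last treated year before treatment first switches off
      # (assumes rows are sorted by Year and treatment stops at least once)
      last_rx <- Position(isTRUE, diff(dfx$Treatment) == -1)
      dfx[(last_rx + 1):nrow(dfx), "has_treatment"] <- 1
      dfx[1:last_rx, "has_treatment"] <- 0

      # Variable in the first treated row serves as the reference value
      ref <- dfx$Variable[dfx$Treatment == 1][1]
      # flag post-treatment rows whose Variable differs from the reference
      dfx$dummy <- as.integer(dfx$has_treatment == 1 & dfx$Variable != ref)
  return(dfx)
}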
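
To see what the `Position()` call inside `foo` picks out, here is a small check on CompanyA's treatment vector. The variable names `treatment`, `steps` and `last_rx` are just for illustration.

```r
treatment <- c(0, 0, 1, 0, 0, 0)   # CompanyA, 2000-2005
steps <- diff(treatment)            # year-to-year change in treatment status

# index of the first -1 in `steps`, i.e. the row of the last treated year
last_rx <- Position(isTRUE, steps == -1)
last_rx

# every row after that one counts as "has had treatment"
has_treatment <- as.integer(seq_along(treatment) > last_rx)
has_treatment
```

If treatment never switches off within a company, `Position()` returns `NA`, so that case would need separate handling.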

I then run this in ddply. If you are not familiar with ddply and the plyr package, I highly recommend learning about it.

library(plyr)

ddply(dfx, .variables = "Name", foo)

Again, this is not exactly what you want but in principle it should get you on the right track. I would try to give it another shot but I have to run.

Also, as some may comment this is not the most elegant way and there are likely faster and more efficient ways.

Anyways, I hope it helps a little.

Upvotes: 0

akrun

Reputation: 887118

Or,

diff(by(df$Variable, df$Treatment, FUN=mean))
#[1] -1.242424
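
For readers who want the direction of that difference spelled out, a roughly equivalent sketch with `tapply` is shown here. It rebuilds only the two relevant columns of the question's original table, and the names `means` and `d` are illustrative.

```r
df <- data.frame(
  Variable  = c(10, 10, 10, 10, 12, 12, 5, 5, 5, 5, 6, 5, 6, 6),
  Treatment = c(0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0)
)

# mean of Variable for each treatment status, named "0" and "1"
means <- tapply(df$Variable, df$Treatment, mean)

# treated minus untreated
d <- means[["1"]] - means[["0"]]
d
```

The value is negative because the treated rows have a lower mean than the untreated ones in this sample.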

Upvotes: 1
