Reputation: 1304
I am using a database that looks like this:
library(plm)
data("Grunfeld", package = "plm")
library(dplyr)
Grunfeld<-Grunfeld%>%filter(year<=1940)
head(Grunfeld,10)
# firm year inv value capital
# 1 1 1935 317.6 3078.5 2.8
# 2 1 1936 391.8 4661.7 52.6
# 3 1 1937 410.6 5387.1 156.9
# 4 1 1938 257.7 2792.2 209.2
# 5 1 1939 330.8 4313.2 203.4
# 6 1 1940 461.2 4643.9 207.2
# 7 2 1935 209.9 1362.4 53.8
# 8 2 1936 355.3 1807.1 50.5
# 9 2 1937 469.9 2676.3 118.1
# 10 2 1938 262.3 1801.9 260.2
I am trying to create a new variable cap38
based on the following conditions:
1) If the capital of a firm is above 160 in 1938 the variable takes the value of 1 otherwise 0.
2) The variable should identify firms that have capital above 160 in 1938 (not observations). In other words, if the firm meets the first condition, cap38
should score 1 for the firm across all the panel, not just for the year 38.
I am able to obtain the results I want with this approach
# first I identify firms that meet conditions,
sm<-Grunfeld%>%filter(capital>160 & year==1938) %>% mutate(cap38=1) %>%
select(firm,cap38) %>%distinct()
# then I join this data with the original database
Grunfeld<-left_join(Grunfeld,sm)
# finally I replace missing observations with 0s
library(data.table)
setDT(Grunfeld)[is.na(cap38),cap38:=0]
head(Grunfeld,10)
firm year inv value capital cap38
1: 1 1935 317.6 3078.5 2.8 1
2: 1 1936 391.8 4661.7 52.6 1
3: 1 1937 410.6 5387.1 156.9 1
4: 1 1938 257.7 2792.2 209.2 1
5: 1 1939 330.8 4313.2 203.4 1
6: 1 1940 461.2 4643.9 207.2 1
7: 2 1935 209.9 1362.4 53.8 1
8: 2 1936 355.3 1807.1 50.5 1
9: 2 1937 469.9 2676.3 118.1 1
10: 2 1938 262.3 1801.9 260.2 1
This produces the expected result, as both firm 1 and 2 meet the condition they have a score of 1 for every time period. I was wondering whether there is a more efficient way to do so. Ideally, using a data.table approach.
I have tried something like this:
setDT(Grunfeld)[capital>160 & year==1938,cap38_worng:=1
][is.na(cap38_worng),cap38_worng:=0]
head(Grunfeld,10)
firm year inv value capital cap38 cap38_worng
1: 1 1935 317.6 3078.5 2.8 1 0
2: 1 1936 391.8 4661.7 52.6 1 0
3: 1 1937 410.6 5387.1 156.9 1 0
4: 1 1938 257.7 2792.2 209.2 1 1
5: 1 1939 330.8 4313.2 203.4 1 0
6: 1 1940 461.2 4643.9 207.2 1 0
7: 2 1935 209.9 1362.4 53.8 1 0
8: 2 1936 355.3 1807.1 50.5 1 0
9: 2 1937 469.9 2676.3 118.1 1 0
10: 2 1938 262.3 1801.9 260.2 1 1
The problem with this approach is that it only identifies observations that meet my condition but not firms. For instance for firm 1, cap38_worng
is =0 in 1937, while is =1 in 1938
Hopefully, this is an easy question for those that are most skilled. But I am struggling to find an efficient way to solve my problem.
I thank you in advance for your help
best regards
Upvotes: 1
Views: 75
Reputation: 389325
We can check if there is a firm
where capital > 160
when year == 1938
.
We can do this using dplyr
.
library(dplyr)
Grunfeld %>%
group_by(firm) %>%
mutate(cap38 = as.integer(any(capital > 160 & year == 1938)))
Base R :
Grunfeld$cap38 <- +(with(Grunfeld, ave(capital > 160 & year == 1938,
firm, FUN = any)))
and data.table
library(data.table)
setDT(Grunfeld)[, cap38 := as.integer(any(capital > 160 & year == 1938)), firm]
Upvotes: 2
Reputation: 2448
What you need is to check whether there is a case when (capital>160 & year==1938)
at the firm level. Here is one way to do that.
library(plm)
library(tidyverse)
data("Grunfeld", package = "plm")
Grunfeld <- Grunfeld %>%
filter(year <= 1940) %>%
group_by(firm) %>% # grouping by firm
# check if there is the case within the firm with capital > 160 in year 1938
mutate(cap38 = max(capital>160 & year==1938))
head(Grunfeld)
#> # A tibble: 6 x 6
#> # Groups: firm [1]
#> firm year inv value capital cap38
#> <int> <int> <dbl> <dbl> <dbl> <int>
#> 1 1 1935 318. 3078. 2.8 1
#> 2 1 1936 392. 4662. 52.6 1
#> 3 1 1937 411. 5387. 157. 1
#> 4 1 1938 258. 2792. 209. 1
#> 5 1 1939 331. 4313. 203. 1
#> 6 1 1940 461. 4644. 207. 1
Created on 2020-05-21 by the reprex package (v0.3.0)
Upvotes: 0