Reputation: 61
I have a panel data and as you might notice some of the individuals have missing observations in certain times. For instance "C" is missing data point for 2001 and "D" for 2002 and 2003.
> mydata
id year sales profit
1: A 2000 2000 200
2: A 2001 2050 245
3: A 2002 2100 290
4: A 2003 2150 335
5: B 2000 2200 380
6: B 2001 2250 425
7: B 2002 2300 470
8: B 2003 2350 515
9: C 2000 2400 560
10: C 2002 2500 650
11: C 2003 2550 695
12: D 2000 2600 740
13: D 2001 2650 785
I tried something like below
subset(mydata, year==c(2000:2003)
the result is as presented below.
id year sales profit
1: A 2000 2000 200
2: A 2001 2050 245
3: A 2002 2100 290
4: A 2003 2150 335
5: B 2000 2200 380
6: B 2001 2250 425
7: B 2002 2300 470
8: B 2003 2350 515
9: C 2000 2400 560
Warning message:
In year == c(2000:2003) :
longer object length is not a multiple of shorter object length
What I need is the data that includes entities with full period, starting from beginning, 2000, to the end, 2003. In this case it would be like this.
id year sales profit
1: A 2000 2000 200
2: A 2001 2050 245
3: A 2002 2100 290
4: A 2003 2150 335
5: B 2000 2200 380
6: B 2001 2250 425
7: B 2002 2300 470
8: B 2003 2350 515
Thank you for your time and answer in advance, but I would really appreciate if the answer is a little simpler as I am highly unexperinced and has just begun to learn about R.
Upvotes: 4
Views: 71
Reputation: 42544
For the sake of completeness, here is also a data.table
solution using a join:
mydata[mydata[, uniqueN(year), by = id][V1 == 4L, .(id)], on = "id"]
id year sales profit 1: A 2000 2000 200 2: A 2001 2050 245 3: A 2002 2100 290 4: A 2003 2150 335 5: B 2000 2200 380 6: B 2001 2250 425 7: B 2002 2300 470 8: B 2003 2350 515
Upvotes: 0
Reputation: 107567
Consider base R's ave
to count id groups and keep only records equal to year length of 4:
data
txt = ' id year sales profit
A 2000 2000 200
A 2001 2050 245
A 2002 2100 290
A 2003 2150 335
B 2000 2200 380
B 2001 2250 425
B 2002 2300 470
B 2003 2350 515
C 2000 2400 560
C 2002 2500 650
C 2003 2550 695
D 2000 2600 740
D 2001 2650 785'
df <- read.table(text=txt, header=TRUE)
code
df$grp_cnt <- ave(df$year, df$id, FUN=length)
df <- transform(subset(df, df$grp_cnt == 4), grp_cnt = NULL)
df
# id year sales profit
# 1 A 2000 2000 200
# 2 A 2001 2050 245
# 3 A 2002 2100 290
# 4 A 2003 2150 335
# 5 B 2000 2200 380
# 6 B 2001 2250 425
# 7 B 2002 2300 470
# 8 B 2003 2350 515
Upvotes: 0
Reputation: 193497
You can try something like:
library(data.table)
mydata[, ind := all(2000:2003 %in% year), id][(ind)]
# id year sales profit ind
# 1: A 2000 2000 200 TRUE
# 2: A 2001 2050 245 TRUE
# 3: A 2002 2100 290 TRUE
# 4: A 2003 2150 335 TRUE
# 5: B 2000 2200 380 TRUE
# 6: B 2001 2250 425 TRUE
# 7: B 2002 2300 470 TRUE
# 8: B 2003 2350 515 TRUE
With "tidyverse":
library(tidyverse)
mydata %>%
group_by(id) %>%
filter(all(2000:2003 %in% year))
Sample data (which is how you should share it in the future):
mydata <- structure(list(id = c("A", "A", "A", "A", "B", "B", "B", "B",
"C", "C", "C", "D", "D"), year = c(2000L, 2001L, 2002L, 2003L,
2000L, 2001L, 2002L, 2003L, 2000L, 2002L, 2003L, 2000L, 2001L
), sales = c(2000L, 2050L, 2100L, 2150L, 2200L, 2250L, 2300L,
2350L, 2400L, 2500L, 2550L, 2600L, 2650L), profit = c(200L, 245L,
290L, 335L, 380L, 425L, 470L, 515L, 560L, 650L, 695L, 740L, 785L
)), .Names = c("id", "year", "sales", "profit"), row.names = c(NA,
13L), class = c("data.table", "data.frame"))
Upvotes: 2