Reputation: 181
I would like to create new rows in a data.frame for all missing years for each group (firm and type). The dataframe looks as follows:
minimal <- data.frame(firm = c("A","A","A","B","B","B","A","A","A","B","B","B"),
type = c("X","X","X","X","X","X","Y","Y","Y","Y","Y","Y"),
year = c(2000,2004,2007,2010,2008,2001,2002,2003,2007,2000,2001,2008),
value = c(1,3,7,9,9,2,3,3,7,5,9,15)
)
Dataframe:
firm type year value
A X 2000 1
A X 2004 3
A X 2007 7
B X 2010 9
B X 2008 9
B X 2001 2
A Y 2002 3
A Y 2003 3
A Y 2007 7
B Y 2000 5
B Y 2001 9
B Y 2008 15
Now, what I want to get is the following: I can see in the data that the minimum year is 2000 and the maximum is 2010. I want to add a row for each missing year for each combination of firm-type. E.g. for firm A and type X, I would like to add rows such that it looks like this:
Final output:
firm type year value
A X 2000 1
A X 2004 3
A X 2007 7
A X 2001 1
A X 2002 1
A X 2003 1
A X 2005 3
A X 2006 3
A X 2008 7
A X 2009 7
A X 2010 7
Additionally, I want to write the value from the previous year into the column 'value' for the missing row for all subsequent years until a new non-missing row appears (as seen in the final output example).
I have not yet come up with any useful code, but what I have found so far is the following which might be the right direction:
setDT(minimal)[, .SD[match(2000:2010, year)],
by = c("firm","type")]
I don't really understand the concept of setDT and .SD, but this creates at least one row for each firm type combination. However, there is not content for year.
Thanks a lot in advance!
Upvotes: 3
Views: 2361
Reputation: 38510
Here is a data.table
solution.
library(data.table)
dt <- setDT(minimal)[CJ(firm=firm, type=type, year=seq(min(year), max(year)), unique=TRUE),
on=.(firm, type, year), roll=TRUE]
This returns
head(dt, 15)
firm type year value
1: A X 2000 1
2: A X 2001 1
3: A X 2002 1
4: A X 2003 1
5: A X 2004 3
6: A X 2005 3
7: A X 2006 3
8: A X 2007 7
9: A X 2008 7
10: A X 2009 7
11: A X 2010 7
12: A Y 2000 NA
13: A Y 2001 NA
14: A Y 2002 3
15: A Y 2003 3
Notice that the initial rows of the second firm-type combo are NA. If you want to fill these in with the subsequent year, you can adjust the argument of fill to "nearest", though this could effect the values in the middle of the data.
Upvotes: 0
Reputation: 51592
I could not find an exact dupe for this so here is a possible solution,
library(dplyr)
library(tidyr)
minimal %>%
group_by(firm, type) %>%
complete(year = full_seq(2000:2010, 1)) %>%
fill(value)
Upvotes: 4
Reputation: 85
I wrote this code that do what you wanted, maybe it's not so efficient or elegant but it works:
# Input dataframe
minimal <- data.frame(firm = c("A","A","A","B","B","B","A","A","A","B","B","B"),
type = c("X","X","X","X","X","X","Y","Y","Y","Y","Y","Y"),
year = c(2000,2004,2007,2010,2008,2001,2002,2003,2007,2000,2001,2008),
value = c(1,3,7,9,9,2,3,3,7,5,9,15)
)
# Sorting is needed
minimal = minimal[order(minimal$firm, minimal$type, minimal$year),]
# Variables used
table = table(minimal$firm=="A", minimal$type=="X")
minYear = min(minimal$year)
maxYear = max(minimal$year)
startPos = 0
# Iterates the dataframe
for(i in 1:2){
for(j in 1:2){
prevValue = 0
currYear = minYear
# Adds minimum year if needed
if(minimal$year[1+startPos] != currYear){
newRow = c(as.character(minimal$firm[1+startPos]), as.character(minimal$type[1+startPos]), currYear, prevValue)
minimal = rbind(minimal, newRow)
}
# Adds years
for(k in (1+startPos):(table[i,j]+startPos)){
if(minimal$year[k]!=currYear){
currYear = currYear + 1
while(minimal$year[k]!=currYear){
newRow = c(as.character(minimal$firm[k]), as.character(minimal$type[k]), currYear, prevValue)
minimal = rbind(minimal, newRow)
currYear = currYear + 1
}
}
prevValue = minimal$value[k]
}
# Adds years from last to maximum
if(currYear < maxYear){
for(l in 1:(maxYear - currYear)){
newRow = c(as.character(minimal$firm[k]), as.character(minimal$type[k]), currYear+l, prevValue)
minimal = rbind(minimal, newRow)
}
}
startPos = startPos + table[i,j]
}
}
# Result
minimal = minimal[order(minimal$firm, minimal$type, minimal$year),]
minimal
Upvotes: 0