Reputation: 3805
Sample data
set.seed(123)
df <- data.frame(year = c(rep(1980:1994, each = 9), rep(1995, times = 8), rep(1996:2012, each = 9),
rep(2013, times = 7), rep(2014, times = 9)),
ref.doy = sample(120:180, 312, replace = TRUE),
x = rnorm(312))
For each year, if the number of ref.doy values is not 9, I want to append new rows that are copies of the last row for that year.
For example, 1995 has only 8 ref.doy values, so I want to copy the 8th row to make a 9th row. 2013 has only 7 ref.doy values, so I want to copy the 7th row and duplicate it as the 8th and 9th rows, and so on.
The solution I have at the moment is a for-loop:
library(dplyr)
library(data.table)  # for rbindlist()
x <- df %>% group_by(year) %>% dplyr::mutate(y.length = n())
year.vec <- 1980:2014
temp.list <- list()
for(y in seq_along(year.vec)){
  yr <- year.vec[y]
  temp <- x %>% dplyr::filter(year == yr)
  if(unique(temp$y.length) != 9) {
    # repeat the last row of this year until the year has 9 rows
    lastrow <- temp[nrow(temp), ]
    lastrow.repeat <- as.data.frame(lapply(lastrow, rep, 9 - nrow(temp)))
    full.data <- rbind(data.frame(temp), lastrow.repeat)
    temp.list[[y]] <- full.data
  } else {
    temp.list[[y]] <- temp
  }
}
newdata <- rbindlist(temp.list)
I need some help to do this within dplyr or data.table itself.
Upvotes: 3
Views: 248
Reputation: 388982
Take the last 14 rows of your data as a sample, which gives 5 entries for 2013 and 9 entries for 2014. We filter the groups that have fewer than 9 rows, repeat the last row of each of those groups 9 - n() times, and add these rows to the original data frame using bind_rows.
df1 <- tail(df, 14)
library(dplyr)
df1 %>%
bind_rows(df1 %>%
group_by(year) %>%
#suggested by @Henrik
filter(n() < 9) %>%
slice(rep(n(), 9 - n()))) %>%
arrange(year)
# year ref.doy x
#1 2013 126 0.9171749
#2 2013 168 -2.6609228
#3 2013 167 1.1102771
#4 2013 120 -0.4849876
#5 2013 167 0.2306168
#6 2013 167 0.2306168
#7 2013 167 0.2306168
#8 2013 167 0.2306168
#9 2013 167 0.2306168
#10 2014 164 -0.2951578
#11 2014 158 0.8719650
#12 2014 149 -0.3484724
#13 2014 129 0.5185038
#14 2014 120 -0.3906850
#15 2014 147 -1.0927872
#16 2014 150 1.2100105
#17 2014 143 0.7409000
#18 2014 148 1.7242622
Applying this to the original data frame, we can then check the number of rows for each year.
df2 <- df %>%
bind_rows(df %>%
group_by(year) %>%
filter(n() < 9) %>%
slice(rep(n(), 9 - n()))) %>%
arrange(year)
df2 %>%
group_by(year) %>%
summarise(no_of_rows = n())
# A tibble: 35 x 2
# year no_of_rows
# <dbl> <int>
# 1 1980 9
# 2 1981 9
# 3 1982 9
# 4 1983 9
# 5 1984 9
# 6 1985 9
# 7 1986 9
# 8 1987 9
# 9 1988 9
#10 1989 9
# ... with 25 more rows
Or, as @Henrik mentioned, the simplest approach is to drop the filter: for every group, keep all of its rows and repeat the last row 9 - n() times, which is zero times for groups that already have 9 rows.
df %>%
group_by(year) %>%
slice(c(1:n(), rep(n(), 9 - n())))
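To see why this also works for groups that are already complete, it can help to look at the index vector on its own. The following is a minimal base R sketch, not part of the original answer: pad_index and the toy data frame are hypothetical names used only for illustration, and a target of 3 rows is used instead of 9 to keep things short.

```r
# Hypothetical helper: indices 1..n, followed by n repeated until `target` entries.
pad_index <- function(n, target = 9) {
  stopifnot(n >= 1, n <= target)  # rep() fails on a negative count
  c(seq_len(n), rep(n, target - n))
}

pad_index(7)  # 1 2 3 4 5 6 7 7 7
pad_index(9)  # 1 2 3 4 5 6 7 8 9 (nothing is repeated)

# Toy data: year 2000 has 2 rows, year 2001 has 3 rows.
toy <- data.frame(year = c(2000, 2000, 2001, 2001, 2001), x = 1:5)

# Apply the index per group, padding every year to 3 rows.
padded <- do.call(
  rbind,
  lapply(split(toy, toy$year), function(g) g[pad_index(nrow(g), target = 3), ])
)
```

Because rep(n, 0) returns an empty vector, complete groups pass through unchanged. A group with more rows than the target would make the repeat count negative, which is an error, so this sketch assumes no group exceeds the target.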
Upvotes: 2
Reputation: 83215
Using data.table:
library(data.table)
setDT(df)
df[, ri := rowid(year)]
df2 <- df[CJ(year = year, ri = 1:9, unique = TRUE), on = .(year, ri)
][, (2:3) := lapply(.SD, zoo::na.locf), .SDcols = 2:3
][, ri := NULL][]
which gives the desired result:
> df2[year %in% c(1995,2013)]
    year ref.doy           x
 1: 1995     160  1.05418102
 2: 1995     170  1.14526311
 3: 1995     167 -0.57746800
 4: 1995     179  2.00248273
 5: 1995     146  0.06670087
 6: 1995     139  1.86685184
 7: 1995     144 -1.35090269
 8: 1995     120  0.02098359
 9: 1995     120  0.02098359
10: 2013     179  0.43528895
11: 2013     126  0.71517841
12: 2013     126  0.91717492
13: 2013     168 -2.66092280
14: 2013     167  1.11027710
15: 2013     120 -0.48498760
16: 2013     167  0.23061683
17: 2013     167  0.23061683
18: 2013     167  0.23061683
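What this does:
df[, ri := rowid(year)] adds a row number by year.
df[CJ(year = year, ri = 1:9, unique = TRUE), on = .(year, ri)] joins df with a cross-join reference table (CJ(year = year, ri = 1:9, unique = TRUE)) which has nine rows for each year. The result will now contain empty rows for the years that did not have nine rows.
(2:3) := lapply(.SD, zoo::na.locf), .SDcols = 2:3 fills the empty rows with the last observation.
Finally, the ri column is removed with ri := NULL, as it is no longer needed.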
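The steps above can be traced on a small table. This is a minimal sketch under stated assumptions: toy, ref and joined are hypothetical names, and each year is padded to 3 rows instead of 9 so that the intermediate result is easy to read. It needs the data.table and zoo packages, as the answer does.

```r
library(data.table)

# Toy data: year 2000 has 2 rows, year 2001 has 3 rows.
toy <- data.table(year = c(2000, 2000, 2001, 2001, 2001),
                  x    = c(10, 20, 30, 40, 50))

# Step 1: row number within each year.
toy[, ri := rowid(year)]

# Step 2: reference table with 3 slots per year, then join onto it.
ref <- CJ(year = toy$year, ri = 1:3, unique = TRUE)
joined <- toy[ref, on = .(year, ri)]
# x is NA in the (2000, 3) slot at this point

# Step 3: carry the last observation forward into the NA slot.
joined[, x := zoo::na.locf(x)]

# Step 4: drop the helper column.
joined[, ri := NULL]
```

One thing to keep in mind with this approach: na.locf fills every NA in the selected columns, so NA values that were already present in the data before the join would be overwritten as well.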
A better alternative as posted by @Henrik in the comments:
df2 <- df[ , .SD[c(1:.N, rep(.N, 9 - .N))], by = year]
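A quick way to convince yourself that this one-liner does the job is to rebuild the sample data from the question and count the rows per year afterwards. The sketch below is only a check under that assumption: dt, padded, counts and last3 are names introduced here, and the data is rebuilt as a data.table so that the block runs on its own.

```r
library(data.table)

# Rebuild the question's sample data (1995 has 8 rows, 2013 has 7).
set.seed(123)
years <- c(rep(1980:1994, each = 9), rep(1995, times = 8),
           rep(1996:2012, each = 9), rep(2013, times = 7),
           rep(2014, times = 9))
dt <- data.table(year    = years,
                 ref.doy = sample(120:180, length(years), replace = TRUE),
                 x       = rnorm(length(years)))

# Keep rows 1..N of each year and repeat row N until there are 9 rows.
padded <- dt[, .SD[c(1:.N, rep(.N, 9 - .N))], by = year]

# Every year should now have exactly 9 rows.
counts <- padded[, .N, by = year]
stopifnot(all(counts$N == 9))

# Rows 7 to 9 of 2013 should be copies of the same original row.
last3 <- padded[year == 2013][7:9]
stopifnot(uniqueN(last3) == 1)
```

Unlike the cross-join version, this does not add a helper column to the input by reference. It assumes no year has more than 9 rows, because rep() with a negative count raises an error.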
Upvotes: 2