Reputation: 745

Fill in missing rows in R

Suppose I have a data frame which looks like this

ID  A   B   C   D   Month
1   X   M   5   1   3
1   X   K   4   2   4
1   X   K   3   7   5
1   X   K   2   6   6
2   Y   L   5   8   1
2   Y   L   2   3   2
2   Y   M   5   1   3
2   Y   K   2   7   5
2   Y   M   2   8   6
3   Z   K   5   3   1
3   Z   M   6   3   2
3   Z   M   5   8   3
3   Z   K   4   2   4

In this data ID and A are unique variables,
while B,C,D,Month can change their value
Month has 6 factor values from 1 to 6
B have 3 factor value from K,L,M
C,D can have any value.

I want this data to become like this

ID  A   B   C   D   Month
1   X   0   0   0   1
1   X   0   0   0   2
1   X   M   5   1   3
1   X   K   4   2   4
1   X   K   3   7   5
1   X   K   2   6   6
2   Y   L   5   8   1
2   Y   L   2   3   2
2   Y   M   5   1   3
2   Y   0   0   0   4
2   Y   K   2   7   5
2   Y   M   2   8   6
3   Z   K   5   3   1
3   Z   M   6   3   2
3   Z   M   5   8   3
3   Z   K   4   2   4
3   Z   0   0   0   5
3   Z   0   0   0   6

It should fill in the missing rows by keeping the unique variables values same and filling in the varying ones with zero.

I can use zoo library to fill in the missing values but how to fill in the complete missing rows?

Upvotes: 0

Answers (2)

Whitebeard

Reputation: 6213

Here's a way using base R

frame <- expand.grid(ID = unique(dat$ID), Month = 1:6)
dat2 <- merge(dat, frame, by=c("ID", "Month"), all=TRUE)[, union(names(dat), names(frame))]
levels(dat2$B) <- c(levels(dat2$B), 0)

res <- lapply(split(dat2, dat2$ID), function(x) {
  x$A[which(is.na(x$A))] <- unique(x$A)[!is.na(unique(x$A))]
  x[is.na(x)] <- 0
  x
})

do.call(rbind, res)
     ID A B C D Month
1.1   1 X 0 0 0     1
1.2   1 X 0 0 0     2
1.3   1 X M 5 1     3
1.4   1 X K 4 2     4
1.5   1 X K 3 7     5
1.6   1 X K 2 6     6
2.7   2 Y L 5 8     1
2.8   2 Y L 2 3     2
2.9   2 Y M 5 1     3
2.10  2 Y 0 0 0     4
2.11  2 Y K 2 7     5
2.12  2 Y M 2 8     6
3.13  3 Z K 5 3     1
3.14  3 Z M 6 3     2
3.15  3 Z M 5 8     3
3.16  3 Z K 4 2     4
3.17  3 Z 0 0 0     5
3.18  3 Z 0 0 0     6

Upvotes: 0

A5C1D2H2I1M1N2O1R2T1

Reputation: 193677

Maybe something like this would work for your needs:

library(dplyr)

mydf %>%
  full_join(expand.grid(ID = unique(mydf$ID), Month = 1:6)) %>%
  group_by(ID) %>%
  mutate(A = replace(A, is.na(A), unique(na.omit(A)))) %>%
  arrange(ID, A, Month) %>%
  replace(., is.na(.), 0)
# Joining by: c("ID", "Month")
# Source: local data frame [18 x 6]
# Groups: ID
# 
#    ID A B C D Month
# 1   1 X 0 0 0     1
# 2   1 X 0 0 0     2
# 3   1 X M 5 1     3
# 4   1 X K 4 2     4
# 5   1 X K 3 7     5
# 6   1 X K 2 6     6
# 7   2 Y L 5 8     1
# 8   2 Y L 2 3     2
# 9   2 Y M 5 1     3
# 10  2 Y 0 0 0     4
# 11  2 Y K 2 7     5
# 12  2 Y M 2 8     6
# 13  3 Z K 5 3     1
# 14  3 Z M 6 3     2
# 15  3 Z M 5 8     3
# 16  3 Z K 4 2     4
# 17  3 Z 0 0 0     5
# 18  3 Z 0 0 0     6

Upvotes: 1

Fill in missing rows in R

Answers (2)

Related Questions