Vint
Vint

Reputation: 499

Adding missing rows to a data frame, filling in data from last known data point

I have a data frame that has missing rows. You can identify the missing rows by looking at the gaps in the sequence.

Count<-c(1,1,1,1,2,2,2,3,3,4,4,4,4,5,5,6,6,6)
Seq<-c(1,2,3,4,1,2,4,1,4,1,2,3,5,1,3,1,2,3)
MyData<-c(5,4,5,3,4,3,2,1,2,1,3,2,4,2,3,1,4,3)
DF1<-data.frame(Count,Seq,MyData)
DF1

The count keeps track of the sequence number, and the sequences will always run as a numeric sequence. In this case it is 1:5, but this can vary so I don't want to hard code this limitation.

My goal is to create two new data frame that contains all of the missing sequence rows. The first would have NA in the data column for the added "missing' rows.

Count2<-c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6)
Seq2<-c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
MyData2<-c(5,4,5,3,NA,4,3,NA,2,NA,1,NA,NA,2,NA,1,3,2,NA,4,2,NA,3,NA,NA,1,4,3,NA,NA)
DF2<-data.frame(Count2,Seq2,MyData2)
DF2

The second data frame would be similar, but would contain the last known data point for that Sequence number.

Count2<-c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6)
Seq2<-c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
MyData3<-c(5,4,5,3,NA,4,3,5,2,NA,1,3,5,2,NA,1,3,2,2,4,2,3,3,2,4,1,4,3,2,4)
DF3<-data.frame(Count2,Seq2,MyData3)
DF3

Note- the NA remains for the 5th element in the sequence for this first 3 counts as there was no initial value.

Upvotes: 1

Views: 252

Answers (2)

Maurits Evers
Maurits Evers

Reputation: 50668

Here is a solution using

  1. base R's merge on a "skeleton" dataframe for the first question, and
  2. zoo::na.locf to replace NAs with last known values for the second question.

As per OP's request, the maximum Seq and Count values are inferred from the data.

# These are the maximum seq and count numbers from the data
maxSeq <- max(DF1$Seq);
maxCts <- max(DF1$Count);

# Replicating DF1
# Construct "skeleton" dataframe with appropriate Seq and Count sequences 
df.one <- data.frame(
    Count = rep(seq(1:maxCts), each = maxSeq),
    Seq = rep(seq(1:maxSeq), maxCts)
);
# Merge with source data, and put NAs for missing entries
df.one <- merge(df.one, DF1, all = TRUE);
tail(df.one)
#   Count Seq MyData
#25     5   5     NA
#26     6   1      1
#27     6   2      4
#28     6   3      3
#29     6   4     NA
#30     6   5     NA

# Replicating DF2
# Split on Seq, replace NAs in MyData with last known value, 
# and rbind into dataframe
df.two <- do.call(rbind.data.frame, lapply(split(df.two, df$Seq), function(x)  {
    x$MyData <- na.locf(x$MyData);
    return(x);
}))
# Sort by Count then Seq
df.two <- df.two[order(df.two$Count, df.two$Seq), ];
rownames(df.two) <- NULL;
tail(df.two);
#   Count Seq MyData
#25     5   5      3
#26     6   1      1
#27     6   2      4
#28     6   3      3
#29     6   4      3
#30     6   5      2

Data

Count<-c(1,1,1,1,2,2,2,3,3,4,4,4,4,5,5,6,6,6)
Seq<-c(1,2,3,4,1,2,4,1,4,1,2,3,5,1,3,1,2,3)
MyData<-c(5,4,5,3,4,3,2,1,2,1,3,2,4,2,3,1,4,3)
DF1<-data.frame(Count,Seq,MyData)

Upvotes: 1

www
www

Reputation: 39154

A solution using dplyr and tidyr.

library(dplyr)
library(tidyr)

DF2 <- DF1 %>%
  complete(Count, Seq = full_seq(Seq, period = 1)) %>%
  arrange(Count, Seq)

DF3 <- DF2 %>%
  arrange(Seq, Count) %>%
  group_by(Seq) %>%
  fill(MyData) %>%
  arrange(Count) %>%
  ungroup()

DF2
# # A tibble: 30 x 3
#    Count   Seq MyData
#    <dbl> <dbl>  <dbl>
#  1     1     1      5
#  2     1     2      4
#  3     1     3      5
#  4     1     4      3
#  5     1     5     NA
#  6     2     1      4
#  7     2     2      3
#  8     2     3     NA
#  9     2     4      2
# 10     2     5     NA
# # ... with 20 more rows

DF3
# # A tibble: 30 x 3
#    Count   Seq MyData
#    <dbl> <dbl>  <dbl>
#  1     1     1      5
#  2     1     2      4
#  3     1     3      5
#  4     1     4      3
#  5     1     5     NA
#  6     2     1      4
#  7     2     2      3
#  8     2     3      5
#  9     2     4      2
# 10     2     5     NA
# # ... with 20 more rows

Upvotes: 2

Related Questions