Gayatri
Gayatri

Reputation: 73

Inserting rows for missing data in R - with 0 quantity

ORIGINAL DATA

Person   month   Year Quantity
A         02  2018   900
A         04  2018   600
A         09  2018   300
A         04  2019   40
B         06  2018   56
B         01  2019   10
B         09  2019   20

REQUIRED OUTPUT

Person   month   Year Quantity
A         01  2018   0
A         02  2018   900
A         03  2018   0
A         04  2018   600
A         05  2018   0
A         06  2018   0
A         07  2018   0
A         08  2018   0
A         09  2018   300
A         10  2018   0
A         11  2018   0
A         12  2018   0
A         01  2019   0
A         01  2019   0
A         02  2019   0
A         03  2019   0
A         04  2019   40
A         05  2019   0
A         06  2019   0
A         07  2019   0
A         08  2019   0
A         09  2019   0
A         10  2019   0
A         11  2019   0
A         12  2019   0
B         01  2018   0
B         02  2018   0
B         03  2018   0
B         04  2018   0
B         05  2018   0
B         06  2018   56
B         07  2018   0
B         08  2018   0
B         09  2018   0
B         10  2018   0
B         11  2018   0
B         12  2018   0
B         01  2019   10
B         02  2019   0
B         03  2019   0
B         04  2019   0
B         05  2019   0
B         06  2019   0
B         07  2019   0
B         08  2019   0
B         09  2019   20
B         10  2019   0
B         11  2019   0
B         12  2019   0

Hello All,

I have the sample data above in a R data frame and I would like to add 0 "quantities" for each month where a data is missing for a "person". I've got missing months and year as you can see which I will like insert to my data frame for for each person. i.e. add the months-year values of 0 demand from a person, for the missing months. (Just to give some background , the initial data I had , had date for each demand but i grouped that in months-year as I extracted the month and year values from date field like below....

PersonMonthDemand <-
  DateData %>%
  mutate(month = month(DateOfDemand))%>%
  mutate(year= year(DateOfDemand))%>%
  group_by(Person, month, year) %>%
  summarise(Quantity = sum(Quantity)) 

which resulted in the ORIGINAL data as shown above)

Will appreciate if some experts here can help please. Thanks in advance. Kind Regards, Gayatri

Upvotes: 1

Views: 60

Answers (2)

Hemant Gupta
Hemant Gupta

Reputation: 26

Create a left-hand side base table.

To do this, take the unique values of all customers:

cstmr<-data.frame(unique(orgn_data$Person))
change column names
names(cstmr)[1]<-'Person'

Find all combination of date and year in a dataframe using:

orgn_data_year_mnth<-unique(orgn_data[c("month", "Year")])
cross join cstmr & organ_data_year_mnth using
cc<-merge(cstmr, organ_data_year_mnth)

Final step would be left join original data on cc:

final_df<-merge(cc,orgn_data,by=c('Person','month','Year'),all.x=TRUE)

And then replace all NAs with 0 using:

final_df[is.na(final_df)] <- 0

Upvotes: 1

akrun
akrun

Reputation: 886998

We can use complete from tidyr

library(tidyr)
library(dplyr)
df1 %>%
    complete(Person, month = 1:12, Year, fill = list(Quantity = 0)) %>% 
    arrange(Person, Year)
# A tibble: 48 x 4
#  Person month  Year Quantity
#   <chr>  <int> <int>    <dbl>
# 1 A          1  2018        0
# 2 A          2  2018      900
# 3 A          3  2018        0
# 4 A          4  2018      600
# 5 A          5  2018        0
# 6 A          6  2018        0
# 7 A          7  2018        0
# 8 A          8  2018        0
# 9 A          9  2018      300
#10 A         10  2018        0
# … with 38 more rows

###data

df1 <- structure(list(Person = c("A", "A", "A", "A", "B", "B", "B"), 
    month = c(2L, 4L, 9L, 4L, 6L, 1L, 9L), Year = c(2018L, 2018L, 
    2018L, 2019L, 2018L, 2019L, 2019L), Quantity = c(900L, 600L, 
    300L, 40L, 56L, 10L, 20L)), class = "data.frame", row.names = c(NA, 
-7L))

Upvotes: 1

Related Questions