Nandu
Nandu

Reputation: 3126

R-Find missing values from the dataframe and insert value in that position

Using R script I read values from database. The values contains the following data frame.

>values #return the output as follows

ID           Date      Hour      Value
1          2013-06-01    8           9
2          2013-06-01    9          17
3          2013-06-01   10          16
4          2013-06-01   11          21
5          2013-06-01   12          19
6          2013-06-01   13          15
7          2013-06-01   14          14
8          2013-06-01   15          14
9          2013-06-01   16          21
10         2013-06-01   17          22
11         2013-06-01   18          13
12         2013-06-01   19           2
13         2013-06-01   20           2
14         2013-06-01   21           1
15         2013-06-01   22           1
16         2013-06-01   23           1
17         2013-06-02    0           0
18         2013-06-02    1           0
19         2013-06-02    2           0
20         2013-06-02    3           1
21         2013-06-02    4           0
22         2013-06-02    5           0
23         2013-06-02    6           1
24         2013-06-02    7           1
25         2013-06-02    8          20
26         2013-06-02    9          21
27         2013-06-02   10          21
28         2013-06-02   11          15
29         2013-06-02   12          12
30         2013-06-02   13          11
31         2013-06-02   14          10
32         2013-06-02   15          16
33         2013-06-02   16          21
34         2013-06-02   17          22
35         2013-06-02   18          18
36         2013-06-02   19           9
37         2013-06-02   20           2
38         2013-06-02   21           0
39         2013-06-02   23           0

I want to find out the missing hours in the dataframe and add 0 to the Value in that missing hour of the date.
Example:
From the above dataframe values, the hour 22 is missing for the date 2013-06-02. I want to insert a row between the hour 21 and 23 as

ID           Date      Hour      Value
39         2013-06-02    22         0

How can i achieve this?

I tried as follows:

I have a hours list as

>hours<-c(0:23)
> hours #return as follows
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

>i<-values$Hour[1]+1
>count<-nrow(values)
>for(j in 1:count){
+h<-values$Hour[j]
+hr<-hours[i]
+if(h != hr)
+{
+#write code to insert row
+}
+i<-i+1
+if(i==25)
+{
+i<-c(1)
+}
+}

Please help me...

Upvotes: 2

Views: 2537

Answers (2)

Arun
Arun

Reputation: 118889

Here's a way using data.table:

require(data.table) 
# install the package and then load if you don't have it already
dt <- data.table(values, key="Hour")
out <- merge(dt[, .SD[J(Hour[1]:23), roll=-Inf], by=Date, 
      .SDcols = c("Hour", "ID")], dt[, list(Date, Hour, Value)], 
      by=c("Date", "Hour"), all=TRUE)[is.na(Value), Value := 0L]

Explanation: Your problem is a bit different/complicated because 1) you seem to want to fill in missing values only when they happen in the middle but not anywhere else (beginning or end) and 2) you want to fill the missing value of ID with the same value of the last non-missing ID (which can be accomplished with a roll), but you want to fill the Value corresponding to the missing ID with 0 (which can't be accomplished with a roll).

So, the idea then, is to first get just the missing value of ID by using roll=-Inf. This is accomplished by the statement:

dt[, .SD[J(Hour[1]:23), roll=-Inf], by=Date, .SDcols = c("Hour", "ID")]

This is after setting the key to Hour.

Now, we need to get a NA for Value for the newly added ID. So, we merge it back with dt (with ID removed, as we've already taken care of it). This is done by:

dt[, list(Date, Hour, Value)]

Once we've these, we merge on the columns Date, Hour and use the parameter all=TRUE to fill in missing values as well. Finally, we replace any NA for Value to 0.

Upvotes: 5

zx8754
zx8754

Reputation: 56249

For each date make rows 0:23, convert to dataframe, then merge with your data.

Your data:

values <- read.table(text="ID           Date      Hour      Value
1          2013-06-01    8           9
2          2013-06-01    9          17
3          2013-06-01   10          16
4          2013-06-01   11          21
5          2013-06-01   12          19
6          2013-06-01   13          15
7          2013-06-01   14          14
8          2013-06-01   15          14
9          2013-06-01   16          21
10         2013-06-01   17          22
11         2013-06-01   18          13
12         2013-06-01   19           2
13         2013-06-01   20           2
14         2013-06-01   21           1
15         2013-06-01   22           1
16         2013-06-01   23           1
17         2013-06-02    0           0
18         2013-06-02    1           0
19         2013-06-02    2           0
20         2013-06-02    3           1
21         2013-06-02    4           0
22         2013-06-02    5           0
23         2013-06-02    6           1
24         2013-06-02    7           1
25         2013-06-02    8          20
26         2013-06-02    9          21
27         2013-06-02   10          21
28         2013-06-02   11          15
29         2013-06-02   12          12
30         2013-06-02   13          11
31         2013-06-02   14          10
32         2013-06-02   15          16
33         2013-06-02   16          21
34         2013-06-02   17          22
35         2013-06-02   18          18
36         2013-06-02   19           9
37         2013-06-02   20           2
38         2013-06-02   21           0
39         2013-06-02   23           0", header = TRUE, as.is=T)

Here's the code:

#make dummy data frame with all dates and hours
dummy <- as.data.frame(
  cbind(
    sort(rep(unique(values$Date),24)),
    rep(0:23,length(unique(values$Date)))))
colnames(dummy) <- c("Date","Hour")
dummy$Date <- as.character(dummy$Date)
dummy$Hour <- as.numeric(as.character(dummy$Hour))

#merge with values dataframe
values_v1 <- merge(dummy,values,all.x=T)

#substitute NAs with 0(zero)
values_v1[is.na(values_v1)] <- 0

Upvotes: 4

Related Questions