Reputation: 1481
I am trying to replace NA with proceeding value in the same column using dplyr package, But I am not getting results. Below is script I am using.
data <- read.csv("data.csv",header=T,na.strings=c("","NULL"))
data$ID <- paste(data$ID1, data$ID2, sep='_')
data$End.Date <- as.Date(data$End.Date, "%d-%b-%y")
data1 <- data[order(data$ID, data$End.Date),]
library(tidyr)
library(dplyr)
data1 %>%
group_by(ID) %>%
fill(Start.date, .direction = 'up') %>%
fill(Start.date, .direction = 'down')
Also please see below data file which I am using in above script. Can anyone please help me to know why NA is not replaced in using above syntax. Script is not throwing error
ID1 ID2 Start date End Date
1031 40038 7-Nov-16 1-Jan-17
1031 40037 12-Sep-16 2-Oct-16
1031 40033 15-Feb-16 2-Oct-16
1031 40033 15-Feb-16 3-Jul-16
1031 40038 7-Nov-16 4-Dec-16
1031 40035 18-Jul-16 4-Sep-16
1031 40033 15-Feb-16 4-Sep-16
1031 40043 23-Jan-17 5-Feb-17
1031 40038 7-Nov-16 5-Feb-17
1031 40042 18-Jan-17 5-Feb-17
1031 40033 15-Feb-16 5-Jun-16
1031 40044 17-Feb-17 5-Mar-17
1031 40043 23-Jan-17 5-Mar-17
1031 40037 12-Sep-16 6-Nov-16
1031 40035 NULL 7-Aug-16
1031 40033 15-Feb-16 7-Aug-16
1031 40036 NULL 7-Aug-16
1031 40038 7-Nov-16 8-Jan-17
1031 40045 28-Mar-17 9-Apr-17
1031 40033 15-Feb-16 9-Oct-16
1031 40037 12-Sep-16 9-Oct-16
1031 40033 15-Feb-16 10-Jul-16
1031 40038 7-Nov-16 11-Dec-16
1031 40033 15-Feb-16 11-Sep-16
1031 40043 23-Jan-17 12-Feb-17
1031 40033 15-Feb-16 12-Jun-16
1031 40043 23-Jan-17 12-Mar-17
1031 40044 17-Feb-17 12-Mar-17
1031 40037 12-Sep-16 13-Nov-16
1031 40038 7-Nov-16 13-Nov-16
1031 40033 15-Feb-16 14-Aug-16
1031 40035 18-Jul-16 14-Aug-16
1031 40038 7-Nov-16 15-Jan-17
1031 40045 28-Mar-17 16-Apr-17
1031 40033 15-Feb-16 16-Oct-16
1031 40037 12-Sep-16 16-Oct-16
1031 40033 15-Feb-16 17-Jul-16
1031 40038 7-Nov-16 18-Dec-16
1031 40033 15-Feb-16 18-Sep-16
1031 40037 12-Sep-16 18-Sep-16
1031 40043 23-Jan-17 19-Feb-17
1031 40033 15-Feb-16 19-Jun-16
1031 40043 23-Jan-17 19-Mar-17
1031 40038 7-Nov-16 20-Nov-16
1031 40037 12-Sep-16 20-Nov-16
1031 40035 18-Jul-16 21-Aug-16
1031 40033 15-Feb-16 21-Aug-16
1031 40039 9-Jan-17 22-Jan-17
1031 40038 7-Nov-16 22-Jan-17
1031 40033 15-Feb-16 22-May-16
1031 40045 28-Mar-17 23-Apr-17
1031 40037 12-Sep-16 23-Oct-16
1031 40033 15-Feb-16 23-Oct-16
1031 40033 15-Feb-16 24-Jul-16
1031 40038 7-Nov-16 25-Dec-16
1031 40033 15-Feb-16 25-Sep-16
1031 40037 12-Sep-16 25-Sep-16
1031 40043 23-Jan-17 26-Feb-17
1031 40044 17-Feb-17 26-Feb-17
1031 40033 15-Feb-16 26-Jun-16
1031 40043 23-Jan-17 26-Mar-17
1031 40037 12-Sep-16 27-Nov-16
1031 40038 7-Nov-16 27-Nov-16
1031 40033 15-Feb-16 28-Aug-16
1031 40035 18-Jul-16 28-Aug-16
1031 40038 7-Nov-16 29-Jan-17
1031 40042 18-Jan-17 29-Jan-17
1031 40033 15-Feb-16 29-May-16
1031 40037 12-Sep-16 30-Oct-16
1031 40036 8-Jul-16 31-Jul-16
1031 40033 15-Feb-16 31-Jul-16
1031 40035 18-Jul-16 31-Jul-16
Upvotes: 1
Views: 203
Reputation: 3369
The zoo library has a great function for this called na.locf()
library(zoo)
library(dplyr)
data <- read.csv("data.csv",header=T,na.strings=c("","NULL"))
data$ID <- paste(data$ID1, data$ID2, sep='_')
data$End.Date <- as.Date(data$End.Date, "%d-%b-%y")
data1 <- data[order(data$ID, data$End.Date),]
data1 <- data1 %>%
group_by(ID) %>%
mutate(Start.Date = ifelse(is.na(Start.Date),na.locf(Start.Date, na.rm = FALSE), Start.Date))
Upvotes: 1