user3734568
user3734568

Reputation: 1481

Replacing NA with another value in same column in R

I am trying to replace NA with proceeding value in the same column using dplyr package, But I am not getting results. Below is script I am using.

data <- read.csv("data.csv",header=T,na.strings=c("","NULL"))

data$ID <- paste(data$ID1, data$ID2, sep='_')

data$End.Date <- as.Date(data$End.Date, "%d-%b-%y")

data1 <- data[order(data$ID, data$End.Date),]

library(tidyr)
library(dplyr)

data1 %>% 
  group_by(ID) %>% 
  fill(Start.date, .direction = 'up') %>% 
  fill(Start.date, .direction = 'down')

Also please see below data file which I am using in above script. Can anyone please help me to know why NA is not replaced in using above syntax. Script is not throwing error

ID1 ID2 Start date  End Date
1031    40038   7-Nov-16    1-Jan-17
1031    40037   12-Sep-16   2-Oct-16
1031    40033   15-Feb-16   2-Oct-16
1031    40033   15-Feb-16   3-Jul-16
1031    40038   7-Nov-16    4-Dec-16
1031    40035   18-Jul-16   4-Sep-16
1031    40033   15-Feb-16   4-Sep-16
1031    40043   23-Jan-17   5-Feb-17
1031    40038   7-Nov-16    5-Feb-17
1031    40042   18-Jan-17   5-Feb-17
1031    40033   15-Feb-16   5-Jun-16
1031    40044   17-Feb-17   5-Mar-17
1031    40043   23-Jan-17   5-Mar-17
1031    40037   12-Sep-16   6-Nov-16
1031    40035   NULL    7-Aug-16
1031    40033   15-Feb-16   7-Aug-16
1031    40036   NULL    7-Aug-16
1031    40038   7-Nov-16    8-Jan-17
1031    40045   28-Mar-17   9-Apr-17
1031    40033   15-Feb-16   9-Oct-16
1031    40037   12-Sep-16   9-Oct-16
1031    40033   15-Feb-16   10-Jul-16
1031    40038   7-Nov-16    11-Dec-16
1031    40033   15-Feb-16   11-Sep-16
1031    40043   23-Jan-17   12-Feb-17
1031    40033   15-Feb-16   12-Jun-16
1031    40043   23-Jan-17   12-Mar-17
1031    40044   17-Feb-17   12-Mar-17
1031    40037   12-Sep-16   13-Nov-16
1031    40038   7-Nov-16    13-Nov-16
1031    40033   15-Feb-16   14-Aug-16
1031    40035   18-Jul-16   14-Aug-16
1031    40038   7-Nov-16    15-Jan-17
1031    40045   28-Mar-17   16-Apr-17
1031    40033   15-Feb-16   16-Oct-16
1031    40037   12-Sep-16   16-Oct-16
1031    40033   15-Feb-16   17-Jul-16
1031    40038   7-Nov-16    18-Dec-16
1031    40033   15-Feb-16   18-Sep-16
1031    40037   12-Sep-16   18-Sep-16
1031    40043   23-Jan-17   19-Feb-17
1031    40033   15-Feb-16   19-Jun-16
1031    40043   23-Jan-17   19-Mar-17
1031    40038   7-Nov-16    20-Nov-16
1031    40037   12-Sep-16   20-Nov-16
1031    40035   18-Jul-16   21-Aug-16
1031    40033   15-Feb-16   21-Aug-16
1031    40039   9-Jan-17    22-Jan-17
1031    40038   7-Nov-16    22-Jan-17
1031    40033   15-Feb-16   22-May-16
1031    40045   28-Mar-17   23-Apr-17
1031    40037   12-Sep-16   23-Oct-16
1031    40033   15-Feb-16   23-Oct-16
1031    40033   15-Feb-16   24-Jul-16
1031    40038   7-Nov-16    25-Dec-16
1031    40033   15-Feb-16   25-Sep-16
1031    40037   12-Sep-16   25-Sep-16
1031    40043   23-Jan-17   26-Feb-17
1031    40044   17-Feb-17   26-Feb-17
1031    40033   15-Feb-16   26-Jun-16
1031    40043   23-Jan-17   26-Mar-17
1031    40037   12-Sep-16   27-Nov-16
1031    40038   7-Nov-16    27-Nov-16
1031    40033   15-Feb-16   28-Aug-16
1031    40035   18-Jul-16   28-Aug-16
1031    40038   7-Nov-16    29-Jan-17
1031    40042   18-Jan-17   29-Jan-17
1031    40033   15-Feb-16   29-May-16
1031    40037   12-Sep-16   30-Oct-16
1031    40036   8-Jul-16    31-Jul-16
1031    40033   15-Feb-16   31-Jul-16
1031    40035   18-Jul-16   31-Jul-16

Upvotes: 1

Views: 203

Answers (1)

Matt Jewett
Matt Jewett

Reputation: 3369

The zoo library has a great function for this called na.locf()

library(zoo)
library(dplyr)

data <- read.csv("data.csv",header=T,na.strings=c("","NULL"))

data$ID <- paste(data$ID1, data$ID2, sep='_')

data$End.Date <- as.Date(data$End.Date, "%d-%b-%y")

data1 <- data[order(data$ID, data$End.Date),]

data1 <- data1 %>% 
         group_by(ID) %>% 
         mutate(Start.Date = ifelse(is.na(Start.Date),na.locf(Start.Date, na.rm = FALSE), Start.Date))

Upvotes: 1

Related Questions