SuchARush

Reputation: 41

Correcting value for multiple years in a df in R

I have a large dataset in which many of the dates have the wrong year. For example, in this case I would like to update the years 1970 -> 2015, 1971 -> 2016, etc. for all the files except f4.csv, where I want 1970 -> 2001.

What is the best way to do this?

            x          y
1  1970-06-15  f1.csv
2  1971-06-16  f1.csv
3  1972-06-17  f1.csv
4  1970-06-18  f2.csv
5  2011-06-15  f3.csv
6  2011-06-16  f3.csv
7  2011-06-17  f3.csv
8  2011-06-18  f3.csv
9  1970-02-10  f4.csv

I guess I could start with something like this and update the matching rows, but is there a more general way?

index <- which(year(df$x) == 1970 & df$y != "f4.csv")

Upvotes: 0

Views: 27

Answers (1)

Ronak Shah

Reputation: 388807

We can extract the year from each date using format(). With case_when() we can then check conditions on y and the year and assign the new year values. Finally, str_replace() swaps the old year in each date for the new one.

library(dplyr)

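df %>%
  mutate(x = as.Date(x),                 # factor -> Date
         year = format(x, '%Y'),         # four-digit year as a string
         year = case_when(y != "f4.csv" & year == "1970" ~ "2015", 
                          y != "f4.csv" & year == "1971" ~ "2016", 
                          y == "f4.csv" & year == "1970" ~ "2001", 
                          TRUE ~ year),  # leave every other year unchanged
         # replace the first four characters (the old year) with the new year
         new_x = as.Date(stringr::str_replace(as.character(x), '^....', year)))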


#           x      y year      new_x
#1 1970-06-15 f1.csv 2015 2015-06-15
#2 1971-06-16 f1.csv 2016 2016-06-16
#3 1972-06-17 f1.csv 1972 1972-06-17
#4 1970-06-18 f2.csv 2015 2015-06-18
#5 2011-06-15 f3.csv 2011 2011-06-15
#6 2011-06-16 f3.csv 2011 2011-06-16
#7 2011-06-17 f3.csv 2011 2011-06-17
#8 2011-06-18 f3.csv 2011 2011-06-18
#9 1970-02-10 f4.csv 2001 2001-02-10
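
Note that row 3 (1972) is left unchanged above because only 1970 and 1971 are listed explicitly. If the "etc" in the question means that every wrong year is off by a constant amount, a more general option is to add a per-file offset instead of enumerating years. The following is only a sketch in base R under assumptions that are not stated in the question: that any year before 2000 is wrong, that the shift is 31 years for f4.csv (1970 -> 2001) and 45 years for every other file (1970 -> 2015). Adjust the cutoff and offsets to whatever actually holds for your data.

```r
# Sample data matching the question (plain character columns for brevity)
df <- data.frame(
  x = c("1970-06-15", "1971-06-16", "1972-06-17", "1970-06-18",
        "2011-06-15", "2011-06-16", "2011-06-17", "2011-06-18",
        "1970-02-10"),
  y = c("f1.csv", "f1.csv", "f1.csv", "f2.csv",
        "f3.csv", "f3.csv", "f3.csv", "f3.csv", "f4.csv"),
  stringsAsFactors = FALSE
)

dates <- as.Date(df$x)
yr <- as.integer(format(dates, "%Y"))

# Assumed rule (not from the question): years before 2000 are wrong;
# shift by 31 years for f4.csv and by 45 years for all other files
offset <- ifelse(yr >= 2000, 0L, ifelse(df$y == "f4.csv", 31L, 45L))

# POSIXlt stores the year as a separate field, so it can be shifted directly
lt <- as.POSIXlt(dates)
lt$year <- lt$year + offset
df$new_x <- as.Date(lt)
```

With this rule, 1972-06-17 in f1.csv becomes 2017-06-17, while the 2011 dates in f3.csv are left alone.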

data

df <- structure(list(x = structure(c(2L, 4L, 5L, 3L, 6L, 7L, 8L, 9L, 
1L), .Label = c("1970-02-10", "1970-06-15", "1970-06-18", "1971-06-16", 
"1972-06-17", "2011-06-15", "2011-06-16", "2011-06-17", "2011-06-18"
), class = "factor"), y = structure(c(1L, 1L, 1L, 2L, 3L, 3L, 
3L, 3L, 4L), .Label = c("f1.csv", "f2.csv", "f3.csv", "f4.csv"
), class = "factor")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9"))

Upvotes: 1
