Reputation: 41
I have a large dataset in which many of the dates have the wrong year. For example, in this case I would like to update the year 1970 -> 2015, 1971 -> 2016, etc. for all the files except f4.csv, where I want 1970 -> 2001.
What is the best way to do this? Here is a sample of the data:
           x      y
1 1970-06-15 f1.csv
2 1971-06-16 f1.csv
3 1972-06-17 f1.csv
4 1970-06-18 f2.csv
5 2011-06-15 f3.csv
6 2011-06-16 f3.csv
7 2011-06-17 f3.csv
8 2011-06-18 f3.csv
9 1970-02-10 f4.csv
I guess I could start with something like this and update the rows at those indexes, but is there a more general way?
library(lubridate)  # provides year()
index <- which(year(df$x) == 1970 & df$y != "f4.csv")
Upvotes: 0
Views: 27
Reputation: 388807
We can extract the year from the date using format(). Using case_when(), we can check conditions based on the y and year values and assign new year values. Finally, using str_replace(), we can swap in the new year values.
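library(dplyr)

df %>%
  mutate(
    x = as.Date(x),            # x is stored as a factor in the data below
    year = format(x, '%Y'),    # extract the year as a character string
    # map old years to new ones, treating f4.csv separately
    year = case_when(y != "f4.csv" & year == "1970" ~ "2015",
                     y != "f4.csv" & year == "1971" ~ "2016",
                     y == "f4.csv" & year == "1970" ~ "2001",
                     TRUE ~ year),
    # replace the first four characters (the old year) with the new year
    new_x = as.Date(stringr::str_replace(as.character(x), '....', year))
  )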
#           x      y year      new_x
#1 1970-06-15 f1.csv 2015 2015-06-15
#2 1971-06-16 f1.csv 2016 2016-06-16
#3 1972-06-17 f1.csv 1972 1972-06-17
#4 1970-06-18 f2.csv 2015 2015-06-18
#5 2011-06-15 f3.csv 2011 2011-06-15
#6 2011-06-16 f3.csv 2011 2011-06-16
#7 2011-06-17 f3.csv 2011 2011-06-17
#8 2011-06-18 f3.csv 2011 2011-06-18
#9 1970-02-10 f4.csv 2001 2001-02-10
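If the list of year corrections grows, writing one case_when() branch per combination may get unwieldy. A possibly more general option is to keep the corrections in lookup tables and apply them with a small helper. The sketch below uses base R only. The names default_map, override_map and fix_year are hypothetical, made up for illustration, and it assumes that a file listed in override_map uses only its own mapping (so for f4.csv only 1970 is changed, as in the output above). The sample data is rebuilt as plain character columns so the block runs on its own.

```r
# Sample data mirroring the question, stored as character columns.
df <- data.frame(
  x = c("1970-06-15", "1971-06-16", "1972-06-17", "1970-06-18",
        "2011-06-15", "2011-06-16", "2011-06-17", "2011-06-18",
        "1970-02-10"),
  y = c("f1.csv", "f1.csv", "f1.csv", "f2.csv",
        "f3.csv", "f3.csv", "f3.csv", "f3.csv", "f4.csv"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup tables: old year -> new year.
# default_map applies to every file without an entry in override_map.
default_map  <- c("1970" = "2015", "1971" = "2016")
override_map <- list("f4.csv" = c("1970" = "2001"))

# Return the date string with its year swapped according to the lookup
# that applies to `file`; years with no mapping are left unchanged.
fix_year <- function(date_chr, file) {
  old_year <- substr(date_chr, 1, 4)
  map <- override_map[[file]]
  if (is.null(map)) map <- default_map
  new_year <- if (old_year %in% names(map)) map[[old_year]] else old_year
  paste0(new_year, substr(date_chr, 5, nchar(date_chr)))
}

# Apply the helper row by row and convert the result to Date.
df$new_x <- as.Date(mapply(fix_year, as.character(df$x), as.character(df$y),
                           USE.NAMES = FALSE))
```

With this layout, adding another correction means adding an entry to one of the two lookup objects rather than editing the transformation code. For a very large dataset a vectorised join against a lookup data frame would likely be faster than the row-wise mapply() call.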
data
df <- structure(list(x = structure(c(2L, 4L, 5L, 3L, 6L, 7L, 8L, 9L,
1L), .Label = c("1970-02-10", "1970-06-15", "1970-06-18", "1971-06-16",
"1972-06-17", "2011-06-15", "2011-06-16", "2011-06-17", "2011-06-18"
), class = "factor"), y = structure(c(1L, 1L, 1L, 2L, 3L, 3L,
3L, 3L, 4L), .Label = c("f1.csv", "f2.csv", "f3.csv", "f4.csv"
), class = "factor")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9"))
Upvotes: 1