Eric

Reputation: 2849

Replacing dates in column with a for loop

I am helping someone try to get to the solution they want without making too many changes to the code they came up with. I know that the for loop is not necessary. For example, you could solve it by adding datenumeric <- as.Date(datenumeric, "%Y%m%d") to their convertdatereadable function before passing it into lapply. I am having trouble replicating the same results using a for loop.

Request

dat has a date column with the following double values:

1947.01
1947.02
1947.03
1947.04
1947.05

The request is to convert the date column to Date class, parsing the values with format = "%Y%m%d".

Reproducible example

dat <- structure(list(date = c(1947.01000976562, 1947.02001953125, 1947.03002929688, 
1947.0400390625, 1947.05004882812), sp500 = c(15.210000038147, 
15.8000001907349, 15.1599998474121, 14.6000003814697, 14.3400001525879
), divyld = c(4.48999977111816, 4.38000011444092, 4.6100001335144, 
4.75, 5.05000019073486), i3 = c(0.379999995231628, 0.379999995231628, 
0.379999995231628, 0.379999995231628, 0.379999995231628), ip = c(22.3999996185303, 
22.5, 22.6000003814697, 22.5, 22.6000003814697), pcsp = c(NA, 
46.5483322143555, -48.6076202392578, -44.3271369934082, -21.3698806762695
), rsp500 = c(NA, 50.9283332824707, -43.9976196289062, -39.5771369934082, 
-16.319881439209), pcip = c(NA, 5.35716342926025, 5.33335399627686, 
-5.30975437164307, 5.33335399627686), ci3 = c(NA, 0, 0, 0, 0), 
    ci3_1 = c(NA, NA, 0, 0, 0), ci3_2 = c(NA, NA, NA, 0, 0), 
    pcip_1 = c(NA, NA, 5.35716342926025, 5.33335399627686, -5.30975437164307
    ), pcip_2 = c(NA, NA, NA, 5.35716342926025, 5.33335399627686
    ), pcip_3 = c(NA, NA, NA, NA, 5.35716342926025), pcsp_1 = c(NA, 
    NA, 46.5483322143555, -48.6076202392578, -44.3271369934082
    ), pcsp_2 = c(NA, NA, NA, 46.5483322143555, -48.6076202392578
    ), pcsp_3 = c(NA, NA, NA, NA, 46.5483322143555), month = c(-156, 
    -155, -154, -153, -152)), row.names = c(NA, 5L), class = "data.frame")

Code that includes their convertdatereadable function

convertdatereadable <- function(datenumeric){
    datenumeric <- trunc(datenumeric * 10000 + 1)
    datenumeric <- as.character(datenumeric)
    return(datenumeric)
}

dat[1] <- lapply(dat[1], convertdatereadable)


for (n in 1:nrow(dat)){
 dat$date <- as.Date(dat[n, 1], format = "%Y%m%d")
}

The for loop in its current state outputs the correct format but is, unfortunately, replicating the first date for all 5 rows.

Incorrect current output


dat[1]

#>         date
#> 1 1947-01-01
#> 2 1947-01-01
#> 3 1947-01-01
#> 4 1947-01-01
#> 5 1947-01-01

Desired output while keeping the for loop


dat[1]

#>         date
#> 1 1947-01-01
#> 2 1947-02-01
#> 3 1947-03-01
#> 4 1947-04-01
#> 5 1947-05-01

I thought this would work, but it doesn't:

for (n in 1:nrow(dat)){
 dat[n, 1] <- as.Date(dat[n, 1], format = "%Y%m%d")
}

Upvotes: 1

Views: 64

Answers (2)

thelatemail

Reputation: 93813

As others have said, using as.Date(..., format="%Y%m%d") is the way to do this rather than a loop.
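
For reference, a minimal sketch of the loop-free route, assuming the same trunc(x * 10000 + 1) conversion as convertdatereadable in the question (date_num, date_chr and date_out are illustrative names, not from the original code):

```r
# First three numeric dates from the question's dput output (YYYY.MM as doubles)
date_num <- c(1947.01000976562, 1947.02001953125, 1947.03002929688)

# Same steps as convertdatereadable: shift to YYYYMMDD, then to character
date_chr <- as.character(trunc(date_num * 10000 + 1))

# as.Date() is vectorised, so the whole vector is parsed in one call
date_out <- as.Date(date_chr, format = "%Y%m%d")
date_out
#[1] "1947-01-01" "1947-02-01" "1947-03-01"
```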

But to understand what is going on here, break it down and check the status of the output after each line:

First, let's fix the loop to index both sides by n so that each value is overwritten in turn:

for (n in 1:nrow(dat)){
 dat$date[n] <- as.Date(dat$date[n], format = "%Y%m%d")
}

This results in a character representation of the number of days since 1970-01-01 (R stores dates internally as this day count):

dat$date
#[1] "-8401" "-8370" "-8342" "-8311" "-8281"
class(dat$date)
#[1] "character"
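
To see where those numbers come from, a quick check (a sketch using only base R; d is an illustrative name) strips the class from a Date to show the underlying day count, then converts the count back:

```r
d <- as.Date("1947-01-01")

# Remove the "Date" class to expose the stored number of days since 1970-01-01
unclass(d)
#[1] -8401

# Going the other way: a day count plus an origin gives the date back
as.Date(-8401, origin = "1970-01-01")
#[1] "1947-01-01"
```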

Why character and not numeric? Because you are using [<- rather than <-: you are not overwriting the whole dat$date column, only dat$date[1], dat$date[2], and so on. Element-wise assignment keeps the existing character type here, because a number can always be coerced to character, whereas character data cannot necessarily be coerced to a number. E.g.:

x <- c("a","b","c")
x[1] <- 1
x
#[1] "1" "b" "c"
 
 
x <- c(1,2,3)
x[1] <- "a"
x
#[1] "a" "2" "3"

If you overwrite the whole object though, the class will change:

x <- c("a","b","c")
x <- c(1,2,3)
x
#[1] 1 2 3

You then need to force the class back to date:

class(dat$date) <- "Date"
dat$date
#[1] "1947-01-01" "1947-02-01" "1947-03-01" "1947-04-01" "1947-05-01"
class(dat$date)
#[1] "Date"

You could also get the same result by converting explicitly:

dat$date <- as.Date(as.numeric(dat$date), origin="1970-01-01")
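
If the goal is to keep the loop, one alternative (a sketch, not part of the approach above; date_chr and out are illustrative names) is to fill a vector that already has class Date, so that each element assignment keeps the class, and only then overwrite the column:

```r
# Character dates as produced by convertdatereadable in the question
date_chr <- c("19470101", "19470201", "19470301", "19470401", "19470501")

# Pre-allocate an all-NA vector that already has class "Date"
out <- as.Date(rep(NA, length(date_chr)))

for (n in seq_along(date_chr)) {
  # The target is a Date vector, so the assigned element stays a Date
  out[n] <- as.Date(date_chr[n], format = "%Y%m%d")
}

class(out)
#[1] "Date"
out
#[1] "1947-01-01" "1947-02-01" "1947-03-01" "1947-04-01" "1947-05-01"
```

With the question's data, this would be followed by dat$date <- out, which replaces the whole column in one step.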

Upvotes: 1

doctshind s

Reputation: 390

You are almost there. You just need to change the variable in the loop, as below:

for (n in 1:nrow(dat)){
 dat$crcteddate <- as.Date(dat$date, format = "%Y%m%d")
}

This creates a column called 'crcteddate' with the following values:

"1947-01-01" "1947-02-01" "1947-03-01" "1947-04-01" "1947-05-01"

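The mistake was referring to the date column as dat[n, 1], which is a single value that gets recycled down the whole column, instead of using dat$date directly. Note that as.Date() is vectorised, so the loop above repeats the same whole-column conversion on every iteration and a single call outside the loop gives the same result.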
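
To illustrate the difference, a small sketch on a toy data frame (df and its column names are made up for this example):

```r
df <- data.frame(date = c("19470101", "19470201", "19470301"),
                 stringsAsFactors = FALSE)

# df[1, 1] is a single value; assigning it to a column recycles it to every row
df$first_only <- as.Date(df[1, 1], format = "%Y%m%d")

# df$date is the whole column; as.Date() converts every element at once
df$all_rows <- as.Date(df$date, format = "%Y%m%d")

format(df$first_only)
#[1] "1947-01-01" "1947-01-01" "1947-01-01"
format(df$all_rows)
#[1] "1947-01-01" "1947-02-01" "1947-03-01"
```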

Upvotes: 1
