Jose Montoya
Jose Montoya

Reputation: 167

Problems with date format. How to combine separate month and year columns into a single date?

I'm working with Bureau of Labor Statistics data .When I load the data from his webpage as a data frame. I get something like this:

year period periodName latest  value footnotes       seriesID      date     
  <dbl> <chr>  <chr>      <chr>   <dbl> <chr>           <chr>         
1  2020 M07    July       true   139582 "P preliminary" CES0000000001        
2  2020 M06    June       NA     137819 "P preliminary" CES0000000001        
3  2020 M05    May        NA     133028 ""              CES0000000001        
4  2020 M04    April      NA     130303 ""              CES0000000001        
5  2020 M03    March      NA     151090 ""              CES0000000001        
6  2020 M02    February   NA     152463 "C corrected"   CES0000000001  

As you can see there are two different columns for date(year and period). So I tried to combine them into a single date column :

nfp$date <- as.yearmon(paste(nfp$year, nfp$period), "%Y %m")

This is my df now:

year period periodName latest  value footnotes       seriesID      date     
  <dbl> <chr>  <chr>      <chr>   <dbl> <chr>           <chr>         <yearmon>
1  2020 M07    July       true   139582 "P preliminary" CES0000000001 NA       
2  2020 M06    June       NA     137819 "P preliminary" CES0000000001 NA       
3  2020 M05    May        NA     133028 ""              CES0000000001 NA       
4  2020 M04    April      NA     130303 ""              CES0000000001 NA       
5  2020 M03    March      NA     151090 ""              CES0000000001 NA       
6  2020 M02    February   NA     152463 "C corrected"   CES0000000001 NA  

Now I have NA's in date column. I thought that the problem was chr format columns, so I tried this:

nfp <- nfp %>%
  mutate(month_year = as.Date(period, "%m"))

and I get a date column with NA's .This is my desire output :

 year period periodName latest  value footnotes       seriesID      **date**     
      <dbl> <chr>  <chr>      <chr>   <dbl> <chr>           <chr>         <yearmon>
    1  2020 M07    July       true   139582 "P preliminary" CES0000000001 2020-07     
    2  2020 M06    June       NA     137819 "P preliminary" CES0000000001 2020-06  
    3  2020 M05    May        NA     133028 ""              CES0000000001 2020-05      
    4  2020 M04    April      NA     130303 ""              CES0000000001 2020-04  
    5  2020 M03    March      NA     151090 ""              CES0000000001 2020-03  
    6  2020 M02    February   NA     152463 "C corrected"   CES0000000001 2020-02

So in that way I can convert my df into a ts object.

Upvotes: 0

Views: 587

Answers (2)

Mike V
Mike V

Reputation: 1364

Here is another solution for your question

library(tidyverse)
library(lubridate)
df2 <- df %>% 
  mutate(date2 = paste0(year, "-", periodName, "-01"),
         date2 = as.Date(date2, format = "%Y-%B-%d"),
         date2 = str_replace_all(date2, regex("(-01)"), ""))
#   year period periodName latest  value     footnotes      seriesID  date2
# 1 2020    M07       July   true 139582 P preliminary CES0000000001 2020-07
# 2 2020    M06       June   <NA> 137819 P preliminary CES0000000001 2020-06
# 3 2020    M05        May   <NA> 133028               CES0000000001 2020-05
# 4 2020    M04      April   <NA> 130303               CES0000000001 2020-04
# 5 2020    M03      March   <NA> 151090               CES0000000001 2020-03
# 6 2020    M02   February   <NA> 152463   C corrected CES0000000001 2020-02

Upvotes: 1

Allan Cameron
Allan Cameron

Reputation: 174476

I think you just need to have the string in one of the default formats that can be read by as.yearmon. From the docs, one of these is "%b %Y", so you can do:

library(dplyr)
library(zoo)

df %>% mutate(date = as.yearmon(paste(periodName, year)))
#> # A tibble: 6 x 8
#>    year period periodName latest  value footnotes       seriesID      date     
#>   <int> <chr>  <chr>      <chr>   <int> <chr>           <chr>         <yearmon>
#> 1  2020 M07    July       true   139582 "P preliminary" CES0000000001 Jul 2020 
#> 2  2020 M06    June       <NA>   137819 "P preliminary" CES0000000001 Jun 2020 
#> 3  2020 M05    May        <NA>   133028 ""              CES0000000001 May 2020 
#> 4  2020 M04    April      <NA>   130303 ""              CES0000000001 Apr 2020 
#> 5  2020 M03    March      <NA>   151090 ""              CES0000000001 Mar 2020 
#> 6  2020 M02    February   <NA>   152463 "C corrected"   CES0000000001 Feb 2020

Data

df <- structure(list(year = c(2020L, 2020L, 2020L, 2020L, 2020L, 2020L
), period = c("M07", "M06", "M05", "M04", "M03", "M02"), periodName = c("July", 
"June", "May", "April", "March", "February"), latest = c("true", 
NA, NA, NA, NA, NA), value = c(139582L, 137819L, 133028L, 130303L, 
151090L, 152463L), footnotes = c("P preliminary", "P preliminary", 
"", "", "", "C corrected"), seriesID = c("CES0000000001", "CES0000000001", 
"CES0000000001", "CES0000000001", "CES0000000001", "CES0000000001"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

df
#>   year period periodName latest  value     footnotes      seriesID
#> 1 2020    M07       July   true 139582 P preliminary CES0000000001
#> 2 2020    M06       June   <NA> 137819 P preliminary CES0000000001
#> 3 2020    M05        May   <NA> 133028               CES0000000001
#> 4 2020    M04      April   <NA> 130303               CES0000000001
#> 5 2020    M03      March   <NA> 151090               CES0000000001
#> 6 2020    M02   February   <NA> 152463   C corrected CES0000000001

Created on 2020-08-07 by the reprex package (v0.3.0)

Upvotes: 1

Related Questions