Reputation: 167
I'm working with Bureau of Labor Statistics data .When I load the data from his webpage as a data frame. I get something like this:
year period periodName latest value footnotes seriesID date
<dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 2020 M07 July true 139582 "P preliminary" CES0000000001
2 2020 M06 June NA 137819 "P preliminary" CES0000000001
3 2020 M05 May NA 133028 "" CES0000000001
4 2020 M04 April NA 130303 "" CES0000000001
5 2020 M03 March NA 151090 "" CES0000000001
6 2020 M02 February NA 152463 "C corrected" CES0000000001
As you can see there are two different columns for date(year and period). So I tried to combine them into a single date column :
nfp$date <- as.yearmon(paste(nfp$year, nfp$period), "%Y %m")
This is my df now:
year period periodName latest value footnotes seriesID date
<dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <yearmon>
1 2020 M07 July true 139582 "P preliminary" CES0000000001 NA
2 2020 M06 June NA 137819 "P preliminary" CES0000000001 NA
3 2020 M05 May NA 133028 "" CES0000000001 NA
4 2020 M04 April NA 130303 "" CES0000000001 NA
5 2020 M03 March NA 151090 "" CES0000000001 NA
6 2020 M02 February NA 152463 "C corrected" CES0000000001 NA
Now I have NA's in date column. I thought that the problem was chr format columns, so I tried this:
nfp <- nfp %>%
mutate(month_year = as.Date(period, "%m"))
and I get a date column with NA's .This is my desire output :
year period periodName latest value footnotes seriesID **date**
<dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <yearmon>
1 2020 M07 July true 139582 "P preliminary" CES0000000001 2020-07
2 2020 M06 June NA 137819 "P preliminary" CES0000000001 2020-06
3 2020 M05 May NA 133028 "" CES0000000001 2020-05
4 2020 M04 April NA 130303 "" CES0000000001 2020-04
5 2020 M03 March NA 151090 "" CES0000000001 2020-03
6 2020 M02 February NA 152463 "C corrected" CES0000000001 2020-02
So in that way I can convert my df into a ts object.
Upvotes: 0
Views: 587
Reputation: 1364
Here is another solution for your question
library(tidyverse)
library(lubridate)
df2 <- df %>%
mutate(date2 = paste0(year, "-", periodName, "-01"),
date2 = as.Date(date2, format = "%Y-%B-%d"),
date2 = str_replace_all(date2, regex("(-01)"), ""))
# year period periodName latest value footnotes seriesID date2
# 1 2020 M07 July true 139582 P preliminary CES0000000001 2020-07
# 2 2020 M06 June <NA> 137819 P preliminary CES0000000001 2020-06
# 3 2020 M05 May <NA> 133028 CES0000000001 2020-05
# 4 2020 M04 April <NA> 130303 CES0000000001 2020-04
# 5 2020 M03 March <NA> 151090 CES0000000001 2020-03
# 6 2020 M02 February <NA> 152463 C corrected CES0000000001 2020-02
Upvotes: 1
Reputation: 174476
I think you just need to have the string in one of the default formats that can be read by as.yearmon
. From the docs, one of these is "%b %Y"
, so you can do:
library(dplyr)
library(zoo)
df %>% mutate(date = as.yearmon(paste(periodName, year)))
#> # A tibble: 6 x 8
#> year period periodName latest value footnotes seriesID date
#> <int> <chr> <chr> <chr> <int> <chr> <chr> <yearmon>
#> 1 2020 M07 July true 139582 "P preliminary" CES0000000001 Jul 2020
#> 2 2020 M06 June <NA> 137819 "P preliminary" CES0000000001 Jun 2020
#> 3 2020 M05 May <NA> 133028 "" CES0000000001 May 2020
#> 4 2020 M04 April <NA> 130303 "" CES0000000001 Apr 2020
#> 5 2020 M03 March <NA> 151090 "" CES0000000001 Mar 2020
#> 6 2020 M02 February <NA> 152463 "C corrected" CES0000000001 Feb 2020
Data
df <- structure(list(year = c(2020L, 2020L, 2020L, 2020L, 2020L, 2020L
), period = c("M07", "M06", "M05", "M04", "M03", "M02"), periodName = c("July",
"June", "May", "April", "March", "February"), latest = c("true",
NA, NA, NA, NA, NA), value = c(139582L, 137819L, 133028L, 130303L,
151090L, 152463L), footnotes = c("P preliminary", "P preliminary",
"", "", "", "C corrected"), seriesID = c("CES0000000001", "CES0000000001",
"CES0000000001", "CES0000000001", "CES0000000001", "CES0000000001"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
df
#> year period periodName latest value footnotes seriesID
#> 1 2020 M07 July true 139582 P preliminary CES0000000001
#> 2 2020 M06 June <NA> 137819 P preliminary CES0000000001
#> 3 2020 M05 May <NA> 133028 CES0000000001
#> 4 2020 M04 April <NA> 130303 CES0000000001
#> 5 2020 M03 March <NA> 151090 CES0000000001
#> 6 2020 M02 February <NA> 152463 C corrected CES0000000001
Created on 2020-08-07 by the reprex package (v0.3.0)
Upvotes: 1