nathaneastwood

Reputation: 3764

sparklyr date_format only works for certain formats

I am trying to use the Hive UDF date_format() to extract the day of the week, but it only returns NA. Here is an example:

sc <- sparklyr::spark_connect(master = "local")
df <- dplyr::copy_to(
  sc,
  data.frame(date = as.POSIXct("2020-01-01")),
  "df"
)
df
# # Source: spark<df> [?? x 1]
#   date
#   <dttm>
# 1 2019-12-31 23:00:00

# Extracting the year works fine...
dplyr::mutate_at(
  .tbl = df,
  .vars = "date",
  .funs = ~date_format(., "yyyy")
)
# # Source: spark<?> [?? x 1]
#   date
#   <chr>
# 1 2020

# But extracting the day of the week does not...
dplyr::mutate_at(
  .tbl = df,
  .vars = "date",
  .funs = ~date_format(., "E")
)
# # Source: spark<?> [?? x 1]
#   date
#   <chr>
# 1 NA

Any help would be appreciated. Some system information:

Upvotes: 1

Views: 263

Answers (1)

Emer

Reputation: 3824

I tried mutate() instead of mutate_at(), and that returned the day of the week. To overwrite the column in place, replace DoW with date.

library(tidyverse)
library(sparklyr)

sc <- spark_connect(master = "local")

df <- dplyr::copy_to(sc, data.frame(date = as.POSIXct("2020-01-01")), "df")
df %>% mutate(DoW=date_format(date, "E"))
# # Source: spark<?> [?? x 2]
#   date                DoW
#   <dttm>              <chr>
# 1 2019-12-31 23:00:00 Wed
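
For the in-place variant mentioned above, a minimal sketch might look like the following. It assumes the same local Spark setup as in the question; the name result and the overwrite = TRUE argument are only illustrative additions, and I have not checked this against every Spark version.

```r
library(dplyr)
library(sparklyr)

# Assumes a local Spark installation, as in the question
sc <- spark_connect(master = "local")
df <- copy_to(sc, data.frame(date = as.POSIXct("2020-01-01")), "df", overwrite = TRUE)

# Reusing the name `date` overwrites the column instead of adding DoW.
# date_format() is not an R function, so it is passed through to Spark SQL.
result <- df %>%
  mutate(date = date_format(date, "E")) %>%
  collect()  # `result` is a hypothetical name for the local copy

spark_disconnect(sc)
```

After collect(), result is an ordinary local tibble with a single character column named date.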

Upvotes: 1
