Using pull() from dplyr after reading data with haven::read_sas keeps attributes. How to avoid?

Question

I am working with several data sets that originally come as .sas7bdat files.

Initially, I loaded all files using the sas7bdat package but I am now convinced that the haven package can do a better and quicker job.

However, data newly loaded with haven::read_sas() seems to behave differently from data loaded with sas7bdat::read.sas7bdat() when using pull() from dplyr:

library("haven")
library("dplyr")
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library("sas7bdat")

data.sas7 <- sas7bdat::read.sas7bdat(system.file("examples", "iris.sas7bdat", package = "haven"))
data.sas7 %>% summarise(mean = mean(Petal_Length)) %>% pull
#> [1] 3.758

data.haven <- haven::read_sas(system.file("examples", "iris.sas7bdat", package = "haven"))
data.haven %>% summarise(mean = mean(Petal_Length)) %>% pull
#> [1] 3.758
#> attr(,"format.sas")
#> [1] "BEST"

Created on 2019-01-31 by the reprex package (v0.2.1)

As the example above shows, the attributes are printed as well when the data is loaded using haven. This is not practical when I want, for instance, to print the result in an R Markdown document.

My question is: how can I avoid the attribute being printed when using pull() from dplyr on data loaded with haven?

moodymudskipper · Accepted Answer

First let's reproduce similar data:

iris2 <- iris
attr(iris2$Petal.Length,"format.sas") <- "BEST"
iris2 %>% 
  summarise(mean = mean(Petal.Length)) %>% 
  pull
# [1] 3.758
# attr(,"format.sas")
# [1] "BEST"

Now look at the mutate_all() line below: it strips the "format.sas" attribute from every column:

iris2 %>% 
  mutate_all(`attr<-`,"format.sas", NULL) %>% 
  summarise(mean = mean(Petal.Length)) %>% 
  pull
# [1] 3.758
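
The backtick-quoted `attr<-` may look unusual: it is the function form of the ordinary replacement syntax attr(x, "format.sas") <- NULL, which is why it can be handed to mutate_all() as a function. A minimal base R sketch on a standalone vector (the vector x and its values are made up for illustration and stand in for a column read by haven):

```r
# Hypothetical vector standing in for a column read by haven
x <- c(1.4, 4.7, 6.0)
attr(x, "format.sas") <- "BEST"

# Usual replacement syntax, applied to a copy
y <- x
attr(y, "format.sas") <- NULL

# The same operation written as a plain function call
z <- `attr<-`(x, "format.sas", NULL)

identical(y, z)         # both forms give the same result
is.null(attributes(z))  # no attributes are left on the stripped vector
```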

If you want to remove all attributes:

iris2 %>% 
  mutate_all(`attributes<-`, NULL) %>% 
  summarise(mean = mean(Petal.Length)) %>% 
  pull

# [1] 3.758
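
As an alternative sketch, not part of the original answer: haven itself provides zap_formats(), which removes the format.sas, format.spss and format.stata attributes, and base R's as.vector() drops all attributes from an atomic vector. The example below assumes haven and dplyr are installed and reuses the example file from the question; data.clean is just an illustrative name. Whether the attribute shows up after summarise() at all may depend on the dplyr version, so no printed output is shown.

```r
library(haven)
library(dplyr)

data.haven <- read_sas(system.file("examples", "iris.sas7bdat", package = "haven"))

# Option 1: drop the format attributes from every column right after reading
data.clean <- zap_formats(data.haven)

data.clean %>%
  summarise(mean = mean(Petal_Length)) %>%
  pull()

# Option 2: leave the data as is and strip attributes from the final vector only;
# as.vector() removes all attributes of an atomic vector, including names
data.haven %>%
  summarise(mean = mean(Petal_Length)) %>%
  pull() %>%
  as.vector()
```

Option 1 keeps other metadata such as variable labels, while option 2 discards every attribute on the pulled result, so which one fits depends on what else the attributes are needed for.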
