d8aninja

Reputation: 3643

dplyr coerces characters to factors

I'm sure there's a great reason for this that I am not finding at the moment, but ... why does dplyr coerce characters to factors, even when you explicitly coerce to character?

> letters
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
> typeof(letters)
[1] "character"
> data.frame(
+   colA = as.character(letters), 
+   colB = as.character(LETTERS)
+ ) %>%
+   glimpse
Observations: 26
Variables: 2
$ colA <fct> a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z
$ colB <fct> A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z

Upvotes: 2

Views: 209

Answers (1)

akrun

Reputation: 887571

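It is not dplyr that coerces the columns to factor. It is data.frame (the base R constructor), whose default in R versions before 4.0.0 is stringsAsFactors = TRUE. The as.character calls have no effect because the conversion happens inside data.frame, after they have run. Specifying stringsAsFactors = FALSE rectifies the issue: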

data.frame(
  colA = letters, 
  colB = LETTERS, stringsAsFactors = FALSE
)

NOTE: There is no need to wrap the columns in as.character, because letters and LETTERS are already character vectors.
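
To confirm which columns end up as factors, you can inspect the class of each column. This is a minimal sketch in base R; the names df_default and df_chr are only for illustration and do not appear in the question:

```r
# Default constructor behaviour: depends on the R version
# (factors before R 4.0.0, character from 4.0.0 onwards).
df_default <- data.frame(colA = letters, colB = LETTERS)

# Explicitly disable the conversion; this behaves the same in every version.
df_chr <- data.frame(colA = letters, colB = LETTERS, stringsAsFactors = FALSE)

# Inspect the class of each column.
print(sapply(df_default, class))
print(sapply(df_chr, class))  # colA and colB are both "character"
```

If the first call reports "factor", the installed R still uses the old default, and passing stringsAsFactors = FALSE explicitly is the safer choice.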


Since we are already using the tidyverse, another option is tibble, which never converts character vectors to factors (the equivalent of stringsAsFactors = FALSE):

tibble(colA = letters, colB = LETTERS)

Upvotes: 5
