stackinator

Reputation: 5829

R - creating a column of character matrices

Here's my reproducible data frame:

library(tidyverse)
df <- structure(list(PN = c("41681", "16588", "34881", 
"36917", "33116", "68447"), `2017-10` = c(0L, 
0L, 0L, 0L, 0L, 0L), `2017-11` = c(0L, 1L, 0L, 0L, 0L, 0L), `2017-12` = c(0L, 
0L, 0L, 0L, 1L, 0L), `2018-01` = c(0L, 0L, 1L, 1L, 0L, 0L), `2018-02` = c(1L, 
0L, 0L, 0L, 0L, 0L), `2018-03` = c(0L, 0L, 0L, 0L, 0L, 0L), `2018-04` = c(0L, 
0L, 0L, 0L, 0L, 1L), Status = c("OK", "NOK", "OK", "NOK", "OK", 
"OK")), .Names = c("PN", "2017-10", "2017-11", "2017-12", 
"2018-01", "2018-02", "2018-03", "2018-04", "Status"), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Long story short... two of the steps to get me to the output above were:

1. Early on in the analysis:

mutate(n = parse_integer(str_replace_na(n, replacement = 0)))

2. Later on in the analysis:

mutate(
  Status = 
    ifelse(
      (apply(.[, 2:7], 1, sum) > 0) & 
        (.[, 8] > 0), 
      "NOK", 
      "OK"
      )
)

Two kind stack warriors @joran and @akrun informed me that I "created a column of character matrices" and that's why I kept getting an "Error in arrange_impl(.data, dots) : Argument 1 is of unsupported type matrix" error.

In plain English, what did I do? I'm the type of guy who doesn't yet understand the difference between an atomic vector and an atomic particle. Can you answer with something clear and concise?

Or you can just tell me to read chapter XYZ in R for Data Science or something like that. I'll take that too (maybe in the comments).

Upvotes: 2

Views: 101

Answers (1)

moodymudskipper

Reputation: 47350

To behave as usually expected, ifelse needs a logical vector as its first argument.

What you feed to it here is (replacing the . with df):

(apply(df[, 2:7], 1, sum) > 0) & (df[, 8] > 0)
# which, by the way, we can rewrite more clearly as:
# rowSums(df[2:7]) > 0 & df[, 8] > 0

#      2018-04
# [1,]   FALSE
# [2,]   FALSE
# [3,]   FALSE
# [4,]   FALSE
# [5,]   FALSE
# [6,]   FALSE

This wouldn't happen with a regular data.frame, as df[,8] would be converted to a vector.

Read ?Extract about the drop argument: tibbles behave a bit like data.frames do with drop = FALSE.

head(iris[,1])
# [1] 5.1 4.9 4.7 4.6 5.0 5.4

head(iris[,1,drop=FALSE])
#   Sepal.Length
# 1          5.1
# 2          4.9
# 3          4.7
# 4          4.6
# 5          5.0
# 6          5.4

head(as_tibble(iris)[,1])
# # A tibble: 6 x 1
#   Sepal.Length
# <dbl>
# 1          5.1
# 2          4.9
# 3          4.7
# 4          4.6
# 5          5.0
# 6          5.4

We don't need to get into how this led to your wrong result; let's just fix the input.

To do so, use df[[8]] instead of df[,8]: it will always be a vector.

df %>% mutate(
  Status = 
    ifelse(
      rowSums(.[, 2:7]) > 0 & .[[8]] > 0, 
      "NOK", 
      "OK"
    )
) %>% str

# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 6 obs. of  9 variables:
# $ PN     : chr  "41681" "16588" "34881" "36917" ...
# $ 2017-10: int  0 0 0 0 0 0
# $ 2017-11: int  0 1 0 0 0 0
# $ 2017-12: int  0 0 0 0 1 0
# $ 2018-01: int  0 0 1 1 0 0
# $ 2018-02: int  1 0 0 0 0 0
# $ 2018-03: int  0 0 0 0 0 0
# $ 2018-04: int  0 0 0 0 0 1
# $ Status : chr  "OK" "OK" "OK" "OK" ...

Now the structure isn't problematic anymore.

Another way, which adds only one underscore character to your solution but wouldn't have taught us as much :), is to use if_else (from the dplyr package) in place of ifelse. Internally, it does the conversion that you did in the comments using as.vector.

Taking your original code and adding only the magical _:

df %>% mutate(
  Status = 
    if_else(
      (apply(.[, 2:7], 1, sum) > 0) & 
        (.[, 8] > 0), 
      "NOK", 
      "OK"
    )
) %>% str
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 6 obs. of  9 variables:
# $ PN     : chr  "41681" "16588" "34881" "36917" ...
# $ 2017-10: int  0 0 0 0 0 0
# $ 2017-11: int  0 1 0 0 0 0
# $ 2017-12: int  0 0 0 0 1 0
# $ 2018-01: int  0 0 1 1 0 0
# $ 2018-02: int  1 0 0 0 0 0
# $ 2018-03: int  0 0 0 0 0 0
# $ 2018-04: int  0 0 0 0 0 1
# $ Status : chr  "OK" "OK" "OK" "OK" ...

Explanation of the error

df %>% mutate(
  Status = 
    ifelse(
      (apply(.[, 2:7], 1, sum) > 0) & 
        (.[, 8] > 0), 
      "NOK", 
      "OK"
    )
) %>% str

# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 6 obs. of  9 variables:
# $ PN     : chr  "41681" "16588" "34881" "36917" ...
# $ 2017-10: int  0 0 0 0 0 0
# $ 2017-11: int  0 1 0 0 0 0
# $ 2017-12: int  0 0 0 0 1 0
# $ 2018-01: int  0 0 1 1 0 0
# $ 2018-02: int  1 0 0 0 0 0
# $ 2018-03: int  0 0 0 0 0 0
# $ 2018-04: int  0 0 0 0 0 1
# $ Status : chr [1:6, 1] "OK" "OK" "OK" "OK" ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr "2018-04"

This shows that Status is a character matrix with 6 rows and 1 column, and arrange doesn't like that.

Why did you get a character matrix?
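If you want to spot this kind of column without reading the str output line by line, here is a minimal base R sketch. The toy data frame and its column names are made up for illustration; they are not your data.

```r
# Hypothetical toy data frame with a one-column character matrix as a column
toy <- data.frame(id = 1:3)
toy$status <- matrix(c("OK", "NOK", "OK"), ncol = 1)

# Which columns are matrices rather than plain vectors?
matrix_cols <- vapply(toy, is.matrix, logical(1))
print(matrix_cols)

# Flatten the offending column; as.vector() drops the dim attribute
toy$status <- as.vector(toy$status)
fixed_cols <- vapply(toy, is.matrix, logical(1))
print(fixed_cols)
```

The same vapply check should work on a tibble, since it only looks at each column in turn.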

  • df[, 8] is a tibble
  • so df[, 8] > 0 is a matrix
  • so (apply(.[, 2:7], 1, sum) > 0) & (.[, 8] > 0) is a matrix
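The chain in the list above can be sketched in base R alone. This is only an illustration under an assumption: a plain data.frame subset with drop = FALSE is used as a stand-in for a tibble's [, j], and the toy column names are hypothetical.

```r
# Hypothetical stand-in: a base data.frame subset with drop = FALSE
# mimics how a tibble's [, j] keeps the result as a one-column table
toy <- data.frame(a = c(0L, 1L, 0L), b = c(0L, 0L, 1L))
one_col <- toy[, "b", drop = FALSE]  # still a data.frame, not a vector

# Comparing a data.frame with a number gives a logical matrix
cmp <- one_col > 0

# Combining a plain logical vector with that matrix keeps the matrix shape
both <- (toy$a >= 0) & cmp

print(class(one_col))
print(dim(cmp))
print(dim(both))
```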

?ifelse says about the output value:

A vector of the same length and attributes (including dimensions and "class") as test

So Status will be a matrix and everything finally makes sense ;).
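To see that behaviour of ifelse in isolation, here is a small base R sketch with a made-up logical matrix in place of your test expression:

```r
# A one-column logical matrix, standing in for the test passed to ifelse
test <- matrix(c(TRUE, FALSE, TRUE), ncol = 1)

# ifelse copies the attributes of test, so the result is a matrix too
as_matrix <- ifelse(test, "NOK", "OK")

# Dropping the dimensions first gives an ordinary character vector
as_plain <- ifelse(as.vector(test), "NOK", "OK")

print(dim(as_matrix))
print(dim(as_plain))
```

The values are the same in both cases; only the dim attribute differs, and that attribute is what arrange complains about.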

See also ?dplyr::if_else for more information.

Upvotes: 3
