Reputation: 5829
Here's my reproducible data frame:
library(tidyverse)
df <- structure(list(PN = c("41681", "16588", "34881",
"36917", "33116", "68447"), `2017-10` = c(0L,
0L, 0L, 0L, 0L, 0L), `2017-11` = c(0L, 1L, 0L, 0L, 0L, 0L), `2017-12` = c(0L,
0L, 0L, 0L, 1L, 0L), `2018-01` = c(0L, 0L, 1L, 1L, 0L, 0L), `2018-02` = c(1L,
0L, 0L, 0L, 0L, 0L), `2018-03` = c(0L, 0L, 0L, 0L, 0L, 0L), `2018-04` = c(0L,
0L, 0L, 0L, 0L, 1L), Status = c("OK", "NOK", "OK", "NOK", "OK",
"OK")), .Names = c("PN", "2017-10", "2017-11", "2017-12",
"2018-01", "2018-02", "2018-03", "2018-04", "Status"), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
Long story short... two of the steps to get me to the output above were:
1. Early on in the analysis:
mutate(n = parse_integer(str_replace_na(n, replacement = 0)))
2. Later on in the analysis:
mutate(
Status =
ifelse(
(apply(.[, 2:7], 1, sum) > 0) &
(.[, 8] > 0),
"NOK",
"OK"
)
)
Two kind stack warriors @joran and @akrun informed me that I "created a column of character matrices" and that's why I kept getting an "Error in arrange_impl(.data, dots) : Argument 1 is of unsupported type matrix" error.
In plain English, what did I do? I'm the type of guy who doesn't yet understand the difference between an atomic vector and an atomic particle. Can you answer with something clear and concise?
Or you can just tell me to read chapter XYZ in R for Data Science or something like that. I'll take that too (maybe in the comments).
Upvotes: 2
Views: 101
Reputation: 47350
To behave in the usually expected way, ifelse needs a logical vector as its first argument.
What you feed it here is (replacing the . with df):
(apply(df[, 2:7], 1, sum) > 0) & (df[, 8] > 0)
# which btw we can rewrite more clearly as:
# rowSums(df[2:7]) > 0 & df[,8] >0
# 2018-04
# [1,] FALSE
# [2,] FALSE
# [3,] FALSE
# [4,] FALSE
# [5,] FALSE
# [6,] FALSE
This wouldn't happen with a regular data.frame, as df[,8] would be converted to a vector.
Read ?Extract about the drop argument: tibbles behave a bit like data.frames do with drop = FALSE.
head(iris[,1])
# [1] 5.1 4.9 4.7 4.6 5.0 5.4
head(iris[,1,drop=FALSE])
# Sepal.Length
# 1 5.1
# 2 4.9
# 3 4.7
# 4 4.6
# 5 5.0
# 6 5.4
head(as_tibble(iris)[,1])
# # A tibble: 6 x 1
# Sepal.Length
# <dbl>
# 1 5.1
# 2 4.9
# 3 4.7
# 4 4.6
# 5 5.0
# 6 5.4
We don't need to get into how this translated to your wrong result just yet; let's first correct the input.
For this you can use df[[8]] instead of df[,8]; it will always be a vector.
df %>% mutate(
Status =
ifelse(
rowSums(.[, 2:7]) > 0 & .[[8]] > 0,
"NOK",
"OK"
)
) %>% str
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 6 obs. of 9 variables:
# $ PN : chr "41681" "16588" "34881" "36917" ...
# $ 2017-10: int 0 0 0 0 0 0
# $ 2017-11: int 0 1 0 0 0 0
# $ 2017-12: int 0 0 0 0 1 0
# $ 2018-01: int 0 0 1 1 0 0
# $ 2018-02: int 1 0 0 0 0 0
# $ 2018-03: int 0 0 0 0 0 0
# $ 2018-04: int 0 0 0 0 0 1
# $ Status : chr "OK" "OK" "OK" "OK" ...
Now the structure isn't problematic anymore.
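As a rough sketch of the options for getting a plain vector out of a tibble, here is a comparison on a made-up three-row tibble (toy and its values are hypothetical, not your data). It assumes the tibble and dplyr packages are installed:

```r
library(tibble)
library(dplyr)

# Hypothetical toy tibble, only for illustration
toy <- tibble(id = c("a", "b", "c"), `2018-04` = c(0L, 1L, 0L))

by_bracket <- toy[, 2]               # single bracket: still a one-column tibble
by_double  <- toy[[2]]               # double bracket: plain integer vector
by_dollar  <- toy$`2018-04`          # same vector, selected by name
by_pull    <- pull(toy, 2)           # dplyr helper, same vector
by_drop    <- toy[, 2, drop = TRUE]  # asking for drop explicitly also gives a vector

is.data.frame(by_bracket)            # TRUE
is.null(dim(by_double))              # TRUE: no dimensions, so a plain vector
identical(by_double, by_dollar)      # TRUE
identical(by_double, by_pull)        # TRUE
identical(by_double, by_drop)        # TRUE
```

Any of the last four forms should be safe to compare with > 0 inside ifelse; [[ ]] is just the shortest to type.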
Another way, which adds only one underscore character to your solution but wouldn't have taught us so much :), is to use if_else (from the dplyr package) in place of ifelse. At least in the dplyr version used here, it does internally the magic conversion that you did in the comments using as.vector.
Taking your original code and adding only the magical _:
df %>% mutate(
Status =
if_else(
(apply(.[, 2:7], 1, sum) > 0) &
(.[, 8] > 0),
"NOK",
"OK"
)
) %>% str
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 6 obs. of 9 variables:
# $ PN : chr "41681" "16588" "34881" "36917" ...
# $ 2017-10: int 0 0 0 0 0 0
# $ 2017-11: int 0 1 0 0 0 0
# $ 2017-12: int 0 0 0 0 1 0
# $ 2018-01: int 0 0 1 1 0 0
# $ 2018-02: int 1 0 0 0 0 0
# $ 2018-03: int 0 0 0 0 0 0
# $ 2018-04: int 0 0 0 0 0 1
# $ Status : chr "OK" "OK" "OK" "OK" ...
Explanation of the error
df %>% mutate(
Status =
ifelse(
(apply(.[, 2:7], 1, sum) > 0) &
(.[, 8] > 0),
"NOK",
"OK"
)
) %>% str
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 6 obs. of 9 variables:
# $ PN : chr "41681" "16588" "34881" "36917" ...
# $ 2017-10: int 0 0 0 0 0 0
# $ 2017-11: int 0 1 0 0 0 0
# $ 2017-12: int 0 0 0 0 1 0
# $ 2018-01: int 0 0 1 1 0 0
# $ 2018-02: int 1 0 0 0 0 0
# $ 2018-03: int 0 0 0 0 0 0
# $ 2018-04: int 0 0 0 0 0 1
# $ Status : chr [1:6, 1] "OK" "OK" "OK" "OK" ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr "2018-04"
This shows that Status is a character matrix of 6 rows and 1 column. arrange doesn't like that.
Why did you get a character matrix?
df[, 8] is a tibble.
df[, 8] > 0 is a matrix.
(apply(.[, 2:7], 1, sum) > 0) & (.[, 8] > 0) is a matrix.
?ifelse says about the output value:
A vector of the same length and attributes (including dimensions and "class") as test
So Status will be a matrix, and everything finally makes sense ;).
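To see that last step in isolation, here is a minimal base R sketch. The one-column logical matrix test_matrix is a hypothetical stand-in for the condition built from the tibble above:

```r
# Hypothetical stand-in for the test: a 3 x 1 logical matrix
test_matrix <- matrix(c(TRUE, FALSE, FALSE), ncol = 1,
                      dimnames = list(NULL, "2018-04"))

# ifelse copies the shape of its test, so a matrix in gives a matrix out
from_matrix <- ifelse(test_matrix, "NOK", "OK")

# Dropping the dimensions first gives an ordinary character vector
from_vector <- ifelse(as.vector(test_matrix), "NOK", "OK")

is.matrix(from_matrix)  # TRUE
dim(from_matrix)        # 3 1
dim(from_vector)        # NULL
```

The values are the same in both results; only the dim attribute differs, and that attribute is what makes the column a matrix.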
See also ?dplyr::if_else for more information.
Upvotes: 3