Reputation: 291
I have got data.frame
COL1 COL2
1 X X
2 X X
3 X Y
4 X X
5 Z X
If column has no X I need to get the number of appropriate row (3 and 5 in this case). I tried to use 'sapply' to check data through "!=", but I think I need something else.
Upvotes: 3
Views: 66
Reputation: 76450
You were close, your idea of using sapply
with primitive function !=
is basically correct.
The following returns a logical matrix.
sapply(dat, `!=`, "X")
# COL1 COL2
#[1,] FALSE FALSE
#[2,] FALSE FALSE
#[3,] FALSE TRUE
#[4,] FALSE FALSE
#[5,] TRUE FALSE
If you need row numbers, wrap that in which
, with argument arr.ind
set to TRUE
.
which(sapply(dat, `!=`, "X"), arr.ind = TRUE)
# row col
#[1,] 5 1
#[2,] 3 2
Edit.
Several solutions were posted, here are comparison tests.
Since the OP says that there could be datasets with 100 columns, I have tested with two different datasets, the posted in the question and a larger one.
Function RuiJaap
is Rui
with the suggestion in Jaap's comment.
Rui <- function(DF, value = "X")
which(sapply(DF, `!=`, value), arr.ind = TRUE)
DanY <- function(DF, value = "X")
unique(sapply(DF, function(x) which(x != value)))
Jaap <- function(DF, value = "X")
which(!!rowSums(DF != value))
RuiJaap <- function(DF, value = "X")
which(dat != "X", arr.ind = TRUE)
library(ggplot2)
library(microbenchmark)
mb1 <- microbenchmark(Rui = Rui(dat),
RuiJaap = RuiJaap(dat),
Jaap = Jaap(dat),
DanY = DanY(dat),
times = 1e3)
mb2 <- microbenchmark(Rui = Rui(dat2),
RuiJaap = RuiJaap(dat2),
Jaap = Jaap(dat2),
DanY = DanY(dat2),
times = 1e2)
autoplot(mb1)
autoplot(mb2)
For small datasets DanY
is faster but for larger ones, RuiJaap
is the fastest.
Data.
dat <- read.table(text = "
COL1 COL2
1 X X
2 X X
3 X Y
4 X X
5 Z X
", header = TRUE)
set.seed(1)
dat2 <- matrix("X", nrow = 20, ncol = 100)
dat2[sample(2000, 100)] <- "Y"
dat2 <- as.data.frame(dat2)
Upvotes: 1
Reputation: 11140
Here's a base R solution that will work for any number of columns -
which(rowSums(df == "X") != ncol(df))
Upvotes: 1
Reputation: 4338
Here's a tidyverse
solution that can work on an arbitrary number of columns
library(tidyverse)
df <- tibble(
col1 = c("X", "X", "X", "X", "Z"),
col2 = c("X", "X", "Z", "X", "X"),
col3 = c("X", "X", "Z", "Z", "X"),
)
filter_all(df, any_vars(. != "X"))
Which returns:
# A tibble: 3 x 3
col1 col2 col3
<chr> <chr> <chr>
1 X Z Z
2 X X Z
3 Z X X
Upvotes: 1
Reputation: 4100
This will work:
library(dplyr)
filter(df, (COL1!="X" & COL2!="X"))
Upvotes: 1
Reputation: 6073
Here's a solution using which
that will work on any number of columns:
# example data
df <- data.frame(
col1 = c("X", "X", "X", "X", "Z"),
col2 = c("X", "X", "Z", "X", "X"),
stringsAsFactors = FALSE
)
# how to get rows without X
unique(sapply(df, function(x) which(x != "X")))
Upvotes: 2