Jens
Jens

Reputation: 291

How to get numbers of the rows with specific data?

I have got data.frame

          COL1         COL2 
1          X            X
2          X            X
3          X            Y
4          X            X
5          Z            X

If column has no X I need to get the number of appropriate row (3 and 5 in this case). I tried to use 'sapply' to check data through "!=", but I think I need something else.

Upvotes: 3

Views: 66

Answers (5)

Rui Barradas
Rui Barradas

Reputation: 76450

You were close, your idea of using sapply with primitive function != is basically correct.

The following returns a logical matrix.

sapply(dat, `!=`, "X")
#      COL1  COL2
#[1,] FALSE FALSE
#[2,] FALSE FALSE
#[3,] FALSE  TRUE
#[4,] FALSE FALSE
#[5,]  TRUE FALSE

If you need row numbers, wrap that in which, with argument arr.ind set to TRUE.

which(sapply(dat, `!=`, "X"), arr.ind = TRUE)
#     row col
#[1,]   5   1
#[2,]   3   2

Edit.

Several solutions were posted, here are comparison tests.
Since the OP says that there could be datasets with 100 columns, I have tested with two different datasets, the posted in the question and a larger one.

Function RuiJaap is Rui with the suggestion in Jaap's comment.

Rui <- function(DF, value = "X")
  which(sapply(DF, `!=`, value), arr.ind = TRUE)

DanY <- function(DF, value = "X")
  unique(sapply(DF, function(x) which(x != value)))

Jaap <- function(DF, value = "X")
  which(!!rowSums(DF != value))

RuiJaap <- function(DF, value = "X")
  which(dat != "X", arr.ind = TRUE)

library(ggplot2)
library(microbenchmark)

mb1 <- microbenchmark(Rui = Rui(dat),
                      RuiJaap = RuiJaap(dat),
                      Jaap = Jaap(dat),
                      DanY = DanY(dat),
                      times = 1e3)

mb2 <- microbenchmark(Rui = Rui(dat2),
                      RuiJaap = RuiJaap(dat2),
                      Jaap = Jaap(dat2),
                      DanY = DanY(dat2),
                      times = 1e2)

autoplot(mb1)
autoplot(mb2)

For small datasets DanY is faster but for larger ones, RuiJaap is the fastest.

Data.

dat <- read.table(text = "
          COL1         COL2 
1          X            X
2          X            X
3          X            Y
4          X            X
5          Z            X                  
", header = TRUE)


set.seed(1)
dat2 <- matrix("X", nrow = 20, ncol = 100)
dat2[sample(2000, 100)] <- "Y"
dat2 <- as.data.frame(dat2)

Upvotes: 1

Shree
Shree

Reputation: 11140

Here's a base R solution that will work for any number of columns -

which(rowSums(df == "X") != ncol(df))

Upvotes: 1

Ben G
Ben G

Reputation: 4338

Here's a tidyverse solution that can work on an arbitrary number of columns

library(tidyverse)

df <- tibble(
 col1 = c("X", "X", "X", "X", "Z"),
 col2 = c("X", "X", "Z", "X", "X"),
 col3 = c("X", "X", "Z", "Z", "X"),
)

filter_all(df, any_vars(. != "X"))

Which returns:

# A tibble: 3 x 3
 col1  col2  col3
  <chr> <chr> <chr>
  1     X     Z     Z
  2     X     X     Z
  3     Z     X     X

Upvotes: 1

Rahul Agarwal
Rahul Agarwal

Reputation: 4100

This will work:

library(dplyr)
filter(df, (COL1!="X" & COL2!="X"))

Upvotes: 1

DanY
DanY

Reputation: 6073

Here's a solution using which that will work on any number of columns:

# example data
df <- data.frame(
    col1 = c("X", "X", "X", "X", "Z"),
    col2 = c("X", "X", "Z", "X", "X"),
    stringsAsFactors = FALSE
)

# how to get rows without X
unique(sapply(df, function(x) which(x != "X")))

Upvotes: 2

Related Questions