Zach

Reputation: 2455

Existing function for seeing if a row exists in a data frame?

Is there an existing function for determining whether a row exists within a data frame? I suppose I could do an apply/identical, but it seems like I'm missing something.

For example:

Given a data frame such as:

  a   b
1 1 cat
2 2 dog

Is there an existing function which will allow me to test whether the row (1, cat) exists in the data frame?

Thanks, Zach

Upvotes: 17

Views: 26279

Answers (8)

Davide Viggiano

Reputation: 1

This is my first post about programming... sorry if I am wrong! I have used this simple solution:

X <- data.frame(a=1:2, b=c("cat","dog"))
row_to_find <- data.frame(a=1, b="cat")
paste(row_to_find, collapse = ' ') %in% paste(X[,1], X[,2])

Basically, %in% checks whether a string is in a vector, so you can turn the table into a vector of strings (one per row) and the row into a single string, just using paste.
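The paste idea can be extended to any number of columns. The following is only a sketch: `row_in_df_paste` is a hypothetical helper name, and the `"\r"` separator is an assumption chosen because it is unlikely to occur in the data (with a plain space, the rows ("a b", "c") and ("a", "b c") would collapse to the same string).

```r
# Hypothetical helper (not from the answer above): paste-based row lookup
# that works for any number of columns.
row_in_df_paste <- function(df, row, sep = "\r") {
  # Collapse every row into one string; unname() keeps column names
  # from being mistaken for arguments of paste().
  df_keys  <- do.call(paste, c(unname(as.list(df)),  sep = sep))
  row_keys <- do.call(paste, c(unname(as.list(row)), sep = sep))
  row_keys %in% df_keys
}

X <- data.frame(a = 1:2, b = c("cat", "dog"))
row_in_df_paste(X, data.frame(a = 1, b = "cat"))     # row present
row_in_df_paste(X, data.frame(a = 3, b = "horse"))   # row absent
```

Because everything is compared as text, values that print differently (for example 1 and 1.0000001) are treated as different, and values that print the same are treated as equal regardless of type.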

Upvotes: 0

David Rubinger

Reputation: 3948

Another approach, using base R:

df <- data.frame(a = c(1, 2), b = c("cat", "dog"))
any(df$a == 1 & df$b == "cat")
#> [1] TRUE
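The same column-by-column comparison can be written without naming each column. This is a minimal sketch in base R; `row_matches` is a hypothetical helper, and it assumes the values in `row` are given in the same order as the columns of `df`.

```r
# Hypothetical helper: compare each column of df with the matching value in row.
row_matches <- function(df, row) {
  stopifnot(ncol(df) == length(row))
  # One logical vector per column, then AND them together across columns.
  per_column <- Map(function(column, value) column == value, df, row)
  hits <- Reduce(`&`, per_column)
  # Treat comparisons involving NA as non-matches.
  !is.na(hits) & hits
}

df <- data.frame(a = c(1, 2), b = c("cat", "dog"))
row_matches(df, list(1, "cat"))        # one logical value per row of df
any(row_matches(df, list(1, "cat")))   # does the row exist at all?
```

Each column keeps its own type during the comparison, and the per-row result can also be used to index the matching rows.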

Upvotes: 0

Rory Nolan

Reputation: 1042

For fans of dplyr and the tidyverse, you can use dplyr::anti_join(). According to its documentation, dplyr::anti_join(x, y) "returns all rows from x where there are not matching values in y, keeping just columns from x." Hence, if the result of dplyr::anti_join(row, df) has zero rows, then row was indeed in df; if it has one row, then row was not in df.

library(dplyr)

df <- tribble(~a, ~b,
              1,  "cat",
              2,  "dog")
#> # A tibble: 2 x 2
#>       a b    
#>   <dbl> <chr>
#> 1  1.00 cat  
#> 2  2.00 dog

row <- tibble(a = 1, b = "cat")
#> # A tibble: 1 x 2
#>       a b    
#>   <dbl> <chr>
#> 1  1.00 cat

nrow(anti_join(row, df)) == 0  # row is in df so should be TRUE
#> Joining, by = c("a", "b")
#> [1] TRUE

row <- tibble(a = 3, b = "horse")
#> # A tibble: 1 x 2
#>       a b    
#>   <dbl> <chr>
#> 1  3.00 horse

nrow(anti_join(row, df)) == 0  # row is not in df so should be FALSE
#> Joining, by = c("a", "b")
#> [1] FALSE

Upvotes: 1

Laurent Camus

Reputation: 11

I suggest Ben Bolker's solution, since the nrow(merge(row_to_find,X))>0 solution doesn't work for me (it always gives TRUE):

tail(duplicated(rbind(X,row_to_find)),1)>0

Upvotes: 1

Wojciech Sobala

Reputation: 7561

Using the data from @Marek's answer:

nrow(merge(row_to_find,X))>0 # TRUE if exists
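Another answer in this thread reports that this expression always returned TRUE. One possible cause (an assumption, not confirmed by that poster) is a mismatch in column names: merge() joins on the names the two data frames share, and when they share none it returns the Cartesian product, which has rows whenever both inputs do. A sketch that names the join columns explicitly; `unnamed_row` is a made-up example:

```r
X <- data.frame(a = 1:2, b = c("cat", "dog"))
row_to_find <- data.frame(a = 1, b = "cat")

# Join on every column of row_to_find, and fail early if X lacks any of them.
stopifnot(all(names(row_to_find) %in% names(X)))
found <- nrow(merge(row_to_find, X, by = names(row_to_find))) > 0
found

# With no shared column names, merge() falls back to a cross join:
unnamed_row <- data.frame(V1 = 3, V2 = "horse")
cross_join_hit <- nrow(merge(unnamed_row, X)) > 0
cross_join_hit   # TRUE even though (3, horse) is not a row of X
```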

Upvotes: 8

hadley

Reputation: 103948

Try match_df from plyr (using Marek's sample data):

library(plyr)
X <- data.frame(a=1:2, b=c("cat","dog"))
row_to_find <- data.frame(a=1, b="cat")

match_df(X, row_to_find)

Upvotes: 26

Marek

Reputation: 50753

Taking your example:

X <- data.frame(a=1:2, b=c("cat","dog"))
row_to_find <- data.frame(a=1, b="cat") # it has to be data.frame (not a vector) to hold different types

Then

duplicated(rbind(X, row_to_find))[nrow(X)+1]

gives you the answer.

Upvotes: 7
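The idiom above can be wrapped in a small function. This is a sketch only: `row_exists` is a hypothetical name, and the input checks are assumptions about how one might want to guard the call.

```r
# Hypothetical helper wrapping the duplicated()/rbind() idiom.
row_exists <- function(df, row) {
  # rbind() needs a one-row data frame with the same column names as df.
  stopifnot(is.data.frame(row), nrow(row) == 1,
            identical(names(df), names(row)))
  combined <- rbind(df, row)
  # The appended row is flagged as a duplicate only if an earlier row equals it.
  duplicated(combined)[nrow(df) + 1]
}

X <- data.frame(a = 1:2, b = c("cat", "dog"))
row_exists(X, data.frame(a = 1, b = "cat"))     # row present
row_exists(X, data.frame(a = 3, b = "horse"))   # row absent
```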

IRTFM

Reputation: 263471

For a vector y with the same number of elements as there are columns in the data frame dfrm:

apply(dfrm, 1, function(x) all( x == y) )

This should return a vector of TRUE and FALSE values, which can in turn be used as a row index in [ , ]:

dfrm[ apply(dfrm, 1, function(x) all( x == y) ) , ]

The identical function is probably too stringent, since it will check attributes as well.

> y=c(1,2,3)
> x = data.frame(a=1:10, b=2:11, c=3:12)
> identical(x[1,] , y)
[1] FALSE

Upvotes: 1
