Reputation: 2455
Is there an existing function for determining whether a row exists within a data frame? I suppose I could do an apply/identical, but it seems like I'm missing something.
For example, given a data frame like this:
a b
1 1 cat
2 2 dog
Is there an existing function that will let me test whether the row (1, cat) exists in the data frame?
Thanks, Zach
Upvotes: 17
Views: 26279
Reputation: 1
This is my first post about programming... sorry if I am wrong! I have used this simple solution:
X <- data.frame(a=1:2, b=c("cat","dog"))
row_to_find <- data.frame(a=1, b="cat")
paste(row_to_find, collapse = ' ') %in% paste(X[,1], X[,2])
Basically, %in% checks whether a string is in a vector, so you can turn the table into a vector of strings (one per row) and the row into a single string, just using paste.
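One weakness of pasting with a space is that different rows can collapse to the same string (for example "a b" + "c" versus "a" + "b c"). A minimal sketch of the same idea with an explicit delimiter follows; the helper name row_keys and the "\r" separator are my own assumptions, not part of the answer above, and the separator has to be something that cannot occur in the data.

```r
X <- data.frame(a = 1:2, b = c("cat", "dog"))
row_to_find <- data.frame(a = 1, b = "cat")

# Hypothetical helper: collapse every row of a data frame into one string.
# `sep` is an assumed delimiter; choose one that never appears in the data.
row_keys <- function(df, sep = "\r") {
  do.call(paste, c(unname(as.list(df)), sep = sep))
}

# TRUE when the key of row_to_find is among the keys of X
found <- row_keys(row_to_find) %in% row_keys(X)
found
```

Note that this compares the printed form of the values, so the number 1 and the string "1" are treated as equal.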
Upvotes: 0
Reputation: 3948
Another approach, using base R:
df <- data.frame(a = c(1, 2), b = c("cat", "dog"))
any(df$a == 1 & df$b == "cat")
#> [1] TRUE
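If there are more than a couple of columns, writing out each comparison gets tedious. A possible generalization, sketched here under the assumption that the row is given as a named list called target (a name I made up for illustration), compares each column to its value and combines the results:

```r
df <- data.frame(a = c(1, 2), b = c("cat", "dog"))
target <- list(a = 1, b = "cat")  # hypothetical name; one value per column

# Compare each named column to its target value, then AND the results row-wise.
hits <- Reduce(`&`, Map(function(col, val) col == val, df[names(target)], target))

# TRUE if at least one row matches on every column
row_exists <- any(hits)
row_exists
```

The intermediate vector hits can also be used to pull out the matching rows with df[hits, ].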
Upvotes: 0
Reputation: 1042
For fans of dplyr and the tidyverse, you can use dplyr::anti_join(). According to its documentation, dplyr::anti_join(x, y) "returns all rows from x where there are not matching values in y, keeping just columns from x." Hence if the result of dplyr::anti_join(row, df) has zero rows, then row was indeed in df; if it has one row, then row was not in df.
library(dplyr)
df <- tribble(~a, ~b,
1, "cat",
2, "dog")
#> # A tibble: 2 x 2
#> a b
#> <dbl> <chr>
#> 1 1.00 cat
#> 2 2.00 dog
row <- tibble(a = 1, b = "cat")
#> # A tibble: 1 x 2
#> a b
#> <dbl> <chr>
#> 1 1.00 cat
nrow(anti_join(row, df)) == 0 # row is in df so should be TRUE
#> Joining, by = c("a", "b")
#> [1] TRUE
row <- tibble(a = 3, b = "horse")
#> # A tibble: 1 x 2
#> a b
#> <dbl> <chr>
#> 1 3.00 horse
nrow(anti_join(row, df)) == 0 # row is not in df so should be FALSE
#> Joining, by = c("a", "b")
#> [1] FALSE
Upvotes: 1
Reputation: 11
I suggest Ben Bolker's solution, since the nrow(merge(row_to_find,X))>0 solution doesn't work for me (it always gives TRUE):
tail(duplicated(rbind(X,row_to_find)),1)>0
Upvotes: 1
Reputation: 7561
Using the data from @Marek's answer:
nrow(merge(row_to_find,X))>0 # TRUE if exists
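One thing to watch: when the two data frames share no column names, merge() falls back to a Cartesian product, so the test above comes out TRUE regardless of the values. A minimal sketch that guards against this by naming the key columns explicitly (the wrapper exists_in is a hypothetical name, not an existing function):

```r
X <- data.frame(a = 1:2, b = c("cat", "dog"))
row_to_find <- data.frame(a = 1, b = "cat")

# Hypothetical wrapper: join on every column of `row`.
# merge() raises an error if one of those columns is missing from `df`,
# instead of silently producing a cross join.
exists_in <- function(row, df) {
  nrow(merge(row, df, by = names(row))) > 0
}

exists_in(row_to_find, X)
exists_in(data.frame(a = 3, b = "horse"), X)
```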
Upvotes: 8
Reputation: 103948
Try match_df from plyr (using Marek's sample data):
library(plyr)
X <- data.frame(a=1:2, b=c("cat","dog"))
row_to_find <- data.frame(a=1, b="cat")
match_df(X, row_to_find)
Upvotes: 26
Reputation: 50753
Taking your example:
X <- data.frame(a=1:2, b=c("cat","dog"))
row_to_find <- data.frame(a=1, b="cat") # it has to be data.frame (not a vector) to hold different types
Then
duplicated(rbind(X, row_to_find))[nrow(X)+1]
gives you the answer.
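For repeated use, the same expression can be wrapped in a small function. This is only a sketch: row_in_df is a made-up name, and it assumes the row is a one-row data frame with the same column names as the data frame being searched.

```r
X <- data.frame(a = 1:2, b = c("cat", "dog"))

# Hypothetical helper: TRUE when the single-row data frame `row`
# already occurs somewhere in `df`.
row_in_df <- function(row, df) {
  stopifnot(nrow(row) == 1)
  # Append the row; it is flagged as a duplicate only if an earlier row equals it.
  duplicated(rbind(df, row))[nrow(df) + 1]
}

row_in_df(data.frame(a = 1, b = "cat"), X)
row_in_df(data.frame(a = 3, b = "horse"), X)
```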
Upvotes: 7
Reputation: 263471
For a vector y with the same number of elements as there are columns in the data frame dfrm:
apply(dfrm, 1, function(x) all( x == y) )
This should return a vector of TRUE and FALSE, which can in turn be used as a row index in [,]:
dfrm[ apply(dfrm, 1, function(x) all( x == y) ) , ]
The identical function is probably too stringent, since it will check attributes as well.
> y=c(1,2,3)
> x = data.frame(a=1:10, b=2:11, c=3:12)
> identical(x[1,] , y)
[1] FALSE
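With the same x and y, the apply version can be sketched as below. One caveat, as far as I understand apply(): it coerces the data frame to a matrix first, so with mixed column types (numbers and strings) the comparison is done on formatted strings and may miss rows; the all-numeric case here avoids that.

```r
y <- c(1, 2, 3)
x <- data.frame(a = 1:10, b = 2:11, c = 3:12)

# One TRUE/FALSE per row: does every element of the row equal y?
matches <- unname(apply(x, 1, function(r) all(r == y)))

any(matches)   # does the row exist at all?
x[matches, ]   # the matching row(s)
```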
Upvotes: 1