So I have
df=data.frame(age=c(10,12,12,13,13,10), name=c('Maria','anders','anders','per','johanna','Maria'))
dups=df[duplicated(df),]
What does R do when I run df %in% dups?
Output: FALSE FALSE
I do realise that if I run, for example, df$name %in% dups$name
Output: TRUE TRUE TRUE FALSE FALSE TRUE
it compares every name of df with the names of dups and checks whether each name is found at least once in dups. I would assume that df %in% dups would check every row of df against every row of dups, but that doesn't seem to be the case.
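Base R defines %in% as match(x, table, nomatch = 0L) > 0L, so for vectors it reports, element by element, whether each value of x appears anywhere in table. The short sketch below rebuilds df and dups so it runs on its own and reproduces the name comparison both ways; via_in and via_match are illustrative names only:

```r
df <- data.frame(age = c(10, 12, 12, 13, 13, 10),
                 name = c('Maria', 'anders', 'anders', 'per', 'johanna', 'Maria'))
dups <- df[duplicated(df), ]

# %in% is match(x, table, nomatch = 0L) > 0L: for each element of x,
# is it found anywhere in table?
via_in    <- df$name %in% dups$name
via_match <- match(df$name, dups$name, nomatch = 0L) > 0L
print(via_in)
print(via_match)
```

Both vectors come out as TRUE TRUE TRUE FALSE FALSE TRUE, the same as the output shown above.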
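When %in% is applied to data frames, the comparison takes place column-wise. A data frame is a list of columns, so each whole column of df is looked up among the columns of the other data frame, and the result has one value per column of df rather than one per row.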
For example
df %in% df["age"]
# [1] TRUE FALSE
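compares each column in df with the column in the one-column data frame df["age"]. Since the age column is identical in both data frames, the first value is TRUE; the name column has no equal in df["age"], so the second value is FALSE. For the same reason df %in% dups returns FALSE FALSE: dups has only two rows, so neither of its columns equals a full six-value column of df.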
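A quick way to see this column-level matching is to compare df against data frames whose columns either do or do not equal its columns. This is a minimal illustrative sketch; the variable names are made up for the example, and it assumes R 4.0 or later, where data.frame() keeps name as a character column:

```r
df <- data.frame(age = c(10, 12, 12, 13, 13, 10),
                 name = c('Maria', 'anders', 'anders', 'per', 'johanna', 'Maria'))
dups <- df[duplicated(df), ]

# %in% sees each data frame as a list of whole columns, so there is one
# result per column of df (age, name), not one per row.
whole_cols <- df %in% dups                  # FALSE FALSE: dups columns hold only 2 values
same_cols  <- df %in% df                    # TRUE TRUE: every column has an equal
reordered  <- df %in% df[c("name", "age")]  # TRUE TRUE: column order is irrelevant
print(whole_cols)
print(same_cols)
print(reordered)
```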
For a row-wise comparison, you can use the following (somewhat complex) command. For each row of df, it checks whether some row of dups matches it in every column:
sapply(seq_len(nrow(df)),
       function(i1) any(sapply(seq_len(nrow(dups)),
                               function(i2) all(df[i1, ] == dups[i2, ]))))
# [1] TRUE TRUE TRUE FALSE FALSE TRUE
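The command above performs nrow(df) times nrow(dups) row comparisons, each involving data frame subsetting. A common vectorised alternative, sketched here as one option among several, is to collapse each row into a single key string and use %in% on those keys. The helper row_key() and its "\r" separator are hypothetical choices; the approach only works if the separator never occurs inside a value, since otherwise two different rows could produce the same key:

```r
df <- data.frame(age = c(10, 12, 12, 13, 13, 10),
                 name = c('Maria', 'anders', 'anders', 'per', 'johanna', 'Maria'))
dups <- df[duplicated(df), ]

# Hypothetical helper: paste the columns of each row into one key string.
# unname() keeps column names from being mistaken for paste() arguments.
# Assumes sep never appears inside any value.
row_key <- function(d, sep = "\r") {
  do.call(paste, c(unname(as.list(d)), sep = sep))
}

# Row-wise membership: is each row of df present (in every column) in dups?
in_dups <- row_key(df) %in% row_key(dups)
print(in_dups)

# Special case: dups was built from df with duplicated(), so a row of df is
# in dups exactly when it occurs more than once in df.
repeated <- duplicated(df) | duplicated(df, fromLast = TRUE)
print(repeated)
```

Both results are TRUE TRUE TRUE FALSE FALSE TRUE. The duplicated() shortcut only applies because dups was derived from df itself; for an arbitrary second data frame, the key-based version is the general one.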