Reputation: 1037
Here is some sample data
Dataset A
id name reasonforlogin
123 Tom work
246 Timmy work
789 Mark play
Dataset B
id name reasonforlogin
789 Mark work
313 Sasha interview
000 Meryl interview
987 Dara play
789 Mark play
246 Timmy work
Two datasets. Same columns. Uneven number of rows.
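A minimal sketch for reproducing the sample data in R. Storing id as a character column is an assumption made here so that 000 keeps its leading zeros, and the object names A and B are placeholders.

```r
# Sample data from the tables above.
# ids are kept as character (an assumption) so "000" is not turned into 0.
A <- data.frame(
  id = c("123", "246", "789"),
  name = c("Tom", "Timmy", "Mark"),
  reasonforlogin = c("work", "work", "play"),
  stringsAsFactors = FALSE
)

B <- data.frame(
  id = c("789", "313", "000", "987", "789", "246"),
  name = c("Mark", "Sasha", "Meryl", "Dara", "Mark", "Timmy"),
  reasonforlogin = c("work", "interview", "interview", "play", "play", "work"),
  stringsAsFactors = FALSE
)
```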
I want to be able to ask something like:
1) "Which id numbers appear in both dataset A and dataset B?"
or
2) "How many times does each id log in on a given day, say day 2 (dataset B)?"
So the answers would be:
1) A list like
[246, 789]
2) A data.frame with a "header" of ids and then a "row" of their login numbers:
123, 246, 789, 313, 000, 987
0, 1, 2, 1, 1, 1
It seems easy, but I think it's non-trivial to do this quickly with large data. Originally I planned on writing loops within loops, but I'm sure there is a term for this kind of comparison, and likely packages that already do similar things.
Upvotes: 0
Views: 46
Reputation: 4673
You need which() and table().
1) Find which ids are in both data.frames
common_ids <- unique(df1[which(df1$id %in% df2$id), "id"])
Using intersect(), as in the other answers, is much more elegant in this simple case. However, which() gives you more flexibility when the comparison is more complicated than simple equality, so it is worth knowing.
2) Find how many times any ID logs in
table(df1$id)
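As a rough sketch of the kind of more complicated comparison mentioned above, assuming df1 and df2 hold the question's sample data with id stored as character: the condition passed to which() can combine several tests, and table() can be pointed at df2 to count day 2 logins. The extra "work" condition is only an illustration, not something the question asked for.

```r
# Sample data from the question (only the columns used here); ids as character.
df1 <- data.frame(
  id = c("123", "246", "789"),
  reasonforlogin = c("work", "work", "play"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  id = c("789", "313", "000", "987", "789", "246"),
  reasonforlogin = c("work", "interview", "interview", "play", "play", "work"),
  stringsAsFactors = FALSE
)

# Row positions in df1 whose id also occurs in df2 AND whose reason is "work".
rows <- which(df1$id %in% df2$id & df1$reasonforlogin == "work")
work_ids_in_both <- unique(df1[rows, "id"])

# Logins per id on day 2; ids that never appear in df2 are simply not listed.
day2_counts <- table(df2$id)
```

Note that table(df2$id) omits ids with zero logins; listing those as 0 needs the full set of ids supplied as factor levels.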
Upvotes: 0
Reputation: 99371
If we have A as the first data set and B as the second, and id as a character column in both so as to keep 000 from being printed as 0, we can do the following.
ids common to both data sets:
intersect(A$id, B$id)
# [1] "246" "789"
Times an id logged in on the second day (B), including those that were not logged in at all:
table(factor(B$id, levels = unique(c(A$id, B$id))))
# 123 246 789 313 000 987
#   0   1   2   1   1   1
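If a data.frame is wanted rather than a printed table, one possible sketch is to convert the table with as.data.frame(). This gives one row per id (long format) instead of the header-and-row layout described in the question; the names all_ids, counts and login_counts are made up for this example, and the data is the question's sample data with id as character.

```r
# id columns from the question's sample data, stored as character.
A <- data.frame(id = c("123", "246", "789"), stringsAsFactors = FALSE)
B <- data.frame(
  id = c("789", "313", "000", "987", "789", "246"),
  stringsAsFactors = FALSE
)

# Every id seen in either data set, in order of first appearance.
all_ids <- unique(c(A$id, B$id))

# Count day 2 logins; ids missing from B become levels with a count of 0.
counts <- table(factor(B$id, levels = all_ids))

# Long-format data.frame: one row per id with its login count.
login_counts <- as.data.frame(counts, responseName = "logins",
                              stringsAsFactors = FALSE)
names(login_counts)[1] <- "id"
```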
Upvotes: 3
Reputation: 4024
You can do both with dplyr:
library(dplyr)
# ids that appear in both data sets
A %>%
  select(id) %>%
  inner_join(B %>% select(id), by = "id") %>%
  distinct()
# number of logins per id in B
B %>% count(id)
Upvotes: 0