Reputation: 141
I have two data sets:
# 1.
user_id users frequency
1 1 3
2 1 1
3 1 1
# 2.
user_id sum unique
1 2 1
2 0 0
3 1 1
I want to merge on user_id
but have index based on unit1 in an ordinal fashion so the output would look like, leaving user_id
out of the picture:
# 3.
frequency users sum unique
3 1 2 1
1 2 1 1
Any thoughts on how to achieve this? Also, in terms of learning how to do these types of manipulations, is their a name for this type of manipulation?
Upvotes: 0
Views: 84
Reputation: 887681
Here is an option using tidyverse
. We can do an inner_join
between the two datasets, grouped by 'frequency', we get the sum
of the variables within summarise_each
library(dplyr)
inner_join(df1, df2) %>%
group_by(frequency) %>%
summarise_each(funs(sum), sum, unique, users)
# frequency sum unique users
# <int> <int> <int> <int>
#1 1 1 1 2
#2 3 2 1 1
Or using base R
, we merge
the datasets and do an aggregate
aggregate(.~frequency, merge(df1, df2)[-1], FUN = sum)
# frequency users sum unique
#1 1 2 1 1
#2 3 1 2 1
Upvotes: 1
Reputation: 8413
library(data.table)
setDT(df) # this step was to make it a data.table, if its a data.frame
setDT(df1)
# logic is : first merge both df's, then group by "frequency" columns
df[df1][, lapply(.SD, sum), by = .(frequency), .SDcols = c("sum", "unique", "users")]
# frequency sum unique users
#1: 3 2 1 1
#2: 1 1 1 2
Upvotes: 2