Reputation: 40979
I would like to read a table and create another table that counts how many times a unique ID occurs in multiple specific columns.
For example, I have a table where each row shows a transaction, with a userId identifying the role of each person.
buyer <- c("A", "A", "B", "A", "B", "C")
seller <- c("C", "B", "C", "B", "C", "A")
negotiator <- c("B", "C", "D", "D", "A", "B")
df <- data.frame(buyer, seller, negotiator)
df
# buyer seller negotiator
# 1 A C B
# 2 A B C
# 3 B C D
# 4 A B D
# 5 B C A
# 6 C A B
I would then like to create a table that counts how many times a userId fulfilled a role in the transactions.
# id asBuyer asSeller asNegotiator
# A 3 1 1
# B 2 2 2
# C 1 3 1
# D 0 0 2
Would I need to create different dataframes and then merge?
Upvotes: 3
Views: 948
Reputation: 34763
I would use data.table
:
library(data.table)
setDT(df)
dcast(melt(df, measure.vars = names(df)), value ~ variable)
# value buyer seller negotiator
# 1: A 3 1 1
# 2: B 2 2 2
# 3: C 1 3 1
# 4: D 0 0 2
You could add fun.aggregate = length
as an argument to dcast
to suppress the warning message. And you could add value.name = "id"
as an argument to melt
if you prefer that column to be named id
.
setDT(df)
outDT <- data.table(id = unique(unlist(df)))
invisible(
sapply(names(df), function(jj)
outDT[df[ , .N, by = jj],
#set the name you desire by pasting;
# could use a regex or substr to
# for the first letter capital if need be
(jj2 <- paste0("as", jj)) := i.N,
#merge id to the count column
on = c(id = jj)
clean-up: missed observations were NA, set to 0
][is.na(get(jj2)), (jj2) := 0])
)
Upvotes: 3
Reputation: 40979
There are too many R wizards here.
Here is my simple solution using basic R with just ddply
(to create "count group by" tables) and merge
(to perform outer join).
# Create data frame for buyer count
dfBuyer <- ddply(df, c("buyer"), summarise, count=length(seller))
colnames(dfBuyer) <- c("id", "asBuyer")
dfBuyer
# id asBuyer
# 1 A 3
# 2 B 2
# 3 C 1
# Create data frame for seller count
dfSeller <- ddply(df, c("seller"), summarise, count=length(buyer))
colnames(dfSeller) <- c("id", "asSeller")
dfSeller
# id asSeller
# 1 A 1
# 2 B 2
# 3 C 3
# Create data frame for negotiator count
dfNegotiator <- ddply(df, c("negotiator"), summarise, count=length(seller))
colnames(dfNegotiator) <- c("id", "asNegotiator")
dfNegotiator
# id asNegotiator
# 1 A 1
# 2 B 2
# 3 C 1
# 4 D 2
# merge() apparently can merge only two dataframes at a time,
# so to merge three dataframes, merge the first two and then
# the third. Use "all=TRUE" to perform outer join.
# Merge buyer and seller
dfBuyerSellerMerged <- merge(x=dfBuyer, y=dfSeller, by="id", all=TRUE)
# Merge buyer and seller and negotiator
dfBuyerSellerNegotiatorMerged <- merge(x=dfBuyerSellerMerged, y=dfNegotiator, by="id", all=TRUE)
dfBuyerSellerNegotiatorMerged
# id asBuyer asSeller asNegotiator
# 1 A 3 1 1
# 2 B 2 2 2
# 3 C 1 3 1
# 4 D NA NA 2
# Remove NAs.
dfBuyerSellerNegotiatorMerged[is.na(dfBuyerSellerNegotiatorMerged)] <- 0
dfBuyerSellerNegotiatorMerged
# id asBuyer asSeller asNegotiator
# 1 A 3 1 1
# 2 B 2 2 2
# 3 C 1 3 1
# 4 D 0 0 2
Upvotes: 0
Reputation: 57220
Here's a solution using only base R (probably slower than other approaches) :
lst <- lapply(names(df), function(col) as.data.frame(table(df[[col]]),responseName=col))
mergeAll <- function(x,y) merge(x,y,all=TRUE)
res <- Reduce(f=mergeAll, lst)
names(res)[1] <- 'id'
res[is.na(res)] <- 0
> res
id buyer seller negotiator
1 A 3 1 1
2 B 2 2 2
3 C 1 3 1
4 D 0 0 2
Upvotes: 3
Reputation: 206546
You can first melt your data then tabulate it. For example
dd<-reshape2::melt(df,0)
xtabs(~value+variable,dd)
# variable
# value buyer seller negotiator
# A 3 1 1
# B 2 2 2
# C 1 3 1
# D 0 0 2
Upvotes: 5