Reputation: 40979

Count number of times that an item occurred in each of multiple columns

I would like to read a table and create another table that counts how many times a unique ID occurs in multiple specific columns.

For example, I have a table where each row shows a transaction, with a userId identifying the role of each person.

buyer      <- c("A", "A", "B", "A", "B", "C")
seller     <- c("C", "B", "C", "B", "C", "A")
negotiator <- c("B", "C", "D", "D", "A", "B")

df <- data.frame(buyer, seller, negotiator)
df
#    buyer seller negotiator
#  1     A      C          B
#  2     A      B          C
#  3     B      C          D
#  4     A      B          D
#  5     B      C          A
#  6     C      A          B

I would then like to create a table that counts how many times a userId fulfilled a role in the transactions.

#   id  asBuyer  asSeller  asNegotiator
#    A        3         1             1
#    B        2         2             2
#    C        1         3             1
#    D        0         0             2

Would I need to create different dataframes and then merge?

Upvotes: 3

Answers (4)

MichaelChirico

Reputation: 34763

I would use data.table:

update inspired by MrFlick

library(data.table)
setDT(df)
dcast(melt(df, measure.vars = names(df)), value ~ variable)
#    value buyer seller negotiator
# 1:     A     3      1          1
# 2:     B     2      2          2
# 3:     C     1      3          1
# 4:     D     0      0          2

You could add fun.aggregate = length as an argument to dcast to suppress the warning message. And you could add value.name = "id" as an argument to melt if you prefer that column to be named id.

original, longer answer

setDT(df)

outDT <- data.table(id = unique(unlist(df)))

invisible(
  sapply(names(df), function(jj)
    outDT[df[ , .N, by = jj], 
          #set the name you desire by pasting;
          #  could use a regex or substr to 
          #  for the first letter capital if need be
          (jj2 <- paste0("as", jj)) := i.N,
          #merge id to the count column
          on = c(id = jj)
          clean-up: missed observations were NA, set to 0
          ][is.na(get(jj2)), (jj2) := 0])
)

Upvotes: 3

stackoverflowuser2010

Reputation: 40979

There are too many R wizards here.

Here is my simple solution using basic R with just ddply (to create "count group by" tables) and merge (to perform outer join).

# Create data frame for buyer count
dfBuyer <- ddply(df, c("buyer"), summarise, count=length(seller))
colnames(dfBuyer) <- c("id", "asBuyer")

dfBuyer
#    id asBuyer
#  1  A       3
#  2  B       2
#  3  C       1


# Create data frame for seller count
dfSeller <- ddply(df, c("seller"), summarise, count=length(buyer))
colnames(dfSeller) <- c("id", "asSeller")

dfSeller
#    id asSeller
#  1  A        1
#  2  B        2
#  3  C        3


# Create data frame for negotiator count
dfNegotiator <- ddply(df, c("negotiator"), summarise, count=length(seller))
colnames(dfNegotiator) <- c("id", "asNegotiator")

dfNegotiator
#    id asNegotiator
#  1  A            1
#  2  B            2
#  3  C            1
#  4  D            2


# merge() apparently can merge only two dataframes at a time,
# so to merge three dataframes, merge the first two and then
# the third. Use "all=TRUE" to perform outer join.

# Merge buyer and seller
dfBuyerSellerMerged <- merge(x=dfBuyer, y=dfSeller, by="id", all=TRUE)

# Merge buyer and seller and negotiator
dfBuyerSellerNegotiatorMerged <- merge(x=dfBuyerSellerMerged, y=dfNegotiator, by="id", all=TRUE) 

dfBuyerSellerNegotiatorMerged
#   id asBuyer asSeller asNegotiator
# 1  A       3        1            1
# 2  B       2        2            2
# 3  C       1        3            1
# 4  D      NA       NA            2


# Remove NAs.
dfBuyerSellerNegotiatorMerged[is.na(dfBuyerSellerNegotiatorMerged)] <- 0

dfBuyerSellerNegotiatorMerged
#   id asBuyer asSeller asNegotiator
# 1  A       3        1            1
# 2  B       2        2            2
# 3  C       1        3            1
# 4  D       0        0            2

Upvotes: 0

digEmAll

Reputation: 57220

Here's a solution using only base R (probably slower than other approaches) :

lst <- lapply(names(df), function(col) as.data.frame(table(df[[col]]),responseName=col))

mergeAll <- function(x,y) merge(x,y,all=TRUE)

res <- Reduce(f=mergeAll, lst)
names(res)[1] <- 'id'
res[is.na(res)] <- 0

> res
  id buyer seller negotiator
1  A     3      1          1
2  B     2      2          2
3  C     1      3          1
4  D     0      0          2

Upvotes: 3

MrFlick

Reputation: 206546

You can first melt your data then tabulate it. For example

dd<-reshape2::melt(df,0)
xtabs(~value+variable,dd)
#      variable
# value buyer seller negotiator
#     A     3      1          1
#     B     2      2          2
#     C     1      3          1
#     D     0      0          2

Upvotes: 5

Count number of times that an item occurred in each of multiple columns

Answers (4)

update inspired by MrFlick

original, longer answer

Related Questions