Reputation: 2451
Say I have a data.frame
object:
df <- data.frame(name=c('black','black','black','red','red'),
type=c('chair','chair','sofa','sofa','plate'),
num=c(4,5,12,4,3))
Now I want to count the number of rows (observations) of for each combination of name
and type
. This can be done like so:
table(df[ , c("name","type")])
or possibly also with plyr
, (though I am not sure how).
However, how do I get the results incorporated into the original data frame? So that the results will look like this:
df
# name type num count
# 1 black chair 4 2
# 2 black chair 5 2
# 3 black sofa 12 1
# 4 red sofa 4 1
# 5 red plate 3 1
where count
now stores the results from the aggregation.
A solution with plyr
could be interesting to learn as well, though I would like to see how this is done with base R.
Upvotes: 67
Views: 127951
Reputation: 51994
In collapse
, with fcount
. fcount
is noticeably faster than any other options.
library(collapse)
df |>
fcount(name, type, add = TRUE, name = "count")
# name type num count
# 1 black chair 4 2
# 2 black chair 5 2
# 3 black sofa 12 1
# 4 red sofa 4 1
# 5 red plate 3 1
Upvotes: 0
Reputation: 41275
Another option using add_tally from dplyr
. Here is a reproducible example:
df <- data.frame(name=c('black','black','black','red','red'),
type=c('chair','chair','sofa','sofa','plate'),
num=c(4,5,12,4,3))
library(dplyr)
df %>%
group_by(name, type) %>%
add_tally(name = "count")
#> # A tibble: 5 × 4
#> # Groups: name, type [4]
#> name type num count
#> <chr> <chr> <dbl> <int>
#> 1 black chair 4 2
#> 2 black chair 5 2
#> 3 black sofa 12 1
#> 4 red sofa 4 1
#> 5 red plate 3 1
Created on 2022-09-11 with reprex v2.0.2
Upvotes: 2
Reputation: 1068
One simple line in base R:
df$count = table(interaction(df[, (c("name", "type"))]))[interaction(df[, (c("name", "type"))])]
Same in two lines, for clarity/efficiency:
fact = interaction(df[, (c("name", "type"))])
df$count = table(fact)[fact]
Upvotes: 1
Reputation: 55695
Using data.table
:
library(data.table)
dt = as.data.table(df)
# or coerce to data.table by reference:
# setDT(df)
dt[ , count := .N, by = .(name, type)]
For pre-data.table 1.8.2
alternative, see edit history.
Using dplyr
:
library(dplyr)
df %>%
group_by(name, type) %>%
mutate(count = n())
Or simply:
add_count(df, name, type)
Using plyr
:
plyr::ddply(df, .(name, type), transform, count = length(num))
Upvotes: 87
Reputation: 56149
Using sqldf package:
library(sqldf)
sqldf("select a.*, b.cnt
from df a,
(select name, type, count(1) as cnt
from df
group by name, type) b
where a.name = b.name and
a.type = b.type")
# name type num cnt
# 1 black chair 4 2
# 2 black chair 5 2
# 3 black sofa 12 1
# 4 red sofa 4 1
# 5 red plate 3 1
Upvotes: 2
Reputation: 625
This should do your work :
df_agg <- aggregate(num~name+type,df,FUN=NROW)
names(df_agg)[3] <- "count"
df <- merge(df,df_agg,by=c('name','type'),all.x=TRUE)
Upvotes: 5
Reputation: 904
You were just one step away from incorporating the row count into the base dataset.
Using the tidy()
function from the broom
package, convert the frequency table into a data frame and inner join with df
:
df <- data.frame(name=c('black','black','black','red','red'),
type=c('chair','chair','sofa','sofa','plate'),
num=c(4,5,12,4,3))
library(broom)
df <- merge(df, tidy(table(df[ , c("name","type")])), by=c("name","type"))
df
name type num Freq
1 black chair 4 2
2 black chair 5 2
3 black sofa 12 1
4 red plate 3 1
5 red sofa 4 1
Upvotes: 1
Reputation: 38500
A two line alternative is to generate a variable of 0s and then fill it in with split<-
, split
, and lengths
like this:
# generate vector of 0s
df$count <-0L
# fill it in
split(df$count, df[c("name", "type")]) <- lengths(split(df$num, df[c("name", "type")]))
This returns the desired result
df
name type num count
1 black chair 4 2
2 black chair 5 2
3 black sofa 12 1
4 red sofa 4 1
5 red plate 3 1
Essentially, the RHS calculates the lengths of each name-type combination, returning a named vector of length 6 with 0s for "red.chair" and "black.plate." This is fed to the LHS with split <-
which takes the vector and appropriately adds the values in their given spots. This is essentially what ave
does, as you can see that the second to final line of ave
is
split(x, g) <- lapply(split(x, g), FUN)
However, lengths
is an optimized version of sapply(list, length)
.
Upvotes: 1
Reputation: 13103
The base R
function aggregate
will obtain the counts with a one-liner, but adding those counts back to the original data.frame
seems to take a bit of processing.
df <- data.frame(name=c('black','black','black','red','red'),
type=c('chair','chair','sofa','sofa','plate'),
num=c(4,5,12,4,3))
df
# name type num
# 1 black chair 4
# 2 black chair 5
# 3 black sofa 12
# 4 red sofa 4
# 5 red plate 3
rows.per.group <- aggregate(rep(1, length(paste0(df$name, df$type))),
by=list(df$name, df$type), sum)
rows.per.group
# Group.1 Group.2 x
# 1 black chair 2
# 2 red plate 1
# 3 black sofa 1
# 4 red sofa 1
my.summary <- do.call(data.frame, rows.per.group)
colnames(my.summary) <- c(colnames(df)[1:2], 'rows.per.group')
my.data <- merge(df, my.summary, by = c(colnames(df)[1:2]))
my.data
# name type num rows.per.group
# 1 black chair 4 2
# 2 black chair 5 2
# 3 black sofa 12 1
# 4 red plate 3 1
# 5 red sofa 4 1
Upvotes: 3
Reputation: 2451
Another way that generalizes more:
df$count <- unsplit(lapply(split(df, df[c("name","type")]), nrow), df[c("name","type")])
Upvotes: -2
Reputation: 173577
You can do this:
> ddply(df,.(name,type),transform,count = NROW(piece))
name type num count
1 black chair 4 2
2 black chair 5 2
3 black sofa 12 1
4 red plate 3 1
5 red sofa 4 1
or perhaps more intuitively,
> ddply(df,.(name,type),transform,count = length(num))
name type num count
1 black chair 4 2
2 black chair 5 2
3 black sofa 12 1
4 red plate 3 1
5 red sofa 4 1
Upvotes: 8
Reputation: 176648
You can use ave
:
df$count <- ave(df$num, df[,c("name","type")], FUN=length)
Upvotes: 31