Reputation: 117
In my dataset, I want to count the number of times each row appears in my dataset, which consists of five columns. I tried using table; however, this seems to only work with seeing how many times one column, not multiple, is duplicated since I get the error
attempt to make a table with >= 2^31 elements
As a quick example, say my dataframe is as follows:
dat <- data.frame(
SSN = c(204,401,204,666,401),
Name=c("Blossum","Buttercup","Blossum","MojoJojo","Buttercup"),
Age = c(7,8,7,43,8),
Gender = c(0,0,0,1,0)
)
How do I add another column with how many times each row appears in this dataframe?
Upvotes: 3
Views: 2071
Reputation: 2174
Using data.table
package. setDT
is used to inplace transform data.frame
into a data.table
.
Inplace (:=
) modification of dat
by adding count (.N
) of lines grouped by all columns of dat
(by=names(dat)
).
Note: inplace modification result is invisible. So you need to explicitly print it or add []
after (dat[, ...][]
).
setDT(dat)
dat[,by=names(dat),N:=.N][]
#> SSN Name Age Gender N
#> 1: 204 Blossum 7 0 2
#> 2: 401 Buttercup 8 0 2
#> 3: 204 Blossum 7 0 2
#> 4: 666 MojoJojo 43 1 1
#> 5: 401 Buttercup 8 0 2
or (to collapse lines)
setDT(dat)
dat[,by=names(dat),.N]
#> SSN Name Age Gender N
#> 1: 204 Blossum 7 0 2
#> 2: 401 Buttercup 8 0 2
#> 3: 666 MojoJojo 43 1 1
Upvotes: 2
Reputation: 101044
I am not sure which is your desired output. Below are some base R options
> aggregate(
+ cnt ~ .,
+ cbind(dat, cnt = 1),
+ sum
+ )
SSN Name Age Gender cnt
1 204 Blossum 7 0 2
2 401 Buttercup 8 0 2
3 666 MojoJojo 43 1 1
> transform(
+ cbind(dat, n = 1),
+ n = ave(n, SSN, Name, Age, Gender, FUN = sum)
+ )
SSN Name Age Gender n
1 204 Blossum 7 0 2
2 401 Buttercup 8 0 2
3 204 Blossum 7 0 2
4 666 MojoJojo 43 1 1
5 401 Buttercup 8 0 2
Upvotes: 0
Reputation: 886938
We can use add_count
without grouping as well
library(dplyr)
dat %>%
add_count(across(everything()))
-output
# SSN Name Age Gender n
#1 204 Blossum 7 0 2
#2 401 Buttercup 8 0 2
#3 204 Blossum 7 0 2
#4 666 MojoJojo 43 1 1
#5 401 Buttercup 8 0 2
Upvotes: 1
Reputation: 145755
With dplyr
, we could group by all columns:
dat %>%
group_by(across(everything())) %>%
mutate(n = n())
# # A tibble: 5 x 5
# # Groups: SSN, Name, Age, Gender [3]
# SSN Name Age Gender n
# <dbl> <chr> <dbl> <dbl> <int>
# 1 204 Blossum 7 0 2
# 2 401 Buttercup 8 0 2
# 3 204 Blossum 7 0 2
# 4 666 MojoJojo 43 1 1
# 5 401 Buttercup 8 0 2
(mutate(n = n())
is has a shortcut, add_tally()
, if you prefer. Use summarize(n = n()
or count()
if you want to collapse the data frame to the unique rows while adding counts)
Upvotes: 3