Joep_S

Reputation: 537

R: sum values in each row after dcast

I want to sum the values in each row of a data frame after a dcast operation from the reshape2 package. The problem is that every row gets the same total (10), which is the sum over all rows combined. The totals should be 4, 2, 4. Example data with code:

library(dplyr)     # %>%, group_by, summarise
library(reshape2)  # dcast

df <- data.frame(x = as.factor(c("A","A","A","A","B","B","C","C","C","C")),
                 y = as.factor(c("AA","AB","AA","AC","BB","BA","CC","CC","CC","CD")),
                 z = c("var1","var1","var2","var1","var2","var1","var1","var2","var2","var1"))

df2 <- df %>%
  group_by(x,y) %>%
  summarise(num = n()) %>%
  ungroup()

df3 <- dcast(df2, x ~ y, value.var = "num", fill = 0)

df3$total <- sum(df3$AA,df3$AB,df3$AC,df3$BA,df3$BB,df3$CC,df3$CD)

Upvotes: 0

Views: 114

Answers (2)

akrun

Reputation: 887153

We can pass values_fn = length to pivot_wider to count the entries in each x/y cell, then append a row-total column with adorn_totals("col") from janitor:

library(dplyr)
library(tidyr)
library(janitor)
df %>% 
   pivot_wider(names_from = y, values_from = z, values_fill = 0, 
         values_fn = length) %>%
   adorn_totals("col")

Output:

# x AA AB AC BB BA CC CD Total
# A  2  1  1  0  0  0  0     4
# B  0  0  0  1  1  0  0     2
# C  0  0  0  0  0  3  1     4

Or, in base R, with xtabs and addmargins (margin 2 appends a column of row sums):

addmargins(xtabs(z ~ x + y, transform(df, z = 1)), 2)
#   y
#x   AA AB AC BA BB CC CD Sum
#  A  2  1  1  0  0  0  0   4
#  B  0  0  0  1  1  0  0   2
#  C  0  0  0  0  0  3  1   4

Upvotes: 1

Ronak Shah

Reputation: 388982

sum returns a single value, the grand total of all its arguments, and that one value is recycled into every row of the new column.

sum(df3$AA,df3$AB,df3$AC,df3$BA,df3$BB,df3$CC,df3$CD)
#[1] 10

You need rowSums to get the sum of each row separately. Here df3[-1] drops the first column (x) so that only the count columns are summed.

df3$total <- rowSums(df3[-1])
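
To make the difference concrete, here is a minimal sketch on a small made-up data frame. The names d, id, p, q and r are hypothetical stand-ins and are not the columns from the question. It shows sum collapsing everything into one number that is then recycled down the column, while rowSums returns one value per row.

```r
# Hypothetical 3-row data frame standing in for df3 (all names are made up)
d <- data.frame(id = c("A", "B", "C"),
                p  = c(2, 0, 0),
                q  = c(1, 1, 0),
                r  = c(1, 1, 4))

# sum() flattens all of its arguments into a single grand total
grand <- sum(d$p, d$q, d$r)

# Assigning a length-1 value to a column recycles it into every row
d$wrong <- grand

# rowSums() returns one sum per row; selecting columns by name keeps
# the non-numeric id column and the wrong column out of the sum
d$total <- rowSums(d[c("p", "q", "r")])

print(d)
```

Selecting the columns by name, as in this sketch, is a little more defensive than negative indexing if further columns are added to the data frame later.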

Here is a simplified tidyverse approach starting from df:

library(dplyr)
library(tidyr)

df %>%
  count(x, y, name = 'num') %>%
  pivot_wider(names_from = y, values_from = num, values_fill = 0) %>%
  mutate(total = rowSums(select(., AA:CD)))

#  x        AA    AB    AC    BA    BB    CC    CD total
#  <fct> <int> <int> <int> <int> <int> <int> <int> <dbl>
#1 A         2     1     1     0     0     0     0     4
#2 B         0     0     0     1     1     0     0     2
#3 C         0     0     0     0     0     3     1     4

Upvotes: 1
