Reputation: 3384

R summarize character counts

    library(data.table)
DATA=data.table(STUDENT= c(1,2,3,4),
    DOG_1= c("a","e","a","c"),
    DOG_2= c("a","e","d","b"),
    DOG_3= c("a","d","b","c"),
    CAT_1= c("c","a","d","c"),
    CAT_2= c("c","d","a","b"),
    MOUSE_1= c("d","b","e","b"),
    MOUSE_2= c("c","a","b","e"),
    MOUSE_3= c("a","b","b","e"),
    MOUSE_4= c("b","c","a","d"))

This is how my data looks like above. I wish to end up with a new data that looks like this:

where 'a' equals to 1; 'b' equals to 2; 'c' equals to 3; 'd' equals to 4; 'e' equals to 5 and to get the value for example STUDENT 1 DOG equals to 3 is gotten by converting the letters to the values and summing up.

Upvotes: 3

Answers (3)

Uwe

Reputation: 42544

For the sake of completeness, here are two data.table approaches which use the new measure() function (available with data.table version 1.14.1) in the call to melt()

1. Melting, joining with a lookup table on-the-fly, casting

melt(DATA, measure.vars = measure(animal, rn, pattern = "(\\w+)_(\\d)"), value.name = "code")[
  .(code = letters[1:5], value = 1:5), on = "code", value := i.value][
    , dcast(.SD, STUDENT ~ animal, sum, value.var = "value")]

   STUDENT CAT DOG MOUSE
1:       1   6   3    10
2:       2   5  14     8
3:       3   5   7    10
4:       4   5   8    16

2. Melting and summing factor levels

When the lettersa to e are turned into factors, the corresponding factor levels get the numeric values 1 to 5.

library(magrittr) # piping used to improve readability
melt(DATA, measure.vars = measure(value.name, rn, pattern = "(\\w+)_(\\d)"))[, rn := NULL][
  , lapply(.SD, \(x) factor(x, levels = letters[1:5]) %>% as.integer() %>% sum(na.rm = TRUE)), 
  by = STUDENT]

   STUDENT DOG CAT MOUSE
1:       1   3   6    10
2:       2  14   5     8
3:       3   7   5    10
4:       4   8   5    16

Upvotes: 2

ThomasIsCoding

Reputation: 101343

Another data.table option using melt + dcast

dcast(
  melt(DATA, id.var = "STUDENT")[
    ,
    c("variable", "value") := .(gsub("_.*", "", variable),
      value = setNames(1:5, c("a", "b", "c", "d", "e"))[value]
    )
  ], STUDENT ~ variable, sum
)

gives

   STUDENT CAT DOG MOUSE
1:       1   6   3    10
2:       2   5  14     8
3:       3   5   7    10
4:       4   5   8    16

Upvotes: 1

akrun

Reputation: 887118

If we want to use data.table solution, melt the 'DATA', by specifying the patterns from the column names into 'long' format, then using a named vector ('keyval'), grouped by 'STUDENT, loop over the columns specified in .SDcols, match and replace the values with the integer values and sum

library(data.table)
nm1 <- unique(sub("_\\d+$", "", names(DATA)[-1]))      
dt1 <- melt(DATA, id.var = 'STUDENT', 
   measure = patterns(nm1), value.name = nm1)
keyval <- setNames(1:5, letters[1:5])

dt1[, lapply(.SD, function(x) sum(keyval[x], 
           na.rm = TRUE)), by = STUDENT, .SDcols = nm1]

-output

#   STUDENT DOG CAT MOUSE
#1:       1   3   6    10
#2:       2  14   5     8
#3:       3   7   5    10
#4:       4   8   5    16

A similar option in tidyverse would be

library(dplyr)
library(tidyr)
DATA %>% 
    pivot_longer(cols = -STUDENT, names_to = c('.value', 'grp'), 
          names_sep='_') %>%
    group_by(STUDENT) %>% 
    summarise(across(all_of(nm1), ~ sum(keyval[.], na.rm = TRUE)))
# A tibble: 4 x 4
#  STUDENT   DOG   CAT MOUSE
#    <dbl> <int> <int> <int>
#1       1     3     6    10
#2       2    14     5     8
#3       3     7     5    10
#4       4     8     5    16

Upvotes: 2

R summarize character counts

Answers (3)

1. Melting, joining with a lookup table on-the-fly, casting

2. Melting and summing factor levels

Related Questions