bvowe
bvowe

Reputation: 3384

R summarize character counts

    library(data.table)
DATA=data.table(STUDENT= c(1,2,3,4),
    DOG_1= c("a","e","a","c"),
    DOG_2= c("a","e","d","b"),
    DOG_3= c("a","d","b","c"),
    CAT_1= c("c","a","d","c"),
    CAT_2= c("c","d","a","b"),
    MOUSE_1= c("d","b","e","b"),
    MOUSE_2= c("c","a","b","e"),
    MOUSE_3= c("a","b","b","e"),
    MOUSE_4= c("b","c","a","d"))

This is how my data looks like above. I wish to end up with a new data that looks like this:

enter image description here

where 'a' equals to 1; 'b' equals to 2; 'c' equals to 3; 'd' equals to 4; 'e' equals to 5 and to get the value for example STUDENT 1 DOG equals to 3 is gotten by converting the letters to the values and summing up.

Upvotes: 3

Views: 481

Answers (3)

Uwe
Uwe

Reputation: 42544

For the sake of completeness, here are two approaches which use the new measure() function (available with data.table version 1.14.1) in the call to melt()

1. Melting, joining with a lookup table on-the-fly, casting

melt(DATA, measure.vars = measure(animal, rn, pattern = "(\\w+)_(\\d)"), value.name = "code")[
  .(code = letters[1:5], value = 1:5), on = "code", value := i.value][
    , dcast(.SD, STUDENT ~ animal, sum, value.var = "value")]
   STUDENT CAT DOG MOUSE
1:       1   6   3    10
2:       2   5  14     8
3:       3   5   7    10
4:       4   5   8    16

2. Melting and summing factor levels

When the lettersa to e are turned into factors, the corresponding factor levels get the numeric values 1 to 5.

library(magrittr) # piping used to improve readability
melt(DATA, measure.vars = measure(value.name, rn, pattern = "(\\w+)_(\\d)"))[, rn := NULL][
  , lapply(.SD, \(x) factor(x, levels = letters[1:5]) %>% as.integer() %>% sum(na.rm = TRUE)), 
  by = STUDENT]
   STUDENT DOG CAT MOUSE
1:       1   3   6    10
2:       2  14   5     8
3:       3   7   5    10
4:       4   8   5    16

Upvotes: 2

ThomasIsCoding
ThomasIsCoding

Reputation: 101343

Another data.table option using melt + dcast

dcast(
  melt(DATA, id.var = "STUDENT")[
    ,
    c("variable", "value") := .(gsub("_.*", "", variable),
      value = setNames(1:5, c("a", "b", "c", "d", "e"))[value]
    )
  ], STUDENT ~ variable, sum
)

gives

   STUDENT CAT DOG MOUSE
1:       1   6   3    10
2:       2   5  14     8
3:       3   5   7    10
4:       4   5   8    16

Upvotes: 1

akrun
akrun

Reputation: 887118

If we want to use data.table solution, melt the 'DATA', by specifying the patterns from the column names into 'long' format, then using a named vector ('keyval'), grouped by 'STUDENT, loop over the columns specified in .SDcols, match and replace the values with the integer values and sum

library(data.table)
nm1 <- unique(sub("_\\d+$", "", names(DATA)[-1]))      
dt1 <- melt(DATA, id.var = 'STUDENT', 
   measure = patterns(nm1), value.name = nm1)
keyval <- setNames(1:5, letters[1:5])

dt1[, lapply(.SD, function(x) sum(keyval[x], 
           na.rm = TRUE)), by = STUDENT, .SDcols = nm1]

-output

#   STUDENT DOG CAT MOUSE
#1:       1   3   6    10
#2:       2  14   5     8
#3:       3   7   5    10
#4:       4   8   5    16

A similar option in tidyverse would be

library(dplyr)
library(tidyr)
DATA %>% 
    pivot_longer(cols = -STUDENT, names_to = c('.value', 'grp'), 
          names_sep='_') %>%
    group_by(STUDENT) %>% 
    summarise(across(all_of(nm1), ~ sum(keyval[.], na.rm = TRUE)))
# A tibble: 4 x 4
#  STUDENT   DOG   CAT MOUSE
#    <dbl> <int> <int> <int>
#1       1     3     6    10
#2       2    14     5     8
#3       3     7     5    10
#4       4     8     5    16

Upvotes: 2

Related Questions