Reputation: 3384
library(data.table)
DATA=data.table(STUDENT= c(1,2,3,4),
DOG_1= c("a","e","a","c"),
DOG_2= c("a","e","d","b"),
DOG_3= c("a","d","b","c"),
CAT_1= c("c","a","d","c"),
CAT_2= c("c","d","a","b"),
MOUSE_1= c("d","b","e","b"),
MOUSE_2= c("c","a","b","e"),
MOUSE_3= c("a","b","b","e"),
MOUSE_4= c("b","c","a","d"))
This is how my data looks like above. I wish to end up with a new data that looks like this:
where 'a' equals to 1; 'b' equals to 2; 'c' equals to 3; 'd' equals to 4; 'e' equals to 5 and to get the value for example STUDENT 1 DOG equals to 3 is gotten by converting the letters to the values and summing up.
Upvotes: 3
Views: 481
Reputation: 42544
For the sake of completeness, here are two data.table approaches which use the new measure()
function (available with data.table
version 1.14.1) in the call to melt()
melt(DATA, measure.vars = measure(animal, rn, pattern = "(\\w+)_(\\d)"), value.name = "code")[
.(code = letters[1:5], value = 1:5), on = "code", value := i.value][
, dcast(.SD, STUDENT ~ animal, sum, value.var = "value")]
STUDENT CAT DOG MOUSE 1: 1 6 3 10 2: 2 5 14 8 3: 3 5 7 10 4: 4 5 8 16
When the lettersa
to e
are turned into factors, the corresponding factor levels get the numeric values 1
to 5
.
library(magrittr) # piping used to improve readability
melt(DATA, measure.vars = measure(value.name, rn, pattern = "(\\w+)_(\\d)"))[, rn := NULL][
, lapply(.SD, \(x) factor(x, levels = letters[1:5]) %>% as.integer() %>% sum(na.rm = TRUE)),
by = STUDENT]
STUDENT DOG CAT MOUSE 1: 1 3 6 10 2: 2 14 5 8 3: 3 7 5 10 4: 4 8 5 16
Upvotes: 2
Reputation: 101343
Another data.table
option using melt
+ dcast
dcast(
melt(DATA, id.var = "STUDENT")[
,
c("variable", "value") := .(gsub("_.*", "", variable),
value = setNames(1:5, c("a", "b", "c", "d", "e"))[value]
)
], STUDENT ~ variable, sum
)
gives
STUDENT CAT DOG MOUSE
1: 1 6 3 10
2: 2 5 14 8
3: 3 5 7 10
4: 4 5 8 16
Upvotes: 1
Reputation: 887118
If we want to use data.table
solution, melt
the 'DATA', by specifying the patterns
from the column names into 'long' format, then using a named vector
('keyval'), grouped by 'STUDENT, loop over the columns specified in .SDcols
, match and replace the values with the integer values and sum
library(data.table)
nm1 <- unique(sub("_\\d+$", "", names(DATA)[-1]))
dt1 <- melt(DATA, id.var = 'STUDENT',
measure = patterns(nm1), value.name = nm1)
keyval <- setNames(1:5, letters[1:5])
dt1[, lapply(.SD, function(x) sum(keyval[x],
na.rm = TRUE)), by = STUDENT, .SDcols = nm1]
-output
# STUDENT DOG CAT MOUSE
#1: 1 3 6 10
#2: 2 14 5 8
#3: 3 7 5 10
#4: 4 8 5 16
A similar option in tidyverse
would be
library(dplyr)
library(tidyr)
DATA %>%
pivot_longer(cols = -STUDENT, names_to = c('.value', 'grp'),
names_sep='_') %>%
group_by(STUDENT) %>%
summarise(across(all_of(nm1), ~ sum(keyval[.], na.rm = TRUE)))
# A tibble: 4 x 4
# STUDENT DOG CAT MOUSE
# <dbl> <int> <int> <int>
#1 1 3 6 10
#2 2 14 5 8
#3 3 7 5 10
#4 4 8 5 16
Upvotes: 2