I have a data frame df:
df = data.frame(v = c('E', 'B', 'EB', 'RM'))
df$n = 100 / nchar(as.character(df$v))  # 100 divided by the number of letters in v
Each letter in v represents a value: E = 4, B = 3, R = 2 and M = 1.
I want to calculate a column like so:
   v   n idx
1  E 100 400
2  B 100 300
3 EB  50 350
4 RM  50 150
where idx is n multiplied by the sum of the values of the letters in v. For example, for the first row 4 * 100 = 400, and for the last row (2 + 1) * 50 = 150.
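The definition can be checked for individual rows with a quick sketch like the one below. The names vals, row1 and row4 are illustrative only; strsplit(x, "") breaks a string into its single letters, and indexing a named vector by those letters picks up their values.

```r
# Letter values from the question
vals <- c(E = 4, B = 3, R = 2, M = 1)

# Row 1: v = "E", n = 100
row1 <- 100 * sum(vals[strsplit("E", "")[[1]]])
# Row 4: v = "RM", n = 50; strsplit gives c("R", "M"), values 2 and 1
row4 <- 50 * sum(vals[strsplit("RM", "")[[1]]])

c(row1, row4)
# [1] 400 150
```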
I have something like this:
df$e = ifelse(grepl('E', df$v), 4, 0)
df$b = ifelse(grepl('B', df$v), 3, 0)
df$r = ifelse(grepl('R', df$v), 2, 0)
df$m = ifelse(grepl('M', df$v), 1, 0)
df$idx = df$n * (df$e + df$b + df$r + df$m)
But this becomes unmanageable as the number of letters (and therefore helper columns) grows.
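One way the existing grepl approach might be kept but generalized is to build all the indicator columns at once from a named vector of letter values and combine them with a matrix product. This is a minimal sketch, assuming a vector called lookup (a name introduced here for illustration); like the ifelse/grepl version, it counts each letter at most once per string, so a value such as "EE" would contribute 4, not 8.

```r
df <- data.frame(v = c('E', 'B', 'EB', 'RM'), stringsAsFactors = FALSE)
df$n <- 100 / nchar(df$v)

# Hypothetical lookup vector: one named value per letter
lookup <- c(E = 4, B = 3, R = 2, M = 1)

# Logical matrix: one row per row of df, one column per letter,
# TRUE where that letter occurs in v (replaces the e, b, r, m columns)
present <- sapply(names(lookup), function(l) grepl(l, df$v, fixed = TRUE))

# %*% treats TRUE/FALSE as 1/0, so each row sums the values of the letters present
df$idx <- df$n * drop(present %*% lookup)
df$idx
# [1] 400 300 350 150
```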
1) Define a lookup table, lookup, and a function, Sum, that takes a vector of single letters, looks up each letter and sums the resulting numbers. Split v into a list of vectors of single letters and sapply Sum over that list, multiplying the result by n.
lookup <- c(E = 4, B = 3, R = 2, M = 1)
Sum <- function(x) sum(lookup[x])
transform(df, idx = n * sapply(strsplit(as.character(v), ""), Sum))
giving:
   v   n idx
1  E 100 400
2  B 100 300
3 EB  50 350
4 RM  50 150
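To see exactly what Sum is applied to, the following sketch shows the intermediate list produced by strsplit and the per-row sums before they are multiplied by n (the name pieces is illustrative).

```r
lookup <- c(E = 4, B = 3, R = 2, M = 1)
Sum <- function(x) sum(lookup[x])

# A list of character vectors: "E", "B", c("E", "B"), c("R", "M")
pieces <- strsplit(c("E", "B", "EB", "RM"), "")
str(pieces)

# Sum of letter values for each row, before multiplying by n
sapply(pieces, Sum)
# [1] 4 3 7 3
```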
2) An alternative, using lookup from above, applies lookup to each character in v via an anonymous function written in formula notation (x ~ lookup[x]). strapply returns a list, over which we sapply sum, and finally we multiply by n.
library(gsubfn)
transform(df, idx = n * sapply(strapply(as.character(v), ".", x ~ lookup[x]), sum))
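For anyone without gsubfn installed, the same "list of looked-up values, then sapply sum" shape can be sketched in base R. This swaps in regmatches() with gregexpr(".", ...) to extract the single characters; it is an analogue of the strapply call above, not how gsubfn implements it, and the names v, n and looked_up are illustrative.

```r
lookup <- c(E = 4, B = 3, R = 2, M = 1)
v <- c("E", "B", "EB", "RM")
n <- c(100, 100, 50, 50)

# Base-R analogue of strapply(v, ".", x ~ lookup[x]):
# a list holding the looked-up value of every character of each string
looked_up <- lapply(regmatches(v, gregexpr(".", v)), function(x) lookup[x])

# Sum within each list element, then scale by n
n * sapply(looked_up, sum)
# [1] 400 300 350 150
```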
3) A dplyr/tidyr solution using lookup from above is the following. We insert an id to uniquely identify each row and then use separate_rows to place each letter of v in a separate row. We then summarize all rows with the same id by looking up each letter, multiplying by n and summing. Finally we remove id.
library(dplyr)
library(tidyr)
df %>%
mutate(id = 1:n()) %>%
separate_rows(v, sep = "(?<=.)(?=.)") %>%
group_by(id, n) %>%
summarize(idx = sum(n * lookup[v])) %>%
ungroup %>%
select(-id)
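giving (this printout still shows the id column, so it appears to have been produced without the final select(-id) step; with that step only n and idx remain):
# A tibble: 4 x 3
     id     n   idx
  <int> <dbl> <dbl>
1     1  100.  400.
2     2  100.  300.
3     3   50.  350.
4     4   50.  150.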
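One could avoid the complex regular expression by replacing the separate_rows statement with these two statements (naming the column in unnest avoids the deprecation warning that a bare unnest gives in tidyr 1.0 and later):
mutate(v = strsplit(as.character(v), "")) %>%
unnest(v) %>%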
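The reshape-then-summarize idea behind the dplyr/tidyr pipeline can also be sketched in base R, which may help clarify what each step does. In this sketch (the names letters_list, long and idx are illustrative), rep() with lengths() plays the role of the id column plus separate_rows(), and tapply() plays the role of group_by() plus summarize().

```r
df <- data.frame(v = c("E", "B", "EB", "RM"), stringsAsFactors = FALSE)
df$n <- 100 / nchar(df$v)
lookup <- c(E = 4, B = 3, R = 2, M = 1)

letters_list <- strsplit(df$v, "")

# Long format: one row per letter, with id recording the source row of df
long <- data.frame(
  id = rep(seq_len(nrow(df)), lengths(letters_list)),
  letter = unlist(letters_list),
  stringsAsFactors = FALSE
)
long$n <- df$n[long$id]

# Within each id, add up n * value of each letter
idx <- tapply(long$n * lookup[long$letter], long$id, sum)
as.vector(idx)
# [1] 400 300 350 150
```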
Make a lookup table with your values. Then match each letter of a split version of your df$v column (via strsplit) against the table, sum the corresponding values and multiply by n:
lkup <- data.frame(id=c("E","B","R","M"),value=c(4,3,2,1))
sapply(
strsplit(as.character(df$v),""),
function(x) sum(lkup$value[match(x,lkup$id)])
) * df$n
#[1] 400 300 350 150
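The sketch below shows what match() contributes for a single string, plus one caveat: a letter absent from the lookup table matches to NA, which makes the sum NA unless na.rm = TRUE is used. Whether silently skipping unknown letters is desirable depends on your data, so treat the na.rm variant as an option rather than a recommendation; the names pos, pos_x and the *_total variables are illustrative.

```r
lkup <- data.frame(id = c("E", "B", "R", "M"), value = c(4, 3, 2, 1))

# match() gives the row of lkup for each letter: here 3 and 4
pos <- match(c("R", "M"), lkup$id)
rm_total <- sum(lkup$value[pos])                        # 2 + 1 = 3

# A letter missing from the table ("X") matches to NA
pos_x <- match(c("R", "X"), lkup$id)                    # 3 NA
rx_total <- sum(lkup$value[pos_x])                      # NA propagates
rx_total_skip <- sum(lkup$value[pos_x], na.rm = TRUE)   # 2, unknown letter ignored
```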