Rafael

Reputation: 3196

calculate new column from mapped values

I have a data.frame df

df = data.frame(v = c('E', 'B', 'EB', 'RM'))
df$n= 100 / apply(df, 1, nchar)

Where v represents values E = 4, B = 3, R = 2, and M = 1

I want to calculate a column like so:

   v   n idx
1  E 100 400
2  B 100 300
3 EB  50 350
4 RM  50 150

Where idx is n multiplied by the sum of the values of the letters in v. For example, for the first row 4 * 100 = 400, and for the last row (2 + 1) * 50 = 150.

I have something like this:

df$e = ifelse(grepl('E', df$v), 4, 0)
df$b = ifelse(grepl('B', df$v), 3, 0)
df$r = ifelse(grepl('R', df$v), 2, 0)
df$m = ifelse(grepl('M', df$v), 1, 0)

df$idx = df$n * (df$e + df$b + df$r + df$m)

But it becomes unfeasible as the number of columns grows.

Upvotes: 1

Views: 60

Answers (2)

G. Grothendieck

Reputation: 269461

1) Define a lookup table, lookup, and a function, Sum, that takes a vector of single letters, looks up each one, and sums the resulting values.

Split v into a list of vectors of single letters, sapply Sum over that list, and multiply the result by n.

lookup <- c(E = 4, B = 3, R = 2, M = 1)
Sum <- function(x) sum(lookup[x])
transform(df, idx = n * sapply(strsplit(as.character(v), ""), Sum))

giving:

   v   n idx
1  E 100 400
2  B 100 300
3 EB  50 350
4 RM  50 150
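
The one-liner above can be unpacked into its intermediate steps. This is only a sketch: letters_list and weights are names introduced here for illustration, and stringsAsFactors = FALSE is an assumption made so that v is plain character regardless of R version.

```r
# Rebuild the example data; stringsAsFactors = FALSE is an assumption so that
# v is a character column (older R versions would otherwise create a factor).
df <- data.frame(v = c("E", "B", "EB", "RM"), stringsAsFactors = FALSE)
df$n <- 100 / nchar(df$v)

lookup <- c(E = 4, B = 3, R = 2, M = 1)

# Step 1: split each string into single letters (one vector per row).
letters_list <- strsplit(df$v, "")

# Step 2: index the named vector by letter and sum within each row.
weights <- sapply(letters_list, function(x) sum(lookup[x]))

# Step 3: scale the summed weights by n.
df$idx <- df$n * weights
df
```

Adding a new letter then only means adding one entry to lookup, rather than one ifelse column per letter.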

2) An alternative, reusing lookup from above, relies on strapply from gsubfn. For each character in v it applies lookup via an anonymous function written in formula notation, producing a list; we then sapply sum over that list and multiply by n.

library(gsubfn)
transform(df, idx = n * sapply(strapply(as.character(v), ".", x ~ lookup[x]), sum))

3) A dplyr/tidyr solution, again using lookup from above. We add an id to uniquely identify each row and then use separate_rows to place each letter of v in its own row. We then summarize all rows with the same id by looking up each letter and summing. Finally we remove id.

library(dplyr)
library(tidyr)

df %>% 
   mutate(id = 1:n()) %>% 
   separate_rows(v, sep = "(?<=.)(?=.)") %>%
   group_by(id, n) %>%
   summarize(idx = sum(n * lookup[v])) %>%
   ungroup %>%
   select(-id)

giving the following (this output was captured before the final select(-id) step, so id is still shown):

# A tibble: 4 x 3
     id     n   idx
  <int> <dbl> <dbl>
1     1  100.  400.
2     2  100.  300.
3     3   50.  350.
4     4   50.  150.

One could avoid the complex regular expression by replacing the separate_rows statement with these two statements:

mutate(v = strsplit(as.character(v), "")) %>%
unnest(v) %>%
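
For reference, here is a sketch of what the whole pipeline might look like with the strsplit variant. It is a variation rather than the exact code above: it assumes dplyr 1.0 or later (for the .groups argument) and tidyr 1.0 or later, uses row_number() for the id, stores the split letters in a new column called letter (a name chosen here for illustration), and keeps v in the grouping so that it survives into the result.

```r
library(dplyr)
library(tidyr)

# Example data; stringsAsFactors = FALSE is an assumption so v is character.
df <- data.frame(v = c("E", "B", "EB", "RM"), stringsAsFactors = FALSE)
df$n <- 100 / nchar(df$v)
lookup <- c(E = 4, B = 3, R = 2, M = 1)

res <- df %>%
  # id identifies the original row; letter is a list column of single letters
  mutate(id = row_number(), letter = strsplit(v, "")) %>%
  # one row per letter
  unnest(letter) %>%
  # grouping by v and n as well as id carries both through summarize
  group_by(id, v, n) %>%
  summarize(idx = sum(n * lookup[letter]), .groups = "drop") %>%
  select(-id)

res
```

Grouping by id first keeps the rows in their original order, and the result has the columns v, n and idx.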

Upvotes: 3

thelatemail

Reputation: 93813

Make a look-up table with your values. Then match between a split version (via strsplit) of your df$v column, sum the corresponding values and do your multiplication calculation:

lkup <- data.frame(id=c("E","B","R","M"),value=c(4,3,2,1))
sapply(
  strsplit(as.character(df$v),""),
  function(x) sum(lkup$value[match(x,lkup$id)])
) * df$n
#[1] 400 300 350 150
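
To see what match is doing here, it may help to walk through a single row and then store the result as a column. This is a minimal sketch; x and pos are names used only for illustration, and stringsAsFactors = FALSE is an assumption so that the letters are plain character.

```r
# Example data and lookup table; stringsAsFactors = FALSE is an assumption.
df <- data.frame(v = c("E", "B", "EB", "RM"), stringsAsFactors = FALSE)
df$n <- 100 / nchar(df$v)
lkup <- data.frame(id = c("E", "B", "R", "M"), value = c(4, 3, 2, 1),
                   stringsAsFactors = FALSE)

# Walk through the last row, "RM".
x <- strsplit("RM", "")[[1]]   # the letters "R" and "M"
pos <- match(x, lkup$id)       # their row positions in lkup: 3 and 4
lkup$value[pos]                # the corresponding values: 2 and 1

# Same computation for every row, stored as the idx column.
df$idx <- df$n * sapply(
  strsplit(df$v, ""),
  function(x) sum(lkup$value[match(x, lkup$id)])
)
df
```

Note that match returns NA for a letter that is not in lkup$id, so an unknown letter would make that row's idx NA.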

Upvotes: 1
