Reputation: 146
I'm looking for an algorithm to map the values in each column based on a chosen dictionary.
Here's an example:
Artificial data:
df = data.frame(
  sex = c(0,0,0,1,0,1,0,1,1,1,0),
  icu = c(1,1,0,1,0,1,0,1,1,1,1),
  niv = c(0,1,0,1,0,1,0,0,0,1,0),
  mv = c(1,0,0,1,1,1,0,0,0,1,0),
  o2 = c(1,0,0,1,0,1,0,0,0,1,0)
)
I have two dictionaries. For a chosen dictionary, the goal is to replace every 1 in a column with that column's dictionary value, then sum the mapped values row-wise and save the result in a new column.
Dictionaries
dict1 <- list(
  sex = 0,
  icu = 2,
  niv = 1,
  mv = 3,
  o2 = 2
)
dict2 <- list(
  sex = 3,
  icu = 4,
  niv = 2,
  mv = 6,
  o2 = 1
)
I managed to implement this with the following algorithm, but it does not scale to N variables.
Current solution:
set_index <- function(dataset, dict) {
  temp <- dataset
  temp$score <- rep(0, times = nrow(temp))
  for (row in seq_len(nrow(temp))) {
    # Loop over the dict argument, not the global dict2
    for (col in names(dict)) {
      if (temp[row, col] == 1) {
        temp[row, "score"] <- temp[row, "score"] + dict[[col]]
      }
    }
  }
  return(temp)
}
# Score the example data frame df defined above
dataset <- set_index(df, dict2)
dataset$score <- tidyr::replace_na(data = dataset$score, 0)
I have a solution in Python, but I couldn't port it to R.
import numpy as np
def ReplaceAndSumValues(dataset, dct):
    # The [0]/[1] indexing implies dct maps each column name to a (value_if_0, value_if_1) pair
    out = dataset.transform(lambda x, dct: np.where(x, dct[x.name][1], dct[x.name][0]), dct=dct)
    return out.assign(sum=out.sum(axis=1))
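As a reference for what any port has to compute, here is a minimal library-free sketch of the mapping. It assumes flat dictionaries shaped like dict1 and dict2 above (one weight per column, counted only where the value is 1). The names `data` and `replace_and_sum` are hypothetical and used only for illustration.

```python
# Same artificial data as the R data frame above, stored column-wise.
data = {
    "sex": [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0],
    "icu": [1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1],
    "niv": [0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0],
    "mv":  [1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0],
    "o2":  [1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0],
}

# Flat weights mirroring dict1 and dict2 from the question.
dict1 = {"sex": 0, "icu": 2, "niv": 1, "mv": 3, "o2": 2}
dict2 = {"sex": 3, "icu": 4, "niv": 2, "mv": 6, "o2": 1}


def replace_and_sum(columns, weights):
    """Return one score per row: the sum of the weights of the columns equal to 1."""
    n_rows = len(next(iter(columns.values())))
    return [
        sum(w for name, w in weights.items() if columns[name][i] == 1)
        for i in range(n_rows)
    ]


scores1 = replace_and_sum(data, dict1)
scores2 = replace_and_sum(data, dict2)
```

Only the columns named in the dictionary are read, so extra columns in the data are ignored and adding a variable only requires adding a dictionary entry.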
Upvotes: 1
Views: 708
Reputation: 145805
This is just matrix multiplication:
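foo = function(df, dict) {
  # Keep only the dictionary's columns, in the dictionary's order
  df = df[names(dict)]
  # Each row's score is the dot product of the row with the weight vector
  as.matrix(df) %*% unlist(dict)
}
df$result1 = foo(df, dict1)
df$result2 = foo(df, dict2)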
df
#    sex icu niv mv o2 result1 result2
# 1    0   1   0  1  1       7      11
# 2    0   1   1  0  0       3       6
# 3    0   0   0  0  0       0       0
# 4    1   1   1  1  1       8      16
# 5    0   0   0  1  0       3       6
# 6    1   1   1  1  1       8      16
# 7    0   0   0  0  0       0       0
# 8    1   1   0  0  0       2       7
# 9    1   1   0  0  0       2       7
# 10   1   1   1  1  1       8      16
# 11   0   1   0  0  0       2       4
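To spell out why the matrix product works: because the data are 0/1 indicators, replacing each 1 with its weight and summing across the row is the same as a dot product of the row with the weight vector. Writing $X$ for `as.matrix(df)` and $w$ for `unlist(dict)` (notation introduced here for illustration), the first row with dict2 works out as follows:

```latex
\text{result}_i = \sum_{j} x_{ij}\, w_j
\quad\Longleftrightarrow\quad
\text{result} = X w

% Row 1 with dict2: x_1 = (0, 1, 0, 1, 1), w = (3, 4, 2, 6, 1)
\text{result}_1 = 0 \cdot 3 + 1 \cdot 4 + 0 \cdot 2 + 1 \cdot 6 + 1 \cdot 1 = 11
```

This identity relies on the columns containing only 0 and 1; any other value would be multiplied by the weight instead of being ignored.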
Upvotes: 1