PythonNoob
PythonNoob

Reputation: 45

Make a new column with values according to other columns - in R

Taking this dummy data for example

structure(list(Metastasis_Brain = c("1", "1", "0", "1", "0", 
"0"), Metastasis_Liver = c("0", "0", "1", "1", "1", "0"), Metastasis_Bone = c("1", 
"1", "0", "1", "1", "0")), class = "data.frame", row.names = c("Patient_1", 
"Patient_2", "Patient_3", "Patient_4", "Patient_5", "Patient_6"
))

Example of what I'm searching for: If there is 1 in columns Metastasis_Brain and Metastasis_Liver, the new column will contain "Brain, Liver".

If all three tissues are 1, then that row in the new column will contain "Brain, Liver, Bone". If all are 0, then it doesn't matter, NA would be fine.

Upvotes: 2

Views: 52

Answers (2)

Yuriy Saraykin
Yuriy Saraykin

Reputation: 8880

base

df <- data.frame(
  stringsAsFactors = FALSE,
  row.names = c("Patient_1","Patient_2","Patient_3","Patient_4","Patient_5","Patient_6"),
  Metastasis_Brain = c("1", "1", "0", "1", "0", "0"),
  Metastasis_Liver = c("0", "0", "1", "1", "1", "0"),
  Metastasis_Bone = c("1", "1", "0", "1", "1", "0"),
  res = c("Brain, Bone","Brain, Bone",
          "Liver","Brain, Liver, Bone","Liver, Bone",NA)
)

df$res <- sapply(apply(df, 1, function(x) gsub("Metastasis_", "", names(df)[x == 1])), toString)            
df
#>           Metastasis_Brain Metastasis_Liver Metastasis_Bone                res
#> Patient_1                1                0               1        Brain, Bone
#> Patient_2                1                0               1        Brain, Bone
#> Patient_3                0                1               0              Liver
#> Patient_4                1                1               1 Brain, Liver, Bone
#> Patient_5                0                1               1        Liver, Bone
#> Patient_6                0                0               0                 NA

Created on 2022-06-20 by the reprex package (v2.0.1)

Upvotes: 0

Onyambu
Onyambu

Reputation: 79218

Using tidyverse:

library(tidyverse)
df  %>%
  rownames_to_column() %>%
  left_join(pivot_longer(.,-rowname, names_prefix = '.*_') %>%
  filter(value>0) %>%
  group_by(rowname) %>%
  summarise(nm = toString(name)))

   rowname Metastasis_Brain Metastasis_Liver Metastasis_Bone                 nm
1 Patient_1                1                0               1        Brain, Bone
2 Patient_2                1                0               1        Brain, Bone
3 Patient_3                0                1               0              Liver
4 Patient_4                1                1               1 Brain, Liver, Bone
5 Patient_5                0                1               1        Liver, Bone
6 Patient_6                0                0               0               <NA>

in Base R you could do:

aggregate(ind~rn, subset(transform(stack(df), 
         ind = sub('.*_', '', ind), rn = rownames(df)), values>0), toString)

         rn                ind
1 Patient_1        Brain, Bone
2 Patient_2        Brain, Bone
3 Patient_3              Liver
4 Patient_4 Brain, Liver, Bone
5 Patient_5        Liver, Bone

Upvotes: 1

Related Questions