Reputation: 2571

Add flags to columns according to a condition

Suppose you have a data.frame like this:

          FDR_1      Label_1     FDR_2      Label_2  
          0.001        NA        0.45         NA
          0.34         NA         6           NA
          0.2          NA         3           NA
          2            NA         2.5         NA
          4            NA        0.001        NA

with a total of 10,000 rows and 3,000 columns, and you want the following output:

          FDR_1      Label_1     FDR_2      Label_2  
          0.001        NA        0.45         NA
          0.34         NA         6           Y
          0.2          NA         3           Y
          2            Y         2.5          Y
          4            Y        0.001         NA

In other words, you want to add the "Y" flag to the Label column in every row where the matching FDR* column contains a value >= 2.

I tried this:

lapply(mydf, function(x) ifelse(mydf[, grepl( "FDR" , names(mydf) ) > 2, .....)

but I don't know how to go on adding the flag.

Can anyone help me please?

Thank you in advance

Upvotes: 2

Answers (4)

thothal

Reputation: 20329

A loop "free" base R variant using reshape:

df <- structure(list(FDR_1   = c(0.001, 0.34, 0.2, 2, 4), 
                     Label_1 = c(NA, NA, NA, NA, NA), 
                     FDR_2   = c(0.45, 6, 3, 2.5, 0.001), 
                     Label_2 = c(NA, NA, NA, NA, NA)), 
                class     = "data.frame", 
                row.names = c(NA, -5L))

mv <- lapply(split(names(df), 
            gsub("(.+)_\\d+", 
                 "\\1", 
                 names(df))), sort)

data_long <- reshape(df, 
                     varying   = mv, 
                     direction = "long", 
                     v.names   = names(mv))
data_long$Label[data_long$FDR >= 2] <- "Y"
reshape(data_long)
#     id FDR_1 Label_1 FDR_2 Label_2
# 1.1  1 0.001    <NA> 0.450    <NA>
# 2.1  2 0.340    <NA> 6.000       Y
# 3.1  3 0.200    <NA> 3.000       Y
# 4.1  4 2.000       Y 2.500       Y
# 5.1  5 4.000       Y 0.001    <NA>
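
To make the `varying` argument above less opaque, here is a small sketch of what `mv` holds for the four example columns. The name `nms` is only an illustrative stand-in for `names(df)`.

```r
# Column names from the example data (nms stands in for names(df))
nms <- c("FDR_1", "Label_1", "FDR_2", "Label_2")

# Drop the "_<number>" suffix to get the group name, then split by it
mv <- lapply(split(nms, gsub("(.+)_\\d+", "\\1", nms)), sort)

names(mv)
# [1] "FDR"   "Label"
mv$FDR
# [1] "FDR_1" "FDR_2"
mv$Label
# [1] "Label_1" "Label_2"
```

Note that `sort` orders the names as strings, so with ten or more pairs "FDR_10" comes before "FDR_2". As far as I can tell this is harmless as long as the FDR and Label columns carry the same set of suffixes, because both groups are then sorted in the same order and stay paired.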

Upvotes: 1

akrun

Reputation: 887008

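We can do this in base R using the code below, which flags values strictly greater than 2 (change <= 2 to < 2 to also flag values equal to 2):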

df1[!i1] <- 'Y'[(NA^(df1[i1] <= 2))]
df1
#   FDR_1 Label_1 FDR_2 Label_2
#1 0.001    <NA> 0.450    <NA>
#2 0.340    <NA> 6.000       Y
#3 0.200    <NA> 3.000       Y
#4 2.000    <NA> 2.500       Y
#5 4.000       Y 0.001    <NA>

where

i1 <-  grepl("^FDR", names(df1))

data

df1 <- structure(list(FDR_1 = c(0.001, 0.34, 0.2, 2, 4), Label_1 = c(NA, 
 NA, NA, NA, NA), FDR_2 = c(0.45, 6, 3, 2.5, 0.001), Label_2 = c(NA, 
 NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, -5L
 ))
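
For readers puzzled by the `NA^(...)` indexing, here is a minimal sketch on a single vector. The names `fdr`, `idx` and `flag` are made up for the illustration and are not part of the answer's code.

```r
# FDR_1 values from the example data
fdr <- c(0.001, 0.34, 0.2, 2, 4)

# NA^TRUE is NA^1 = NA, while NA^FALSE is NA^0 = 1
idx <- NA^(fdr <= 2)
idx
# [1] NA NA NA NA  1

# Indexing the length-one vector 'Y' with 1 gives "Y", with NA gives NA
flag <- 'Y'[idx]
flag
# [1] NA  NA  NA  NA  "Y"
```

So every value that satisfies the condition inside `NA^(...)` ends up as NA, and everything else picks up the "Y".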

Upvotes: 4

Roman

Reputation: 17648

You can also try a tidyverse approach:

library(tidyverse)
read.table(text="  FDR_1      Label_1     FDR_2      Label_2  
          0.001        NA        0.45         NA
          0.34         NA         6           NA
          0.2          NA         3           NA
          2            NA         2.5         NA
          4            NA        0.001        NA    ", header=T) %>% 
  rownames_to_column() %>% 
  gather(k, v, -rowname) %>% 
  separate(k, into = c("k1", "k2")) %>% 
  spread(k1, v) %>% 
  mutate(Label = ifelse(FDR >= 2, "Y", Label)) %>% 
  gather(k, v, -rowname, -k2) %>% 
  unite(k, k2, k) %>% # changing the colnames a little bit
  spread(k, v) %>% 
  select(-1)    
#   1_FDR 1_Label 2_FDR 2_Label
# 1 0.001    <NA>  0.45    <NA>
# 2  0.34    <NA>     6       Y
# 3   0.2    <NA>     3       Y
# 4     2       Y   2.5       Y
# 5     4       Y 0.001    <NA>

Upvotes: 0

Sotos

Reputation: 51582

We can use split.default from base R to split the columns by their numeric suffix, i.e.

do.call(cbind, 
        lapply(split.default(df, gsub('\\D+', '', names(df))), function(i) {
          i[2] <- replace(i[2], i[1] >= 2, 'Y')
          i
        }))

#  1.FDR_1 1.Label_1 2.FDR_2 2.Label_2
#1   0.001      <NA>   0.450      <NA>
#2   0.340      <NA>   6.000         Y
#3   0.200      <NA>   3.000         Y
#4   2.000         Y   2.500         Y
#5   4.000         Y   0.001      <NA>
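
As a rough sketch of what the grouping step produces before the flags are filled in, the snippet below rebuilds the example data and inspects the split. The names `grp` and `parts` are only for illustration.

```r
# Example data from the question
df <- data.frame(FDR_1 = c(0.001, 0.34, 0.2, 2, 4), Label_1 = NA,
                 FDR_2 = c(0.45, 6, 3, 2.5, 0.001), Label_2 = NA)

# Strip every non-digit character so only the numeric suffix remains
grp <- gsub('\\D+', '', names(df))
grp
# [1] "1" "1" "2" "2"

# split.default treats the data frame as a list of columns,
# so each element is a small data frame holding one FDR/Label pair
parts <- split.default(df, grp)
names(parts)
# [1] "1" "2"
names(parts[["1"]])
# [1] "FDR_1"   "Label_1"
```

The anonymous function then relies on the FDR column being first and the Label column second within each pair, which holds for the column order shown in the question.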

Upvotes: 2
