Reputation: 2571
Suppose you have a data.frame like this:
FDR_1 Label_1 FDR_2 Label_2
0.001 NA 0.45 NA
0.34 NA 6 NA
0.2 NA 3 NA
2 NA 2.5 NA
4 NA 0.001 NA
with a total of 10,000 rows and 3,000 columns, and you want the following output:
FDR_1 Label_1 FDR_2 Label_2
0.001 NA 0.45 NA
0.34 NA 6 Y
0.2 NA 3 Y
2 Y 2.5 Y
4 Y 0.001 NA
In other words, you want to add the "Y" flag to the Label_* column in every row where the corresponding FDR_* column contains a value of 2 or more.
I tried this:
lapply(mydf, function(x) ifelse(mydf[, grepl( "FDR" , names(mydf) ) > 2, .....)
but I don't know how to go on adding the flag.
Can anyone help me please?
Thank you in advance
Upvotes: 2
Views: 66
Reputation: 20329
A loop-free base R variant using reshape:
df <- structure(list(FDR_1 = c(0.001, 0.34, 0.2, 2, 4),
Label_1 = c(NA, NA, NA, NA, NA),
FDR_2 = c(0.45, 6, 3, 2.5, 0.001),
Label_2 = c(NA, NA, NA, NA, NA)),
class = "data.frame",
row.names = c(NA, -5L))
mv <- lapply(split(names(df),
gsub("(.+)_\\d+",
"\\1",
names(df))), sort)
data_long <- reshape(df,
varying = mv,
direction = "long",
v.names = names(mv))
data_long$Label[data_long$FDR >= 2] <- "Y"
reshape(data_long)
# id FDR_1 Label_1 FDR_2 Label_2
# 1.1 1 0.001 <NA> 0.450 <NA>
# 2.1 2 0.340 <NA> 6.000 Y
# 3.1 3 0.200 <NA> 3.000 Y
# 4.1 4 2.000 Y 2.500 Y
# 5.1 5 4.000 Y 0.001 <NA>
Upvotes: 1
Reputation: 887008
We can do this in base R using
df1[!i1] <- 'Y'[(NA^(df1[i1] < 2))]
df1
# FDR_1 Label_1 FDR_2 Label_2
#1 0.001 <NA> 0.450 <NA>
#2 0.340 <NA> 6.000 Y
#3 0.200 <NA> 3.000 Y
#4 2.000 Y 2.500 Y
#5 4.000 Y 0.001 <NA>
where
i1 <- grepl("^FDR", names(df1))
df1 <- structure(list(FDR_1 = c(0.001, 0.34, 0.2, 2, 4), Label_1 = c(NA,
NA, NA, NA, NA), FDR_2 = c(0.45, 6, 3, 2.5, 0.001), Label_2 = c(NA,
NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, -5L
))
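For readers unfamiliar with the NA^ idiom, here is a minimal step-by-step sketch of what the one-liner does. It relies on NA^0 evaluating to 1 and NA^1 to NA, so a logical matrix becomes an index that picks either "Y" or NA. The names fdr, idx and flags are introduced only for illustration, and the cutoff `fdr < 2` assumes that the value 2 itself should be flagged, as in the question's expected output. It also assumes the Label columns appear in the same order as the FDR columns.

```r
# Example data from the question
df1 <- data.frame(FDR_1 = c(0.001, 0.34, 0.2, 2, 4), Label_1 = NA,
                  FDR_2 = c(0.45, 6, 3, 2.5, 0.001), Label_2 = NA)

i1  <- grepl("^FDR", names(df1))   # TRUE for FDR columns, FALSE for Label columns
fdr <- as.matrix(df1[i1])          # numeric matrix holding the FDR values

# NA^TRUE is NA and NA^FALSE is 1, so idx is NA below the cutoff and 1 otherwise
idx <- NA^(fdr < 2)

# Indexing the length-one vector "Y" with 1 gives "Y"; indexing with NA gives NA
flags <- "Y"[idx]

# flags runs column by column, which matches the layout of df1[!i1]
df1[!i1] <- flags
df1
```

The intermediate objects make it easier to inspect each step, at the cost of a few more lines than the compact form.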
Upvotes: 4
Reputation: 17648
You can also try a tidyverse approach:
library(tidyverse)
read.table(text=" FDR_1 Label_1 FDR_2 Label_2
0.001 NA 0.45 NA
0.34 NA 6 NA
0.2 NA 3 NA
2 NA 2.5 NA
4 NA 0.001 NA ", header=T) %>%
rownames_to_column() %>%
gather(k, v, -rowname) %>%
separate(k, into = c("k1", "k2")) %>%
spread(k1, v) %>%
mutate(Label = ifelse(FDR >= 2, "Y", Label)) %>%
gather(k, v, -rowname, -k2) %>%
unite(k, k2, k) %>% # changing the colnames a little bit
spread(k, v) %>%
select(-1)
1_FDR 1_Label 2_FDR 2_Label
1 0.001 <NA> 0.45 <NA>
2 0.34 <NA> 6 Y
3 0.2 <NA> 3 Y
4 2 Y 2.5 Y
5 4 Y 0.001 <NA>
Upvotes: 0
Reputation: 51582
We can use split.default from base R to split the columns on the number, i.e.
do.call(cbind,
lapply(split.default(df, gsub('\\D+', '',names(df))), function(i){
i[2] <- replace(i[2], i[1] >= 2, 'Y'); i}))
# 1.FDR_1 1.Label_1 2.FDR_2 2.Label_2
#1 0.001 <NA> 0.450 <NA>
#2 0.340 <NA> 6.000 Y
#3 0.200 <NA> 3.000 Y
#4 2.000 Y 2.500 Y
#5 4.000 Y 0.001 <NA>
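Note that the split groups are ordered as character strings ("1", "10", "100", ...) and that cbind prefixes the column names, which may matter with 3,000 columns. As a hedged alternative sketch, the columns can be paired by name instead, which keeps the original names and order. It assumes every FDR_k column has a matching Label_k column; fdr_cols and label_cols are illustrative names, not part of the answer above.

```r
# Example data from the question
df <- data.frame(FDR_1 = c(0.001, 0.34, 0.2, 2, 4), Label_1 = NA,
                 FDR_2 = c(0.45, 6, 3, 2.5, 0.001), Label_2 = NA)

# Pair each FDR column with its Label column by name
fdr_cols   <- grep("^FDR_", names(df), value = TRUE)
label_cols <- sub("^FDR_", "Label_", fdr_cols)
stopifnot(all(label_cols %in% names(df)))  # assumption: every FDR_k has a Label_k

# For each pair, set the label to "Y" where FDR >= 2 and keep other labels as they are
df[label_cols] <- unname(Map(
  function(f, l) replace(as.character(l), which(f >= 2), "Y"),
  df[fdr_cols], df[label_cols]
))
df
```

Using which() means rows with a missing FDR value are simply left unflagged.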
Upvotes: 2