NewUsr_stat
NewUsr_stat

Reputation: 2583

Apply a condition over paired columns

suppose to have the following situation:

    Statistic1       Condition1     Statistic2       Condition2         
      0.00001            Y             0.02              NA      
      0.03               Y             0.0001            NA         
      0.01               NA            0.001              Y       
     ..............

For a total of 20.000 rows and 60 columns. Suppose you want to replace in the column "Condition*" the NA/Y with 0 if the value in the relative Statistic* column is <0.05. The check will involve the paired columns Statistic*-Condition*. How is it possible to do this over a large number of columns and rows?

Thank you in advance

B

Upvotes: 1

Views: 43

Answers (2)

tmfmnk
tmfmnk

Reputation: 39858

One tidyverse possibility could be:

df %>%
 mutate_at(vars(matches("Condition")), list(~ (. = 1))) %>%
 rowid_to_column() %>%
 gather(var, val, -rowid) %>%
 arrange(rowid) %>%
 group_by(rowid, pair = parse_number(var)) %>%
 mutate(val = (lag(val, default = 0) < 0.05) * val) %>%
 ungroup() %>%
 select(-pair) %>%
 spread(var, val) %>%
 select(-rowid)

  Condition1 Condition2 Statistic1 Statistic2
       <dbl>      <dbl>      <dbl>      <dbl>
1          1          0    0.00001     1     
2          1          1    0.03        0.0001
3          1          1    0.01        0.001 

Here it, first, assigns 1 to all "Condition" columns and creates a row ID. Second, it performs a wide-to-long data transformation, excluding the row ID. Third, it arranges the data according row ID and groups by row ID and pair which is made of the number in the columns. Forth, it checks whether the statistic is less than 0.05. Finally, it returns the data to its original format and removes the redundant variables.

I used this sample data where I added one case case where the statistic is equal to 1:

df <- read.table(text = "Statistic1       Condition1     Statistic2       Condition2         
0.00001            Y             1              NA      
0.03               Y             0.0001            NA         
0.01               NA            0.001              Y", 
                 header = TRUE,
                 stringsAsFactors = FALSE)

Upvotes: 1

Esben Eickhardt
Esben Eickhardt

Reputation: 3852

You make a boolen for each column and then write and (&) between them. Here a simple example where I check if two columns live up to the condition that the numbers in both columns have to be above three.

# Creating data
df <- data.frame(a = c(1,2,3,4), b = c(2,2,3,2))

# Running conditions on both columns and storing results in a new column
df$c <- df$a>2 & df$b>2 

If you want to make replacements in one column based on another column, you can do the following.

# Creating data
df <- data.frame(a = c(1,2,3,4), b = c(2,2,3,2))

# If column a is above 2 column b is set to zero
df$b[df$a>2] <- 0

In the future please supply example data and output such that we can help.

Upvotes: 0

Related Questions