Reputation: 2583
suppose to have the following situation:
Statistic1 Condition1 Statistic2 Condition2 0.00001 Y 0.02 NA 0.03 Y 0.0001 NA 0.01 NA 0.001 Y ..............
For a total of 20.000 rows and 60 columns. Suppose you want to replace in the column "Condition*" the NA/Y with 0 if the value in the relative Statistic* column is <0.05. The check will involve the paired columns Statistic*-Condition*. How is it possible to do this over a large number of columns and rows?
Thank you in advance
B
Upvotes: 1
Views: 43
Reputation: 39858
One tidyverse
possibility could be:
df %>%
mutate_at(vars(matches("Condition")), list(~ (. = 1))) %>%
rowid_to_column() %>%
gather(var, val, -rowid) %>%
arrange(rowid) %>%
group_by(rowid, pair = parse_number(var)) %>%
mutate(val = (lag(val, default = 0) < 0.05) * val) %>%
ungroup() %>%
select(-pair) %>%
spread(var, val) %>%
select(-rowid)
Condition1 Condition2 Statistic1 Statistic2
<dbl> <dbl> <dbl> <dbl>
1 1 0 0.00001 1
2 1 1 0.03 0.0001
3 1 1 0.01 0.001
Here it, first, assigns 1 to all "Condition" columns and creates a row ID. Second, it performs a wide-to-long data transformation, excluding the row ID. Third, it arranges the data according row ID and groups by row ID and pair which is made of the number in the columns. Forth, it checks whether the statistic is less than 0.05. Finally, it returns the data to its original format and removes the redundant variables.
I used this sample data where I added one case case where the statistic is equal to 1:
df <- read.table(text = "Statistic1 Condition1 Statistic2 Condition2
0.00001 Y 1 NA
0.03 Y 0.0001 NA
0.01 NA 0.001 Y",
header = TRUE,
stringsAsFactors = FALSE)
Upvotes: 1
Reputation: 3852
You make a boolen for each column and then write and (&) between them. Here a simple example where I check if two columns live up to the condition that the numbers in both columns have to be above three.
# Creating data
df <- data.frame(a = c(1,2,3,4), b = c(2,2,3,2))
# Running conditions on both columns and storing results in a new column
df$c <- df$a>2 & df$b>2
If you want to make replacements in one column based on another column, you can do the following.
# Creating data
df <- data.frame(a = c(1,2,3,4), b = c(2,2,3,2))
# If column a is above 2 column b is set to zero
df$b[df$a>2] <- 0
In the future please supply example data and output such that we can help.
Upvotes: 0