Reputation: 21
I am relatively new to coding and need help replacing every 0 in my data frame with 0.5 times the non-zero minimum of its row. For example, I have a data frame (df) in which the rows represent genes and the columns represent tissue samples.
> tissue1 <- c(492, 23, 0, 3, 28, 0, 4, 100)
> tissue2 <- c(23, 41, 32, 9, 2, 5, 9, 0)
> tissue3 <- c(56, 1023, 0, 3, 1, 88, 19, 2)
> df <- data.frame(tissue1, tissue2, tissue3)
> print (df)
For row 6 (gene 6), the non-zero minimum is 5, and 0.5 of 5 is 2.5. The values in row 6 should therefore be 2.5 in tissue1, 5 in tissue2, and 88 in tissue3, instead of 0, 5, and 88, respectively. I want to do this for all rows; my data frame has over 13000 rows and 29 columns.
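To make the rule concrete, the arithmetic for row 6 alone can be sketched like this (x and fill are just illustrative names, not part of my real code):

```r
# Row 6 of the example data: tissue1, tissue2, tissue3
x <- c(0, 5, 88)

# Smallest non-zero value in the row, halved
fill <- 0.5 * min(x[x != 0])

# Replace only the zeros; x is now 2.5, 5, 88
x[x == 0] <- fill
x
```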
I tried following the answers to "Replacing 0 values with the minimum value of the row in r", but they didn't really help; I kept getting warnings.
Any help is really appreciated. Thank you.
Upvotes: 2
Views: 215
Reputation: 26218
A dplyr way of doing it:
library(dplyr)
df %>% mutate(across(everything(), ~ifelse(. == 0, NA, .))) %>%
  rowwise() %>%
  mutate(dummy = min(c_across(everything()), na.rm = TRUE) * 0.5) %>%
  ungroup() %>%
  mutate(across(starts_with('tissue'), ~coalesce(., dummy))) %>%
  select(-dummy)
# A tibble: 8 x 3
tissue1 tissue2 tissue3
<dbl> <dbl> <dbl>
1 492 23 56
2 23 41 1023
3 16 32 16
4 3 9 3
5 28 2 1
6 2.5 5 88
7 4 9 19
8 100 1 2
Adopting @akrun's strategy of using replace, you can save one step here:
df %>%
  rowwise() %>%
  mutate(dummy = min(replace(c_across(everything()), c_across(everything()) == 0, NA), na.rm = TRUE) * 0.5) %>%
  ungroup() %>%
  mutate(across(starts_with('tissue'), ~ifelse(. == 0, dummy, .))) %>%
  select(-dummy)
# A tibble: 8 x 3
tissue1 tissue2 tissue3
<dbl> <dbl> <dbl>
1 492 23 56
2 23 41 1023
3 16 32 16
4 3 9 3
5 28 2 1
6 2.5 5 88
7 4 9 19
8 100 1 2
Upvotes: 1
Reputation: 21908
You can also use the following solution:
library(dplyr)
library(purrr)
df %>%
mutate(pmap_dfr(df, ~ ifelse(c(...) == 0, 0.5 * min(c(...)[c(...) != 0]), c(...))))
tissue1 tissue2 tissue3
1 492.0 23 56
2 23.0 41 1023
3 16.0 32 16
4 3.0 9 3
5 28.0 2 1
6 2.5 5 88
7 4.0 9 19
8 100.0 1 2
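The same per-row logic can also be written as a small base R function applied over the rows, which makes it easier to decide what should happen to a row that contains only zeros. In that case `min()` on an empty vector returns `Inf` with a warning. The sketch below is only one possible choice: `half_min_fill` is a hypothetical helper name, it assumes the data contain no NA values, it leaves all-zero rows unchanged, and `df2` is a made-up three-row example.

```r
# Hypothetical helper (not from the answer above): replace the zeros in one
# row with half of that row's smallest non-zero value.
half_min_fill <- function(x) {
  nonzero <- x[x != 0]
  # All-zero row: there is no non-zero minimum, so return the row unchanged
  if (length(nonzero) == 0) return(x)
  x[x == 0] <- 0.5 * min(nonzero)
  x
}

# Small made-up example; the second row is all zeros
df2 <- data.frame(tissue1 = c(0, 0, 4),
                  tissue2 = c(5, 0, 9),
                  tissue3 = c(88, 0, 19))

# apply() over rows returns the rows as columns, so transpose back
res <- as.data.frame(t(apply(as.matrix(df2), 1, half_min_fill)))
res
```

Whether an all-zero row should stay at 0, become NA, or be dropped depends on the downstream analysis, so treat that branch as a placeholder.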
Upvotes: 3
Reputation: 887098
In base R, we can use pmin to get the row-wise minimum after replacing the 0 values in the dataset with NA, making use of na.rm = TRUE in pmin. Multiplying by 0.5 gives half of each row's non-zero minimum (v1). We then replicate those values per row with row, and use a logical matrix (df == 0) to assign to each 0 element the value for its own row.
v1 <- 0.5 * do.call(pmin, c(replace(df, df == 0, NA), na.rm = TRUE))
df[df == 0] <- v1[row(df)[df == 0]]
-output
df
# tissue1 tissue2 tissue3
#1 492.0 23 56
#2 23.0 41 1023
#3 16.0 32 16
#4 3.0 9 3
#5 28.0 2 1
#6 2.5 5 88
#7 4.0 9 19
#8 100.0 1 2
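To see what each piece of those two lines does, here is a step-by-step sketch on the example data. The intermediate names (df_na, zero_idx, rows_of_zeros) are illustrative only and are not part of the solution above.

```r
tissue1 <- c(492, 23, 0, 3, 28, 0, 4, 100)
tissue2 <- c(23, 41, 32, 9, 2, 5, 9, 0)
tissue3 <- c(56, 1023, 0, 3, 1, 88, 19, 2)
df <- data.frame(tissue1, tissue2, tissue3)

# Step 1: turn zeros into NA so they are ignored when taking the minimum
df_na <- replace(df, df == 0, NA)

# Step 2: pmin() compares the columns element-wise, giving one minimum per row;
# multiplying by 0.5 yields the replacement value for each row
v1 <- 0.5 * do.call(pmin, c(df_na, na.rm = TRUE))

# Step 3: logical matrix marking the zeros, and the row number of each zero
# (in column-major order)
zero_idx <- df == 0
rows_of_zeros <- row(df)[zero_idx]

# Step 4: give each zero the value computed for its own row
df[zero_idx] <- v1[rows_of_zeros]
df
```

Unlike `min()`, `pmin()` returns NA (without a warning) for a row whose values are all NA, so in this sketch an all-zero row would end up as NA rather than Inf.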
Upvotes: 2
Reputation: 11584
Does this work?
library(dplyr)
library(tidyr)
# Zeros become NA, then each NA is filled with half of its row's minimum
df %>% mutate(across(everything(), ~ na_if(., 0))) %>% mutate(id = row_number()) %>%
  pivot_longer(cols = -id) %>% group_by(id) %>%
  mutate(value = replace_na(value, 0.5 * min(value, na.rm = TRUE))) %>%
  pivot_wider(names_from = name, values_from = value) %>% ungroup() %>% select(-id)
# A tibble: 8 x 3
tissue1 tissue2 tissue3
<dbl> <dbl> <dbl>
1 492 23 56
2 23 41 1023
3 16 32 16
4 3 9 3
5 28 2 1
6 2.5 5 88
7 4 9 19
8 100 1 2
Data used:
df
tissue1 tissue2 tissue3
1 492 23 56
2 23 41 1023
3 0 32 0
4 3 9 3
5 28 2 1
6 0 5 88
7 4 9 19
8 100 0 2
Upvotes: 0