Tom

Reputation: 21

How to replace 0 values in a data frame with 0.5 times the non-zero row minimum in R (0.5*min)

I am relatively new to coding and need help replacing all the 0 values in my data frame with 0.5 times the smallest non-zero value in their row. For example, I have a data frame (df) where the rows represent genes and the columns represent tissue samples:

tissue1 <- c(492, 23, 0, 3, 28, 0, 4, 100)
tissue2 <- c(23, 41, 32, 9, 2, 5, 9, 0)
tissue3 <- c(56, 1023, 0, 3, 1, 88, 19, 2)
df <- data.frame(tissue1, tissue2, tissue3)
print(df)

For row 6 (gene 6), the smallest non-zero value is 5, and 0.5 * 5 = 2.5. So row 6 should become 2.5 in tissue1, 5 in tissue2, and 88 in tissue3 (instead of 0, 5, and 88, respectively). I want to do this for every row; my data frame has over 13,000 rows and 29 columns.
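The row 6 arithmetic can be checked directly on that row's values. A minimal sketch (the names row6 and half_min are only for illustration):

```r
# Values of row 6 (gene 6) in tissue1, tissue2, tissue3
row6 <- c(0, 5, 88)

# Half of the smallest non-zero value in the row: 0.5 * 5 = 2.5
half_min <- 0.5 * min(row6[row6 != 0])

# Replace only the zeros
row6[row6 == 0] <- half_min
row6  # now c(2.5, 5, 88)
```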

I tried following the answers to "Replacing 0 values with the minimum value of the row in r", but they didn't solve it: I kept getting warnings.
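The exact warnings aren't shown, so this is an assumption, but one common source is a row with no non-zero values at all: min() of an empty vector returns Inf with a warning. A minimal base R sketch that leaves such rows untouched, reusing the pmin() idea from the base R answer below (the small three-column data frame is made up purely to include an all-zero row):

```r
# Made-up example data: row 3 contains only zeros
df <- data.frame(a = c(0, 4, 0), b = c(2, 0, 0), c = c(6, 8, 0))

# Half of each row's smallest non-zero value. Zeros become NA first, and
# pmin(na.rm = TRUE) returns NA (not Inf, and without a warning) for a row
# whose values are all NA
half_min <- 0.5 * do.call(pmin, c(replace(df, df == 0, NA), na.rm = TRUE))

# Only fill zeros that sit in a row with a usable (non-NA) minimum
to_fill <- df == 0 & !is.na(half_min[row(df)])
df[to_fill] <- half_min[row(df)[to_fill]]
df
```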

Any help is really appreciated. Thank you.

Upvotes: 2

Views: 215

Answers (4)

AnilGoyal

Reputation: 26218

A dplyr way of doing it:

library(dplyr)

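df %>%
  # treat zeros as missing so that min() ignores them
  mutate(across(everything(), ~ifelse(. == 0, NA, .))) %>%
  rowwise() %>%
  # half of the smallest non-zero value in each row
  mutate(dummy = min(c_across(everything()), na.rm = TRUE) * 0.5) %>%
  ungroup() %>%
  # fill the NAs (former zeros) with their row's value
  mutate(across(starts_with('tissue'), ~coalesce(., dummy))) %>%
  select(-dummy)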

# A tibble: 8 x 3
  tissue1 tissue2 tissue3
    <dbl>   <dbl>   <dbl>
1   492        23      56
2    23        41    1023
3    16        32      16
4     3         9       3
5    28         2       1
6     2.5       5      88
7     4         9      19
8   100         1       2

Adopting @akrun's strategy of using replace, you can save one step here:

df %>%
  rowwise() %>%
  # zeros are swapped for NA only inside min(); the data itself is not changed here
  mutate(dummy = min(replace(c_across(everything()), c_across(everything()) == 0, NA), na.rm = TRUE) * 0.5) %>%
  ungroup() %>%
  mutate(across(starts_with('tissue'), ~ifelse(. == 0, dummy, .))) %>%
  select(-dummy)

# A tibble: 8 x 3
  tissue1 tissue2 tissue3
    <dbl>   <dbl>   <dbl>
1   492        23      56
2    23        41    1023
3    16        32      16
4     3         9       3
5    28         2       1
6     2.5       5      88
7     4         9      19
8   100         1       2
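rowwise() evaluates min() one row at a time, which may be slow on 13,000 rows. A sketch of a vectorized variant of the same NA-then-fill idea, assuming dplyr 1.0 or later: pmin() computes every row's minimum in a single call, and if_else() fills the zeros. It has not been benchmarked here.

```r
library(dplyr)

df <- data.frame(
  tissue1 = c(492, 23, 0, 3, 28, 0, 4, 100),
  tissue2 = c(23, 41, 32, 9, 2, 5, 9, 0),
  tissue3 = c(56, 1023, 0, 3, 1, 88, 19, 2)
)

result <- df %>%
  # zeros become NA via na_if(); pmin(na.rm = TRUE) then gives each row's
  # smallest non-zero value for all rows at once
  mutate(dummy = 0.5 * do.call(pmin, c(across(everything(), ~ na_if(.x, 0)), na.rm = TRUE))) %>%
  # swap each zero for its row's value
  mutate(across(starts_with("tissue"), ~ if_else(.x == 0, dummy, .x))) %>%
  select(-dummy)

result
```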

Upvotes: 1

Anoushiravan R

Reputation: 21908

You can also use the following solution:

library(dplyr)
library(purrr)

df %>%
  # pmap_dfr() passes each row's values as ...; zeros become half the row's smallest non-zero value
  mutate(pmap_dfr(df, ~ ifelse(c(...) == 0, 0.5 * min(c(...)[c(...) != 0]), c(...))))


  tissue1 tissue2 tissue3
1   492.0      23      56
2    23.0      41    1023
3    16.0      32      16
4     3.0       9       3
5    28.0       2       1
6     2.5       5      88
7     4.0       9      19
8   100.0       1       2
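The anonymous function given to pmap_dfr() above applies one rule per row. The same rule can be written as a named function and applied with base R's apply(), which avoids the purrr dependency; half_min_fill is a hypothetical name. apply() converts the data frame to a matrix, which is fine here because every column is numeric.

```r
df <- data.frame(
  tissue1 = c(492, 23, 0, 3, 28, 0, 4, 100),
  tissue2 = c(23, 41, 32, 9, 2, 5, 9, 0),
  tissue3 = c(56, 1023, 0, 3, 1, 88, 19, 2)
)

# Hypothetical helper: the same per-row rule used inside pmap_dfr() above
half_min_fill <- function(x) ifelse(x == 0, 0.5 * min(x[x != 0]), x)

# apply() hands each row to the helper as a named numeric vector and returns
# one column per row, so t() turns the result back into rows
out <- as.data.frame(t(apply(df, 1, half_min_fill)))
out
```

A row with no non-zero values would make min() return Inf with a warning, so such rows would need separate handling.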

Upvotes: 3

akrun

Reputation: 887098

In base R, we can use pmin to get the row-wise minimum after replacing the 0 values in the dataset with NA, using na.rm = TRUE in pmin so that the NAs are ignored; halving that gives one value per row (v1). Then we index v1 with row() to pick the value belonging to each zero's row, and use the logical matrix (df == 0) to assign those values to the 0 elements.

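# half of each row's smallest non-zero value
v1 <- 0.5 * do.call(pmin, c(replace(df, df == 0, NA), na.rm = TRUE))
# the row index of each 0 picks the matching value from v1
df[df == 0] <- v1[row(df)[df == 0]]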

Output:

df
#   tissue1 tissue2 tissue3
#1   492.0      23      56
#2    23.0      41    1023
#3    16.0      32      16
#4     3.0       9       3
#5    28.0       2       1
#6     2.5       5      88
#7     4.0       9      19
#8   100.0       1       2
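To see what each piece of the two lines above does, here is the same computation split into steps on the question's data, with the intermediate values noted in comments (the names df_na, row_min and zero_rows are only for illustration):

```r
df <- data.frame(
  tissue1 = c(492, 23, 0, 3, 28, 0, 4, 100),
  tissue2 = c(23, 41, 32, 9, 2, 5, 9, 0),
  tissue3 = c(56, 1023, 0, 3, 1, 88, 19, 2)
)

# Step 1: turn zeros into NA so they cannot be the minimum
df_na <- replace(df, df == 0, NA)

# Step 2: parallel minimum across the columns, ignoring NA
row_min <- do.call(pmin, c(df_na, na.rm = TRUE))  # 23 23 32 3 1 5 4 2
v1 <- 0.5 * row_min                               # one value per row

# Step 3: row index of every zero cell, in column-major order, which is
# the same order in which df[df == 0] visits the cells
zero_rows <- row(df)[df == 0]                     # 3 6 8 3

# Step 4: each zero gets the value computed for its own row
df[df == 0] <- v1[zero_rows]
df
```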

Upvotes: 2

Karthik S

Reputation: 11584

Does this work:

library(dplyr)
library(tidyr)
df %>%
  mutate(across(everything(), ~ na_if(., 0)), id = row_number()) %>%
  pivot_longer(cols = -id) %>%
  group_by(id) %>%
  # zeros (now NA) get half of the row's smallest non-zero value
  mutate(value = replace_na(value, 0.5 * min(value, na.rm = TRUE))) %>%
  pivot_wider(names_from = name, values_from = value) %>%
  ungroup() %>% select(-id)
# A tibble: 8 x 3
  tissue1 tissue2 tissue3
    <dbl>   <dbl>   <dbl>
1   492        23      56
2    23        41    1023
3    16        32      16
4     3         9       3
5    28         2       1
6     2.5       5      88
7     4         9      19
8   100         1       2

Data used:

df
  tissue1 tissue2 tissue3
1     492      23      56
2      23      41    1023
3       0      32       0
4       3       9       3
5      28       2       1
6       0       5      88
7       4       9      19
8     100       0       2

Upvotes: 0
