Imputing values equal to 1 with the median of the other observations in a row (R)

Question

I'm an inexperienced R user and I am trying to pre-process some biological data before statistical analysis for differential expression, using linear modelling.

I want to replace values equal to 1, row by row in a data frame, with the median of that row.

Here is some example data:

treatment1 <- c(125302640, 857538880, 43258573000, 1, 1, 225966496, 204262864)
treatment2 <- c(193170560, 797860990, 35646611000, 1, 221060400, 1, 1027615810)
treatment3 <- c(208872576, 914684860, 31535493100, 1, 1, 659360130, 3709508860)
count <- c(0, 0, 0, 3, 2, 1, 0)
df <- data.frame(treatment1, treatment2, treatment3, count)

I made a column in the data frame called 'count', because I only want to impute values in rows that contain exactly one 1.

I first used a single row as a test:

test.row <- df[6,1:4]
test.row
#   treatment1 treatment2 treatment3 count
# 6  225966496          1  659360130     1

I figured I would write a function that operated on a single row, and then use plyr::adply with .margins = 1, to apply the function to the whole df.

This is what I came up with:

if (test.row$count == 1) {
  median(as.numeric(test.row[1:3]))
} else {
  test.row[1:3]
}
# Output = 225966496, which is what I want.

But I am stuck with how to integrate it into a function. Here is my latest attempt:

impute.1 <- function(df, x){
  if(df$count == 1) {
    df[x == 1] <- median(as.numeric(df[x]))
    result <- df[x]
  } else {
    result <- df[x]
  }
  print(result)
}

impute.1(test.row, 1:3)

# Output = 
#   treatment1 treatment2 treatment3
# 6  225966496          1  659360130

# Desired Output = 
#   treatment1 treatment2 treatment3
# 6  225966496  225966496  659360130

So the function did not recognise that this row has a count of 1, and it did not replace the 1 with the median of the row.

Any advice or comments are greatly appreciated! Regards, Thomas.

Ronak Shah · Accepted Answer

You can use this Map approach:

cols <- 1:3

impute.1 <- function(x, count){
  if(count == 1) {
    x[x == 1] <- median(as.numeric(x))
    x
  } else x
}
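
The key change in impute.1 above is that the comparison is made on the values in the row, not on the column indices. In the question's attempt, x was 1:3, so x == 1 flagged the first column position rather than the cell holding a 1. A minimal sketch of that difference follows; test.row is rebuilt by hand from row 6 of the question's data, and the names idx_wrong, vals and fixed are my own, chosen for illustration.

```r
# Row 6 of the question's data, rebuilt by hand
test.row <- data.frame(treatment1 = 225966496, treatment2 = 1,
                       treatment3 = 659360130, count = 1)
x <- 1:3

# Compares the column indices 1, 2, 3 with 1, so only position 1 is flagged
idx_wrong <- x == 1

# Pull the row out as a plain numeric vector and compare the values instead
vals <- unlist(test.row[x])
vals[vals == 1] <- median(vals)   # the median includes the placeholder 1

# Write the imputed values back into a copy of the row
fixed <- test.row
fixed[x] <- as.list(vals)
fixed$treatment2   # 225966496
```

Note that the median here is taken over all three values, placeholder included, which matches the desired output in the question. If the median of only the non-1 observations is wanted, the values equal to 1 would have to be dropped before calling median.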

df[cols] <- do.call(rbind, Map(impute.1, asplit(df[cols], 1), df$count))
df

#   treatment1  treatment2  treatment3 count
#1   125302640   193170560   208872576     0
#2   857538880   797860990   914684860     0
#3 43258573000 35646611000 31535493100     0
#4           1           1           1     3
#5           1   221060400           1     2
#6   225966496   225966496   659360130     1
#7   204262864  1027615810  3709508860     0
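
As an alternative, the same result can probably be reached without Map by working on the treatment columns as a matrix. This is only a sketch, not part of the approach above: the names m, row_med and replace_here are my own, and count is computed with rowSums on the assumption that it is meant to hold the number of 1s across the three treatment columns, as described in the question.

```r
# Example data copied from the question
treatment1 <- c(125302640, 857538880, 43258573000, 1, 1, 225966496, 204262864)
treatment2 <- c(193170560, 797860990, 35646611000, 1, 221060400, 1, 1027615810)
treatment3 <- c(208872576, 914684860, 31535493100, 1, 1, 659360130, 3709508860)
df <- data.frame(treatment1, treatment2, treatment3)

cols <- 1:3
m <- as.matrix(df[cols])

# Number of 1s per row, computed instead of typed in by hand
df$count <- rowSums(m == 1)

# Row medians (the placeholder 1 is included, as in the question's test)
row_med <- apply(m, 1, median)

# TRUE where the cell is 1 and its row contains exactly one 1
replace_here <- (m == 1) & (df$count == 1)

# row(m) holds each cell's row number, so this picks the matching row median
m[replace_here] <- row_med[row(m)[replace_here]]
df[cols] <- m
df
```

Only row 6 changes: its treatment2 value becomes 225966496, while rows 4 and 5 keep their 1s because they contain more than one.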
