Sejin

Reputation: 85

How to change values from character to number using if and for statements?

I am working with microarray data.

I have two tables: a pathway and gene set table (table A) and a microarray table (table B).

I need to replace the gene symbols (characters) in table A with expression values (numbers), using the expression value listed for each gene symbol in table B.

The tables look like this:

A table                                            B table
Pathway   v1    v2   ...v249 v250                 Gene      Value         
   1       A    E        NA   NA                   E        1000
   2       B    A        Z    I                    A         500
   3       C    G        X    NA                   G         200
   4       D    K        P    NA                   B         300
                                                   P          10
                                                   Z          20

I want table A to end up like this:

   A table                            
Pathway   v1       v2   ...    v249 v250      
   1       500    1000         NA    NA 
   2       300    500          20    NA
   3       NA     200          NA    NA
   4       NA     NA           10    NA 

Any gene symbol that has no match in table B should be replaced with NA.

Upvotes: 4

Views: 99

Answers (3)

akrun

Reputation: 887038

We can also do this in base R. Convert the subset of 'A' (i.e. everything except the 'Pathway' column) to a matrix and match it against 'Gene' from 'B'. The resulting numeric index picks out the corresponding entries of the 'Value' column, and the output is assigned back.

A1 <- A
A1[-1] <- B$Value[match(as.matrix(A[-1]), B$Gene)]
A1
#  Pathway  v1   v2
#1       1 500 1000
#2       2 300  500
#3       3  NA  200
#4       4  NA   NA

NOTE: Datasets from @DavidArenburg's post.
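
To make the steps described above easier to follow, here is a sketch that splits the one-liner into intermediate objects. The data are a small stand-in modelled on the question, built with character columns rather than factors; the names genes, idx and vals are only illustrative.

```r
# Stand-in data modelled on the question (an assumption, not the real tables)
A <- data.frame(Pathway = 1:4,
                v1 = c("A", "B", "C", "D"),
                v2 = c("E", "A", "G", "K"),
                stringsAsFactors = FALSE)
B <- data.frame(Gene  = c("E", "A", "G", "B"),
                Value = c(1000, 500, 200, 300),
                stringsAsFactors = FALSE)

genes <- as.matrix(A[-1])      # character matrix of gene symbols
idx   <- match(genes, B$Gene)  # row of B for each symbol, NA when there is no match
vals  <- B$Value[idx]          # expression values; an NA index gives an NA value

A1 <- A
A1[-1] <- vals                 # filled column by column, same shape as A[-1]
A1
```

Because match() returns NA for symbols that are absent from B$Gene, the unmatched cells become NA without any explicit if or for statement.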

Upvotes: 3

David Arenburg

Reputation: 92282

I would suggest first melting, then merging, then dcasting back. This works for any number of columns in the A data set. I am using the latest data.table version on CRAN for this (v1.9.6+).

library(data.table) # V 1.9.6+
res <- melt(setDT(A), id = "Pathway")[setDT(B), Value := i.Value, on = c(value = "Gene")]
dcast(res, Pathway ~ variable, value.var = "Value")
#    Pathway  v1   v2
# 1:       1 500 1000
# 2:       2 300  500
# 3:       3  NA  200
# 4:       4  NA   NA

Or similarly using the Hadleyverse (dplyr and tidyr):

library(dplyr)
library(tidyr)
A %>%
  gather(res, Gene, -Pathway) %>%
  left_join(., B, by = "Gene") %>%
  select(-Gene) %>%
  spread(res, Value)
#   Pathway  v1   v2
# 1       1 500 1000
# 2       2 300  500
# 3       3  NA  200
# 4       4  NA   NA  

Data

A <- structure(list(Pathway = 1:4, v1 = structure(1:4, .Label = c("A", 
"B", "C", "D"), class = "factor"), v2 = structure(c(2L, 1L, 3L, 
4L), .Label = c("A", "E", "G", "K"), class = "factor")), .Names = c("Pathway", 
"v1", "v2"), class = "data.frame", row.names = c(NA, -4L))

B <- structure(list(Gene = structure(c(3L, 1L, 4L, 2L), .Label = c("A", 
"B", "E", "G"), class = "factor"), Value = c(1000L, 500L, 200L, 
300L)), .Names = c("Gene", "Value"), class = "data.frame", row.names = c(NA, 
-4L))

Upvotes: 3

Paul Hiemstra

Reputation: 60924

This can be done most easily using a lookup table, which is in essence a vector with associated names in R:

library(dplyr)
df = data.frame(v1 = sample(LETTERS[1:8], 100, replace = TRUE),
                v2 = sample(LETTERS[1:8], 100, replace = TRUE),
                v3 = sample(LETTERS[1:8], 100, replace = TRUE),
                v4 = sample(LETTERS[1:8], 100, replace = TRUE))
lut = runif(6)
names(lut) = LETTERS[1:6]

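# as.character() makes the lookup go by name even if the column is a factor
replace_fun = function(vec) unname(lut[as.character(vec)])
df %>% mutate_each(funs(replace_fun), v1:v4)
# with dplyr >= 1.0.0: df %>% mutate(across(v1:v4, replace_fun))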
           a         b
1 0.97821935 0.8584000
2         NA        NA
3 0.56299342 0.9782194
4 0.85840001 0.8584000
5 0.97821935 0.8584000
6 0.06881867 0.9782194

Essentially, the name of each element is the letter in df and lut[letter] looks up which value belongs to that letter. By using lut[vec], we put the entire vector containing letters into the lookup table, which translates the entire vector to the corresponding number.

%>% and mutate_each are provided by dplyr; I use them here to apply the replacement to every column of the example data.
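
As a rough sketch of how the same lookup-table idea could be applied to the tables in the question using base R only: the data below are a small stand-in for A and B (an assumption, not the real tables), and the name A_num is only illustrative.

```r
# Stand-in data modelled on the question (an assumption, not the real tables)
A <- data.frame(Pathway = 1:4,
                v1 = c("A", "B", "C", "D"),
                v2 = c("E", "A", "G", "K"),
                stringsAsFactors = FALSE)
B <- data.frame(Gene  = c("E", "A", "G", "B"),
                Value = c(1000, 500, 200, 300),
                stringsAsFactors = FALSE)

# Lookup table: expression values named by gene symbol
lut <- setNames(B$Value, B$Gene)

# Translate every column except Pathway; symbols missing from lut become NA
A_num <- A
A_num[-1] <- lapply(A[-1], function(col) unname(lut[as.character(col)]))
A_num
```

Indexing a named vector with a name it does not contain returns NA, which is what gives the NA cells for genes that are not in B.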

Upvotes: 2
