Isabela Jerônimo

Reputation: 23

Adding information to a column conditionally comparing strings

I have a data frame called "grass". One of its columns is "line", which can take the values high, low, f1, f2, bl or bh.

I created a new column and want to fill it conditionally, as the following code shows.

The problem is that I get 1 for every row, not just for "high".

#add new column
grass["genome.inherited"] <- NA

#adding information to genome.inherited 
#1 for the high-tolerance parent genotype (high)
#0 for the low-tolerance parent genotype  (low)
#0.5 for the F1 and F2 hybrids (f1) (f2)
#0.25 for the backcross to the low tolerance population (bl)
#0.75 for the backcross to the high tolerance population (bh)

#how I tried to solve the problem
grass$genome.inherited <- if(grass$line == 'high'){
    1
} else if(grass$line == 'low'){
    0
} else if(grass$line == 'bl'){
    0.25
} else if(grass$line == 'bh'){
    0.75
} else {
    0.5
}

As suggested, here is the output of head(grass):

line cube.root.height genome.inherited
high             4.13                1
high             5.36                1
high             4.37                1
high             5.08                1
high             4.85                1
high             5.59                1

Thank you!

Upvotes: 2

Views: 53

Answers (4)

IRTFM

Reputation: 263352

How about using the match function? It returns the position of each value in a character vector, and its nomatch argument supplies the position to use for values that are not found.

grass$genome.inherited <- c(1, 0, 0.25, 0.75, 0.5)[ 
                        match( grass$line, c( 'high', 'low','bl','bh'), nomatch=5) ]

Example from console with other values of line to test:

 grass <- read.table(text="line cube.root.height genome.inherited
 high             4.13                1
 high             5.36                2
 low             4.37                1
 high             5.08                1
 junk             4.85                1
 high             5.59                1
 ", header=TRUE)

 grass$genome.inherited <- c(1, 0, 0.25, 0.75, 0.5)[ 
                       match( grass$line, c( 'high', 'low','bl','bh'), nomatch=5) ]
 grass
#----
  line cube.root.height genome.inherited
1 high             4.13              1.0
2 high             5.36              1.0
3  low             4.37              0.0
4 high             5.08              1.0
5 junk             4.85              0.5
6 high             5.59              1.0
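
To see why the indexing works, it may help to look at the intermediate positions that match returns. This is a minimal sketch on a made-up vector (the names line, idx and res are only for illustration, not part of the original data):

```r
# Hypothetical stand-in for grass$line, including an unknown value
line <- c("high", "low", "junk", "bh")

# Position of each value in the lookup vector; unmatched values get 5
idx <- match(line, c("high", "low", "bl", "bh"), nomatch = 5)
idx

# Use those positions to pick from the value vector;
# the fifth entry (0.5) acts as the default
res <- c(1, 0, 0.25, 0.75, 0.5)[idx]
res
```

Because f1 and f2 are not in the lookup vector, they fall through to position 5 and receive 0.5, just like any other unlisted value.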

Upvotes: 1

Shree

Reputation: 11140

Your if conditions have length > 1. if is not vectorized: when the condition has length > 1, only the first element is used (with a warning), and that's why you are getting all 1s. From R 4.2.0 onward such a condition is an error instead.
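
A small sketch of the difference, using a made-up vector (line, cond and res are illustrative names, not the real data): the comparison itself produces one logical value per element, and ifelse evaluates that condition element by element.

```r
# Hypothetical stand-in for grass$line
line <- c("high", "low", "bl")

# The comparison is vectorized: one logical value per element
cond <- line == "high"
cond

# ifelse() chooses a value for each element of the condition
res <- ifelse(cond, 1, 0)
res
```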

Here's a different approach (simpler than nested ifelse) that does the same thing:

vals <- c(high = 1, low = 0, f1 = 0.5, f2 = 0.5, bl = 0.25, bh = 0.75)

grass$genome.inherited <- vals[as.character(grass$line)]
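
Two details worth knowing about this lookup, shown in a sketch on a made-up vector (line and out are illustrative names): the result carries the lookup names along, and a value that is not in vals comes back as NA rather than a default. The as.character call matters when the column is a factor, since indexing by a factor would use its integer codes.

```r
vals <- c(high = 1, low = 0, f1 = 0.5, f2 = 0.5, bl = 0.25, bh = 0.75)

# Hypothetical stand-in for grass$line, including an unknown value
line <- c("high", "low", "junk", "bh")

# unname() drops the names carried over from vals
out <- unname(vals[line])
out

# Values missing from vals are NA and can be detected afterwards
is.na(out)
```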

Upvotes: 0

r2evans

Reputation: 160447

I agree (with 42-) that nested ifelse statements are not preferred. @42-'s match solution is (imo) far better than nested ifelse calls.

An alternative is to merge them.

Data:

grass <- read.table(text="line cube.root.height
 high             4.13
 high             5.36
 low             4.37 
 high             5.08
 junk             4.85
 high             5.59
 ", head=TRUE, stringsAsFactors=FALSE)

The table of values to merge in:

genome <- data.frame(
  line=c("high","low","bl","bh"),
  genome.inherited=c(1, 0, 0.25, 0.75),
  stringsAsFactors=FALSE)

The merge:

grass2 <- merge(grass, genome, by="line", all.x=TRUE)

If you look at the data, you'll see an NA, because "junk" (an unknown value) is not present in the genome table and therefore assigned as NA. We can fix this with an easy step:

grass2$genome.inherited[is.na(grass2$genome.inherited)] <- 0.5
grass2
#   line cube.root.height genome.inherited
# 1 high             4.13              1.0
# 2 high             5.36              1.0
# 3 high             5.08              1.0
# 4 high             5.59              1.0
# 5 junk             4.85              0.5
# 6  low             4.37              0.0

@42-'s answer has the advantage of providing a default (nomatch) value in the initial call.
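
One thing to be aware of: merge sorts the result on the by column by default, which is why the rows above are no longer in their original order. If row order matters, a possible sketch is to combine the same kind of lookup table with match, which leaves the data frame untouched. The small grass data frame below is made up for illustration:

```r
# Hypothetical data, in a deliberately unsorted order
grass <- data.frame(
  line = c("high", "low", "junk", "high"),
  cube.root.height = c(4.13, 4.37, 4.85, 5.08),
  stringsAsFactors = FALSE)

# Lookup table, as in the merge example
genome <- data.frame(
  line = c("high", "low", "bl", "bh"),
  genome.inherited = c(1, 0, 0.25, 0.75),
  stringsAsFactors = FALSE)

# Look up each row's value by position; unknown lines become NA
grass$genome.inherited <- genome$genome.inherited[match(grass$line, genome$line)]

# Fill the unknowns with the default, as before
grass$genome.inherited[is.na(grass$genome.inherited)] <- 0.5
grass
```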

Upvotes: 0

Naveen

Reputation: 1210

You don't have to create a new column with NA first. Here is code that creates and fills the column in one step.

grass$genome_inherited_values <- ifelse(grass$line == 'high', 1,
                  ifelse(grass$line == 'low', 0,
                         ifelse(grass$line == 'bl', 0.25,
                                ifelse(grass$line == 'bh', 0.75, 0.5))))

Upvotes: 0
