Reputation: 23
I have a data frame called "grass". One of the columns in this data frame is "line", which can be: high, low, f1, f2, bl or bh.
I created a new column and want to fill it based on "line", as the following code shows.
The problem is that I get 1 for every row, not just for the rows where line is "high".
#add new column
grass["genome.inherited"] <- NA
#adding information to genome.inherited
#1 for the high-tolerance parent genotype (high)
#0 for the low-tolerance parent genotype (low)
#0.5 for the F1 and F2 hybrids (f1) (f2)
#0.25 for the backcross to the low tolerance population (bl)
#0.75 for the backcross to the high tolerance population (bh)
#how I tried to solve the problem
grass$genome.inherited <- if(grass$line == 'high'){
1
} else if(grass$line == 'low'){
0
} else if(grass$line == 'bl'){
0.25
} else if(grass$line == 'bh'){
0.75
} else {
0.5
}
As suggested, here is the output of head(grass):
line cube.root.height genome.inherited
high 4.13 1
high 5.36 1
high 4.37 1
high 5.08 1
high 4.85 1
high 5.59 1
Thank you!
Upvotes: 2
Views: 53
Reputation: 263352
How about using the match function? It returns the position of each value in a character vector and has a "nomatch" argument as well, which supplies the position to use when a value is not found.
grass$genome.inherited <- c(1, 0, 0.25, 0.75, 0.5)[
match( grass$line, c( 'high', 'low','bl','bh'), nomatch=5) ]
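To make the two steps visible, here is a minimal sketch that separates the position lookup from the value lookup. The vector `line` and the names `idx` and `values` are made up for illustration and are not part of the original data.

```r
# Hypothetical input, including one value ("junk") that is not in the table
line <- c("high", "low", "junk", "bh")

# Step 1: position of each element in the lookup table; unmatched values get 5
idx <- match(line, c("high", "low", "bl", "bh"), nomatch = 5)
idx
# 1 2 5 4

# Step 2: use those positions to index the vector of values;
# position 5 holds the default (0.5)
values <- c(1, 0, 0.25, 0.75, 0.5)[idx]
values
# 1.00 0.00 0.50 0.75
```

Because the default sits at position 5 of the value vector, any line that is not listed (including f1 and f2) falls through to 0.5.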
Example from console with other values of line to test:
grass <- read.table(text="line cube.root.height genome.inherited
high 4.13 1
high 5.36 2
low 4.37 1
high 5.08 1
junk 4.85 1
high 5.59 1
", header=TRUE)
grass$genome.inherited <- c(1, 0, 0.25, 0.75, 0.5)[
match( grass$line, c( 'high', 'low','bl','bh'), nomatch=5) ]
grass
#----
line cube.root.height genome.inherited
1 high 4.13 1.0
2 high 5.36 1.0
3 low 4.37 0.0
4 high 5.08 1.0
5 junk 4.85 0.5
6 high 5.59 1.0
Upvotes: 1
Reputation: 11140
Your if conditions have length > 1. When the condition has length > 1, only the first element is used (with a warning), which is why you are getting all 1s. As of R 4.2.0, a condition of length > 1 is an error instead.
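A small sketch of the difference, using a made-up vector `line` rather than the real data: the comparison yields one logical per element, while `if` expects a single TRUE or FALSE. `ifelse()` is the element-wise counterpart.

```r
# Hypothetical stand-in for grass$line
line <- c("high", "low", "bl")

# The comparison is vectorised: one logical value per element
cond <- line == "high"
length(cond)
# 3

# if() wants a condition of length 1, so it cannot branch per row.
# ifelse() evaluates the test for every element instead:
ifelse(cond, 1, 0)
# 1 0 0
```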
Here's a different approach (simpler than nested ifelse) for the same thing:
vals <- c(high = 1, low = 0, f1 = 0.5, f2 = 0.5, bl = 0.25, bh = 0.75)
grass$genome.inherited <- vals[as.character(grass$line)]
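One thing worth knowing about the named-vector lookup: a value that is not among the names comes back as NA, and the result carries names. The sketch below uses a made-up vector `line`; `unname()` is an optional step to drop the names, and `res` is an illustrative name.

```r
vals <- c(high = 1, low = 0, f1 = 0.5, f2 = 0.5, bl = 0.25, bh = 0.75)

# Hypothetical input with one unknown value
line <- c("high", "f2", "junk")

# Index by name; unname() strips the names from the result
res <- unname(vals[line])
res
# 1.0 0.5  NA

# Unknown values can be detected afterwards
anyNA(res)
# TRUE
```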
Upvotes: 0
Reputation: 160447
I agree (with 42-) that nested ifelse statements are not preferred. @42-'s match solution is (imo) far better than nested ifelses.
An alternative is to merge the values in.
Data:
grass <- read.table(text="line cube.root.height
high 4.13
high 5.36
low 4.37
high 5.08
junk 4.85
high 5.59
", head=TRUE, stringsAsFactors=FALSE)
The table of values to merge in:
genome <- data.frame(
line=c("high","low","bl","bh"),
genome.inherited=c(1, 0, 0.25, 0.75),
stringsAsFactors=FALSE)
The merge:
grass2 <- merge(grass, genome, by="line", all.x=TRUE)
If you look at the data, you'll see an NA, because "junk" (an unknown value) is not present in the genome table and is therefore assigned NA. We can fix this with an easy step:
grass2$genome.inherited[is.na(grass2$genome.inherited)] <- 0.5
grass2
# line cube.root.height genome.inherited
# 1 high 4.13 1.0
# 2 high 5.36 1.0
# 3 high 5.08 1.0
# 4 high 5.59 1.0
# 5 junk 4.85 0.5
# 6 low 4.37 0.0
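Note that the merged result above is sorted by line, not in the original row order. If the original order matters, one possible sketch is to carry a temporary index column through the merge and sort on it afterwards. The column name `row.id` is an assumption made for this example only.

```r
grass <- data.frame(
  line = c("high", "high", "low", "high", "junk", "high"),
  cube.root.height = c(4.13, 5.36, 4.37, 5.08, 4.85, 5.59),
  stringsAsFactors = FALSE)

genome <- data.frame(
  line = c("high", "low", "bl", "bh"),
  genome.inherited = c(1, 0, 0.25, 0.75),
  stringsAsFactors = FALSE)

# Temporary index (hypothetical helper column) to remember the row order
grass$row.id <- seq_len(nrow(grass))

grass2 <- merge(grass, genome, by = "line", all.x = TRUE)

# Restore the original order, fill the default, drop the helper column
grass2 <- grass2[order(grass2$row.id), ]
grass2$genome.inherited[is.na(grass2$genome.inherited)] <- 0.5
grass2$row.id <- NULL
```

After this, the rows of grass2 line up with the rows of the original grass, so the new column could also be copied back with grass$genome.inherited <- grass2$genome.inherited.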
@42-'s answer has the advantage of providing a default (nomatch) value in the initial call.
Upvotes: 0
Reputation: 1210
You don't have to create a new column of NA first. Here is code that creates and fills the column in one step, using nested ifelse calls.
grass$genome_inherited_values <- ifelse(grass$line == 'high', 1,
ifelse(grass$line == 'low', 0,
ifelse(grass$line == 'bl', 0.25,
ifelse(grass$line == 'bh', 0.75, 0.5))))
Upvotes: 0