v_e
v_e

Reputation: 377

How to perform a conditional lookup in r?

I play with the following two simple datasets:

(myData <- data.frame(ID=c(1:7, 5), Sum=c(10, 20, 30, 40, 50, 60, 700, 200)))
#   ID Sum
# 1  1  10
# 2  2  20
# 3  3  30
# 4  4  40
# 5  5  50
# 6  6  60
# 7  7 700
# 8  5 200

and

(myMap <- data.frame(ID=c(1:5, 7), Name=c("a", "b", "c", "d", "e", "g")))
#   ID  Name
# 1  1    a
# 2  2    b
# 3  3    c
# 4  4    d
# 5  5    e
# 6  7    g

I will map the data with the map, this way:

myData$Name<-myMap$Name[match(myData$ID, myMap$ID)]

However since there is no map entry for the ID == 6, the output is:

  ID Sum Name
1  1  10    a
2  2  20    b
3  3  30    c
4  4  40    d
5  5  50    e
6  6  60 <NA>
7  7 700    g
8  5 200    e

What I am trying to do now: in the record where Name is NA, the Name should become ID. My attempts:

myData$Dummy<-ifelse( is.na(myData$Name),myData$ID, myData$Name)

or

for (i in 1:length(myData$Name) )
  if (is.na(myData$Name[i])) 
  {
    x <- myData$ID[i]
    # print(x)
    myData$Name[i]<- as.factor(x)
    print(myData$Name[i])
  }

are wrong. Could you please give me a hint?

Upvotes: 2

Views: 604

Answers (2)

akrun
akrun

Reputation: 887951

An option using data.table

library(data.table)#1.9.5+
setkey(setDT(myData), ID)[myMap, Name:=i.Name][is.na(Name),
              Name:= as.character(ID)]
#   ID Sum Name
#1:  1  10    a
#2:  2  20    b
#3:  3  30    c
#4:  4  40    d
#5:  5  50    e
#6:  5 200    e
#7:  6  60    6
#8:  7 700    g

NOTE: As commented by @Arun, in the devel version v1.9.5, we can also set the key as an argument inside setDT, i.e. setDT(myData, key='ID')

Upvotes: 2

hrbrmstr
hrbrmstr

Reputation: 78842

It's the fact that the column you think is character is really a factor. Either use stringsAsFactors=FALSE when creating the data frame or you'll need to account for it when manipulating the data. I've provided dplyr + piping and base R solutions below. Note the use of left_join (dplyr) or merge (base) vs your subset & matching:

library(dplyr)

myData <- read.csv(text="ID;Sum
1;10
2;20
3;30
4;40
5;50
6;60
7;700
5;200", sep=";")

myMap <- read.csv(text="ID;Name
1;a
2;b
3;c
4;d
5;e
7;g", sep=";")

# dplyr -------------------------------------------------------------------

myData %>%
  left_join(myMap) %>%
  mutate(Name=as.character(Name),
         Name=ifelse(is.na(Name), ID, Name)) -> dplyr_myData

## Joining by: "ID"

dplyr_myData

##   ID Sum Name
## 1  1  10    a
## 2  2  20    b
## 3  3  30    c
## 4  4  40    d
## 5  5  50    e
## 6  6  60    6
## 7  7 700    g
## 8  5 200    e

# base --------------------------------------------------------------------

base_myData <- merge(myData, myMap, all.x=TRUE)
base_myData$Name <- as.character(base_myData$Name)
base_myData$Name <- ifelse(is.na(base_myData$Name),
                           base_myData$ID, base_myData$Name)

base_myData

##   ID Sum Name
## 1  1  10    a
## 2  2  20    b
## 3  3  30    c
## 4  4  40    d
## 5  5  50    e
## 6  5 200    e
## 7  6  60    6
## 8  7 700    g

Upvotes: 5

Related Questions