Jason Clark

Reputation: 1391

Replacing vector values in R based on a list (hash)

I have a dataframe, one column of which is names. In a later phase of analysis, I will need to merge with other data by this name column, and there are a few names which vary by source. I'd like to clean up my names using a hash (map) of names->cleaned names. I've found several references to using R lists as hashes (e.g., this question on SE), but I can't figure out how to extract values for keys in a vector only as they occur. So for example,

> players=data.frame(names=c("Joe", "John", "Bob"), scores=c(9.8, 9.9, 8.8))
> xref = c("Bob"="Robert", "Fred Jr." = "Fred")
> players$names
[1] Joe  John Bob 
Levels: Bob Joe John

Whereas players$names gives a vector of names from the original frame, I need the same vector, only with any values that occur in xref replaced with their equivalent (lookup) values; my desired result is the vector Joe John Robert.

The closest I've come is:

> players$names %in% names(xref)
[1] FALSE FALSE  TRUE

Which correctly indicates that only "Bob" in players$names exists in the "keys" (names) of xref, but I can't figure out how to extract the value for that name and combine it with the other names in the vector that don't belong to xref as needed.

note: in case it's not completely clear, I'm pretty new to R, so if I'm approaching this in the wrong fashion, I'm happy to be corrected, but my core issue is essentially as stated: I need to clean up some incoming data within R by replacing some incoming values with known replacements and keeping all other values; further, the map of original->replacement should be stored as data (like xref), not as code.

Upvotes: 4

Views: 2134

Answers (3)

Damian

Reputation: 1433

Updated answer: ifelse

ifelse is an even more straightforward solution when xref is a named vector rather than a list.

players <- data.frame(names=c("Joe", "John", "Bob"), scores=c(9.8, 9.9, 8.8), stringsAsFactors = FALSE)
xref <- c("Bob" = "Robert", "Fred Jr." = "Fred")

players$clean <- ifelse(is.na(xref[players$names]), players$names, xref[players$names])

players

Result

   names scores  clean
1   Joe    9.8    Joe
2  John    9.9   John
3   Bob    8.8 Robert
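The same lookup can be wrapped in a small function that also tolerates factor columns, which is what data.frame produces by default before R 4.0. This is only a minimal sketch, assuming xref is a named character vector; recode_names is a hypothetical helper name, not something from base R.

```r
# Hypothetical helper: replace values found among names(xref), keep the rest.
recode_names <- function(x, xref) {
  x <- as.character(x)            # a factor would otherwise index xref by integer code
  idx <- match(x, names(xref))    # position of each value among the keys, NA if absent
  hit <- !is.na(idx)
  x[hit] <- unname(xref[idx[hit]])
  x
}

players <- data.frame(names = c("Joe", "John", "Bob"),
                      scores = c(9.8, 9.9, 8.8),
                      stringsAsFactors = TRUE)
xref <- c("Bob" = "Robert", "Fred Jr." = "Fred")

players$clean <- recode_names(players$names, xref)
```

Because the map is passed in as data, the same function can be reused for any column and any xref.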

Previous answer: sapply

If xref is a list, then the sapply function can be used to do conditional look-ups.

players <- data.frame(names=c("Joe", "John", "Bob"), scores=c(9.8, 9.9, 8.8))

xref <- list("Bob" = "Robert", "Fred Jr." = "Fred")

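# as.character() avoids indexing xref by factor codes; [[ ]] extracts the value rather than a one-element list
players$clean <- sapply(as.character(players$names), function(x) if (x %in% names(xref)) xref[[x]] else x, USE.NAMES = FALSE)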

players

Result

> players
  names scores  clean
1   Joe    9.8    Joe
2  John    9.9   John
3   Bob    8.8 Robert

Upvotes: 5

kdauria

Reputation: 6711

Another example of replacing the factor levels.

allnames = levels(players$names)
levels(players$names)[ !is.na(xref[allnames]) ] = na.omit(xref[allnames])
players
#    names scores
# 1    Joe    9.8
# 2   John    9.9
# 3 Robert    8.8

If you get into really big data sets, you might take a look at the merge function or the data.table package. Here is a data.table example of a join.

library(data.table)
players=data.table(names=c("Joe", "John", "Bob"), scores=c(9.8, 9.9, 8.8), key="names")
nms = data.table(names=names(xref),names2=xref, key="names")
out = nms[players]
out[is.na(names2),names2:=names]
out
#    names names2 scores
# 1:   Bob Robert    8.8
# 2:   Joe    Joe    9.8
# 3:  John   John    9.9

Here is a similar example with the merge function.

players=data.frame(names=c("Joe", "John", "Bob"), scores=c(9.8, 9.9, 8.8))
nms = data.frame(names=names(xref),names2=xref,row.names=NULL)
merge(nms,players,all.y=TRUE)
#   names names2 scores
# 1   Bob Robert    8.8
# 2   Joe   <NA>    9.8
# 3  John   <NA>    9.9
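The merge result above still has NA wherever xref had no entry. One way to finish the job is to fall back to the original name in those rows. This is a sketch only, assuming character (not factor) columns, which is why stringsAsFactors = FALSE is set explicitly; the variable out is introduced here for illustration.

```r
xref <- c("Bob" = "Robert", "Fred Jr." = "Fred")
players <- data.frame(names = c("Joe", "John", "Bob"),
                      scores = c(9.8, 9.9, 8.8),
                      stringsAsFactors = FALSE)
nms <- data.frame(names = names(xref), names2 = unname(xref),
                  stringsAsFactors = FALSE)

out <- merge(nms, players, all.y = TRUE)   # keep every player, matched or not

# Fall back to the original name wherever xref had no replacement
missing <- is.na(out$names2)
out$names2[missing] <- out$names[missing]
```

Note that merge sorts the result by the key column by default, so the row order differs from the original players frame.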

Upvotes: 1

Matthew Lundberg

Reputation: 42679

You can replace the factor levels with the desired text. Here's an example which loops through xref and does the replacement:

for (n in names(xref)) {
  levels(players$names)[levels(players$names) == n ] <- xref[n]
}

players
##    names scores
## 1    Joe    9.8
## 2   John    9.9
## 3 Robert    8.8
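The loop relies on players$names being a factor. Since R 4.0.0, data.frame no longer converts strings to factors by default, so on a current installation the column may need an explicit conversion first. A self-contained sketch under that assumption:

```r
players <- data.frame(names = c("Joe", "John", "Bob"),
                      scores = c(9.8, 9.9, 8.8))
xref <- c("Bob" = "Robert", "Fred Jr." = "Fred")

# Make sure the column is a factor before touching its levels
players$names <- factor(players$names)

for (n in names(xref)) {
  # Keys absent from the levels (here "Fred Jr.") select nothing and change nothing
  levels(players$names)[levels(players$names) == n] <- xref[n]
}
```

Only the levels are rewritten, so the cost depends on the number of distinct names rather than the number of rows.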

Upvotes: 2
