thiagoveloso

Reputation: 2763

R - Avoid concatenation when replacing string by number

Looks like a pretty simple problem, but I haven't been able to find any solution so far.

Consider the following data frame:

dat <- data.frame(id=LETTERS[1:5],
                  land.use=c(3,4,9,34,39))

I need to replace the numbers in the land.use column with strings. The problem is: I have distinct strings for the numbers 3, 4 and 34.

However, R insists on replacing 34 with the concatenated strings for 3 and 4.

For example:

dat$land.use <- gsub("3","Bare soil", dat$land.use)
dat$land.use <- gsub("4","Primary Forest", dat$land.use)
dat$land.use <- gsub("9","Secondary Forest", dat$land.use)
dat$land.use <- gsub("34","Wheat", dat$land.use)
dat$land.use <- gsub("39","Soybean", dat$land.use)

> dat
  id                  land.use
1  A                 Bare soil # This is OK
2  B            Primary Forest # This is OK
3  C          Secondary Forest # This is OK
4  D   Bare soilPrimary Forest # This should be Wheat
5  E Bare soilSecondary Forest # This should be Soybean

What am I doing wrong?

Upvotes: 1

Views: 49

Answers (4)

akrun

Reputation: 886948

We can use a left_join

library(dplyr)
left_join(dat, keydat, by = 'land.use')

data

keydat <- data.frame(land.use = c(3, 4, 9, 34, 39),
                     value = c("Bare soil", "Primary Forest",
                               "Secondary Forest", "Wheat", "Soybean"))

Upvotes: 1

user10917479

Depending on what you do next, you may want a factor variable instead. You can create it directly with factor() as shown here, or use one of the other methods and call as.factor() afterwards.

dat$land.use.factor <- factor(dat$land.use, 
                              levels = c(3, 4, 9, 34, 39),
                              labels = c("Bare soil", "Primary Forest", 
                                         "Secondary Forest", "Wheat", "Soybean"))

# > dat
#    id land.use  land.use.factor
# 1   A        3        Bare soil
# 2   B        4   Primary Forest
# 3   C        9 Secondary Forest
# 4   D       34            Wheat
# 5   E       39          Soybean

Upvotes: 1

GKi

Reputation: 39647

In this case I would use match to substitute the number with a string.

c("Bare soil","Primary Forest","Secondary Forest","Wheat",
  "Soybean")[match(dat$land.use, c(3,4,9,34,39))]
#[1] "Bare soil"        "Primary Forest"   "Secondary Forest" "Wheat"           
#[5] "Soybean"         
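
A closely related variant of the match idea is a named character vector indexed by the code. This is only a sketch: the names land_use_labels and land.use.label are made up for illustration and are not part of the original answer.

```r
# Data frame from the question
dat <- data.frame(id = LETTERS[1:5],
                  land.use = c(3, 4, 9, 34, 39))

# Hypothetical lookup vector: names are the codes, values are the labels
land_use_labels <- c("3"  = "Bare soil",
                     "4"  = "Primary Forest",
                     "9"  = "Secondary Forest",
                     "34" = "Wheat",
                     "39" = "Soybean")

# Index by name; as.character() turns 34 into "34", so only exact codes match
dat$land.use.label <- unname(land_use_labels[as.character(dat$land.use)])
dat$land.use.label
#[1] "Bare soil"        "Primary Forest"   "Secondary Forest" "Wheat"
#[5] "Soybean"
```

A code that is missing from the lookup vector comes back as NA rather than as a partially replaced string.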

To make your approach work, anchor each pattern with ^ and $ so that it only matches when the whole string is that number.

dat$land.use <- sub("^3$","Bare soil", dat$land.use)
dat$land.use <- sub("^4$","Primary Forest", dat$land.use)
dat$land.use <- sub("^9$","Secondary Forest", dat$land.use)
dat$land.use <- sub("^34$","Wheat", dat$land.use)
dat$land.use <- sub("^39$","Soybean", dat$land.use)
dat
#  id         land.use
#1  A        Bare soil
#2  B   Primary Forest
#3  C Secondary Forest
#4  D            Wheat
#5  E          Soybean
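
To see why the anchors matter, it may help to look at the first replacement from the question on its own. This is a small illustration on a plain character vector (x is just a stand-in for the land.use column).

```r
# The land.use codes as strings, which is what gsub()/sub() work on
x <- as.character(c(3, 4, 9, 34, 39))

# Unanchored: "3" also matches inside "34" and "39"
unanchored <- gsub("3", "Bare soil", x)
unanchored
#[1] "Bare soil"  "4"          "9"          "Bare soil4" "Bare soil9"

# Anchored: "^3$" only matches when the whole string is "3"
anchored <- sub("^3$", "Bare soil", x)
anchored
#[1] "Bare soil" "4"         "9"         "34"        "39"
```

After the unanchored first step, the later patterns "4" and "9" hit the leftover digits, which produces the concatenated strings, and "34" and "39" no longer exist to be matched.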

Upvotes: 1

Ronak Shah

Reputation: 388817

Don't use partial-matching functions (gsub, grep, etc.) when you want an exact match. Instead, create a lookup table and perform a join.

lookup_table <- data.frame(land.use = c(3, 4, 9, 34, 39), 
                           value = c("Bare soil", "Primary Forest", 
                           "Secondary Forest", "Wheat", "Soybean"))

merge(dat, lookup_table, all.x = TRUE, by = 'land.use')

#  land.use id            value
#1        3  A        Bare soil
#2        4  B   Primary Forest
#3        9  C Secondary Forest
#4       34  D            Wheat
#5       39  E          Soybean
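
One thing to keep in mind: merge() sorts the result on the key column by default (sort = TRUE), which goes unnoticed here because dat is already ordered by land.use. The sketch below uses a made-up, unsorted data frame (dat2 is hypothetical) to show the difference, together with a match-based lookup that leaves the rows where they are.

```r
# Same lookup table as above
lookup_table <- data.frame(land.use = c(3, 4, 9, 34, 39),
                           value = c("Bare soil", "Primary Forest",
                                     "Secondary Forest", "Wheat", "Soybean"))

# Hypothetical input whose rows are not sorted by land.use
dat2 <- data.frame(id = c("A", "B", "C"),
                   land.use = c(34, 3, 9))

# merge() returns the rows ordered by land.use: 3, 9, 34
merged <- merge(dat2, lookup_table, all.x = TRUE, by = "land.use")

# match() looks up each code in place, so the row order of dat2 is unchanged
dat2$value <- lookup_table$value[match(dat2$land.use, lookup_table$land.use)]
```

If the original row order matters, the match version (or dplyr's left_join) may be the safer choice.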

Upvotes: 2
