Reputation: 2763
Looks like a pretty simple problem, but I haven't been able to find any solution so far.
Consider the following data frame:
dat <- data.frame(id=LETTERS[1:5],
land.use=c(3,4,9,34,39))
I need to replace the numbers in the land.use column with strings. The problem is that I have distinct strings for the numbers 3, 4 and 34. However, R insists on replacing 34 with the concatenated strings for 3 and 4.
For example:
dat$land.use <- gsub("3","Bare soil", dat$land.use)
dat$land.use <- gsub("4","Primary Forest", dat$land.use)
dat$land.use <- gsub("9","Secondary Forest", dat$land.use)
dat$land.use <- gsub("34","Wheat", dat$land.use)
dat$land.use <- gsub("39","Soybean", dat$land.use)
> dat
id land.use
1 A Bare soil # This is OK
2 B Primary Forest # This is OK
3 C Secondary Forest # This is OK
4 D Bare soilPrimary Forest # This should be Wheat
5 E Bare soilSecondary Forest # This should be Soybean
What am I doing wrong?
Upvotes: 1
Views: 49
Reputation: 886948
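We can build a key table and use a left_join:
library(dplyr)
# lookup table mapping each land.use code to its label
keydat <- data.frame(land.use = c(3, 4, 9, 34, 39),
value = c("Bare soil", "Primary Forest",
"Secondary Forest", "Wheat", "Soybean"))
# dat is the data frame from the question
left_join(dat, keydat, by = 'land.use')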
Upvotes: 1
Reputation:
Depending on what you do next, you may want a factor variable instead. You can create it directly with factor(), as below, or use one of the other methods and call as.factor() later.
dat$land.use.factor <- factor(dat$land.use,
levels = c(3, 4, 9, 34, 39),
labels = c("Bare soil", "Primary Forest",
"Secondary Forest", "Wheat", "Soybean"))
# > dat
# id land.use land.use.factor
# 1 A 3 Bare soil
# 2 B 4 Primary Forest
# 3 C 9 Secondary Forest
# 4 D 34 Wheat
# 5 E 39 Soybean
Upvotes: 1
Reputation: 39647
In this case I would use match to substitute the number with a string.
c("Bare soil","Primary Forest","Secondary Forest","Wheat",
"Soybean")[match(dat$land.use, c(3,4,9,34,39))]
#[1] "Bare soil" "Primary Forest" "Secondary Forest" "Wheat"
#[5] "Soybean"
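One thing worth knowing about this approach: match returns NA for any value it cannot find, so codes without a label come back as NA rather than being left untouched. A minimal sketch, where the vector x and the code 7 are made up for illustration:

```r
codes  <- c(3, 4, 9, 34, 39)
labels <- c("Bare soil", "Primary Forest", "Secondary Forest", "Wheat", "Soybean")

# x is a hypothetical input; 7 has no entry in codes
x <- c(34, 3, 7)

# match() gives the position of each x in codes (NA if absent),
# and indexing labels with NA yields NA
res <- labels[match(x, codes)]
res
#[1] "Wheat"     "Bare soil" NA
```

If unknown codes can occur in your data, checking anyNA(res) afterwards is a cheap way to catch them.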
To make your approach work, you have to anchor each pattern with ^ and $ so that it only matches the whole string.
dat$land.use <- sub("^3$","Bare soil", dat$land.use)
dat$land.use <- sub("^4$","Primary Forest", dat$land.use)
dat$land.use <- sub("^9$","Secondary Forest", dat$land.use)
dat$land.use <- sub("^34$","Wheat", dat$land.use)
dat$land.use <- sub("^39$","Soybean", dat$land.use)
dat
# id land.use
#1 A Bare soil
#2 B Primary Forest
#3 C Secondary Forest
#4 D Wheat
#5 E Soybean
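To see why the anchors matter, here is a small side-by-side comparison on a made-up character vector x (not the original data frame). Without anchors, "3" matches anywhere in the string, so "34" and "39" are partially rewritten before their own patterns are ever tried:

```r
# x is a hypothetical vector of codes stored as strings
x <- c("3", "34", "39")

# Unanchored: every "3" is replaced, including the one inside "34" and "39"
unanchored <- gsub("3", "Bare soil", x)
unanchored
#[1] "Bare soil"  "Bare soil4" "Bare soil9"

# Anchored: only strings that are exactly "3" are replaced
anchored <- sub("^3$", "Bare soil", x)
anchored
#[1] "Bare soil" "34"        "39"
```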
Upvotes: 1
Reputation: 388817
Don't use partial-match functions (gsub, grep, etc.) when you want to perform an exact match. You can create a lookup table and perform a join.
lookup_table <- data.frame(land.use = c(3, 4, 9, 34, 39),
value = c("Bare soil", "Primary Forest",
"Secondary Forest", "Wheat", "Soybean"))
merge(dat, lookup_table, all.x = TRUE, by = 'land.use')
# land.use id value
#1 3 A Bare soil
#2 4 B Primary Forest
#3 9 C Secondary Forest
#4 34 D Wheat
#5 39 E Soybean
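Note that merge sorts the result by the key column by default and moves that column to the front. If the original row order matters, one possible alternative is to index the lookup table with match instead. A small sketch, assuming a made-up data frame dat2 whose codes are not in ascending order:

```r
lookup_table <- data.frame(land.use = c(3, 4, 9, 34, 39),
                           value = c("Bare soil", "Primary Forest",
                                     "Secondary Forest", "Wheat", "Soybean"),
                           stringsAsFactors = FALSE)

# dat2 is a hypothetical example with unsorted codes
dat2 <- data.frame(id = LETTERS[1:3], land.use = c(39, 3, 34),
                   stringsAsFactors = FALSE)

# merge() reorders the rows by land.use
merged <- merge(dat2, lookup_table, all.x = TRUE, by = "land.use")
merged$id
#[1] "B" "C" "A"

# match() keeps the rows of dat2 in their original order
dat2$value <- lookup_table$value[match(dat2$land.use, lookup_table$land.use)]
dat2$value
#[1] "Soybean"   "Bare soil" "Wheat"
```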
Upvotes: 2