Reputation: 77
I have a data frame with several variables like this:
land_unit<-c("0.5ha", "hactares", "ha", "ha", "acre", "3ha",
"lima", "limas", "acre", "cunny", "6 cunnies")
I want to write a function that tidies this data, since many variables in my data frame have a similar format. The function should replace each element based on the first letter that appears in the string. For example, if the first letter to appear is "h", the whole string should be replaced by "ha"; if "l", by "lima"; if "a", by "acre"; and if "c", by "kani".
I have searched widely but cannot find an answer, although I suspect there is a relatively simple solution. Perhaps using regex?
Any help would be greatly appreciated.
Upvotes: 2
Views: 51
Reputation: 23101
This should also work. It keeps the mapping in a lookup table, which decouples the data from the code:
land_unit<-c("0.5ha", "hactares", "ha", "ha", "acre", "3ha",
"lima", "limas", "acre", "cunny", "6 cunnies")
library(stringr)
# define a lookup table, decouple the data
lookup_table <- data.frame(first.letter=c('h', 'l', 'a', 'c'),
replace.str=c('ha', 'lima', 'acre', 'kani'),
stringsAsFactors = FALSE)
# extract the matches
matches <- match(str_match(land_unit, "[^[:alpha:]]*([[:alpha:]]).*")[, 2], lookup_table[, 1])
# replace from lookup table
ifelse(!is.na(matches), lookup_table[matches,2], land_unit)
# [1] "ha" "ha" "ha" "ha" "acre" "ha" "lima" "lima" "acre" "kani" "kani"
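Since the question asks for a reusable function, the lookup idea above could be wrapped up roughly as follows. This is only a sketch in base R: the name tidy_units and its lookup argument are hypothetical (not from the original posts), and it assumes lowercase first letters, as in the sample data. Strings with no matching letter are left unchanged.

```r
# Hypothetical helper; the name and the default mapping are illustrative.
tidy_units <- function(x, lookup = c(h = "ha", l = "lima", a = "acre", c = "kani")) {
  x <- as.character(x)
  # Remove every non-letter, then keep the first remaining character
  first_letter <- substr(gsub("[^[:alpha:]]", "", x), 1, 1)
  # Look up the replacement by name; letters not in the lookup give NA
  replacement <- unname(lookup[first_letter])
  # Fall back to the original string where no replacement was found
  ifelse(is.na(replacement), x, replacement)
}

land_unit <- c("0.5ha", "hactares", "ha", "ha", "acre", "3ha",
               "lima", "limas", "acre", "cunny", "6 cunnies")
tidy_units(land_unit)
```

Passing a different named vector as lookup would let the same function handle other variables with their own unit vocabularies.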
Upvotes: 1
Reputation: 887028
Based on the description, maybe this helps. We use gsubfn to match zero or more characters that are not a letter ([^A-Za-z]*) from the start of the string (^), followed by a single lowercase letter captured as a group (([a-z])), followed by any other characters (.*). The whole match is then replaced by the value that the captured letter maps to in a named key/value list:
library(gsubfn)
gsubfn("^[^A-Za-z]*([a-z]).*", list(h = "ha", l="lima", a = "acre", c = "kani"), land_unit)
#[1] "ha" "ha" "ha" "ha" "acre" "ha" "lima" "lima" "acre" "kani" "kani"
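To cover the "many variables in my data frame" part of the question, the same pattern might be applied to several columns at once with lapply. The sketch below uses base R only (sub instead of gsubfn); the data frame, its column names, and the function name first_letter_recode are all made up for illustration.

```r
# Hypothetical example data; column names are invented for this sketch.
df <- data.frame(
  plot_unit  = c("0.5ha", "limas", "6 cunnies"),
  field_unit = c("acre", "hactares", "unknown"),
  area       = c(0.5, 2, 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: recode a character vector by its first lowercase letter.
first_letter_recode <- function(x, key = c(h = "ha", l = "lima", a = "acre", c = "kani")) {
  # Reduce each string to its first letter; non-matching strings stay as they are
  letter <- sub("^[^A-Za-z]*([a-z]).*", "\\1", x)
  # Named-vector lookup; anything without a key becomes NA
  out <- unname(key[letter])
  # Keep the original value where there was no key for the letter
  ifelse(is.na(out), x, out)
}

# Recode only the unit columns and leave the numeric column untouched
unit_cols <- c("plot_unit", "field_unit")
df[unit_cols] <- lapply(df[unit_cols], first_letter_recode)
df
```

Values whose first letter has no entry in the key (here "unknown") pass through unchanged, which makes leftovers easy to spot afterwards.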
Upvotes: 1