Reputation: 330
The data are from a large survey in a set of developing countries. The data include, among other things, variables on each respondent's country and local region (within the country).
The only problem is that, instead of coding local region as strings (such as "New York" or "Westchester County"), it is coded as numeric values that correspond to a list of regions in the codebook.
What I would like to know is whether there's a way to automate the process of renaming the factor levels using the code list from the codebook. Each region is preceded by a numeric value and an equals sign, and is followed immediately by a comma.
This list takes this form:
1=New York, 2=Paris, 3=London, 4=Moscow, 5=Boston, ..., 230=Tblisi
Is there some R code that might allow me to quickly rename all the factors in this variable using this list?
Upvotes: 1
Views: 363
Reputation: 34601
You could use strsplit on the codelist and then use the result as the levels and labels for your factor.
citylist <- c("1=New York", "2=Paris", "3=London", "4=Moscow", "5=Boston")
codes <- data.frame(do.call(rbind, strsplit(citylist, "="))) # Split and bind the result into a dataframe
set.seed(85)
mycities <- ceiling(runif(10, 0, 5)) # Generate some dummy data
mycities <- factor(mycities, levels = codes$X1, labels = codes$X2)
Which gives:
[1] London New York Paris Moscow London Boston New York New York New York
[10] Boston
Levels: New York Paris London Moscow Boston
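The codebook in the question is a single comma-separated string rather than a ready-made vector, so one extra split may be needed first. Here is a minimal sketch, assuming the list has been pasted into a string (the name `codebook` and the `region` values are made up for illustration) and that no region name itself contains a comma or an equals sign:

```r
# Hypothetical codebook string, shortened to five entries for illustration
codebook <- "1=New York, 2=Paris, 3=London, 4=Moscow, 5=Boston"

# Split on commas (plus any following whitespace): one "code=name" entry each
entries <- strsplit(codebook, ",\\s*")[[1]]

# Split each entry on "=" and bind the pieces into a two-column character matrix
parts <- do.call(rbind, strsplit(entries, "=", fixed = TRUE))

codes <- data.frame(code = as.integer(parts[, 1]),
                    name = trimws(parts[, 2]),
                    stringsAsFactors = FALSE)

# Made-up numeric region variable standing in for the survey column
region <- c(3, 1, 2, 5, 1)
region <- factor(region, levels = codes$code, labels = codes$name)
```

If the real list contains names with embedded commas, the first split would need a stricter pattern than the one assumed here.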
Upvotes: 1
Reputation: 1950
If you have a text file with a vector like
1=New York, 2=Paris, 3=London, 4=Moscow, 5=Boston, ..., 230=Tblisi
you'll need some regex to separate the city names from their numbers. For example, you could do:
library(stringr)
List <- c("1=New York", "2=Paris", "3=London", "4=Moscow", "5=Boston")
Cities <- data.frame(Orig = List)
Cities$CityNum <- str_extract(Cities$Orig, "[0-9]{1,}") # match one or more digits
# Take everything from the first capital letter to the end of the string
Cities$City <- str_sub(Cities$Orig,
                       start = str_locate(Cities$Orig, "[A-Z]")[, 1],
                       end = str_length(Cities$Orig))
Assuming that you have a column in MyData called "CityNum" that lists the number...
MyData <- merge(MyData, Cities, by = "CityNum") # the column name must be a quoted string
And I must agree with jbaums about being concise. :-)
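One thing to keep in mind with merge(): by default it sorts the result by the key column and drops rows whose code has no match in the lookup table. If the original row order matters, a match()-based lookup may be an alternative. This is only a sketch; the `id` column and all values below are made up for illustration:

```r
# Hypothetical lookup table and survey data (made-up values)
Cities <- data.frame(CityNum = c("1", "2", "3", "4", "5"),
                     City = c("New York", "Paris", "London", "Moscow", "Boston"),
                     stringsAsFactors = FALSE)
MyData <- data.frame(id = 1:6, CityNum = c(3, 1, 5, 2, 1, 9))

# match() gives, for each survey row, the position of its code in the lookup;
# a code missing from the lookup (9 here) yields NA instead of dropping the row
idx <- match(MyData$CityNum, as.numeric(Cities$CityNum))
MyData$City <- Cities$City[idx]
```

The rows of MyData stay in their original order, and any unmatched codes show up as NA in the new column, which makes gaps in the codebook easy to spot.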
Upvotes: 2