Yasha

Reputation: 330

Rename factors from list (in R)

My Data

The data are from a large survey in a set of developing countries. The data include, among other things, variables on each respondent's country and local region (within the country).

The only problem is that local region is not coded as strings (such as "New York" or "Westchester County"). Instead, it is coded as numeric values, which correspond to a list of regions in the codebook.

My Question

What I would like to know is whether there's a way to automate the process of re-naming the factors using the codelist from the codebook. Each region is preceded by a numeric value and an equals sign, and is followed immediately by a comma.

This list takes this form:

1=New York, 2=Paris, 3=London, 4=Moscow, 5=Boston, ..., 230=Tblisi

Is there some R code that might allow me to quickly rename all the factors in this variable using this list?

Upvotes: 1

Views: 363

Answers (2)

lroha

Reputation: 34601

You could use strsplit on the codelist and then use the result as the levels and labels for your factor.

citylist <- c("1=New York", "2=Paris", "3=London", "4=Moscow", "5=Boston")
codes <- data.frame(do.call(rbind, strsplit(citylist, "="))) # Split and bind the result into a dataframe

set.seed(85)
mycities <- ceiling(runif(10, 0, 5))     # Generate some dummy data
mycities <- factor(mycities, levels = codes$X1, labels = codes$X2)

Which gives:

[1] London   New York Paris    Moscow   London   Boston   New York New York New York
[10] Boston  
Levels: New York Paris London Moscow Boston
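If the codebook list comes as one long comma-separated string rather than a ready-made vector, a preliminary split on the commas gets it into the shape used above. This is only a minimal sketch: the `codebook` string and the `region` vector are made-up stand-ins for your real data, and it assumes no region name contains a comma.

```r
# Hypothetical codebook text pasted in as a single string (five entries only)
codebook <- "1=New York, 2=Paris, 3=London, 4=Moscow, 5=Boston"

# Split on commas (plus any whitespace after them): one "code=label" entry per element
citylist <- strsplit(codebook, ",\\s*")[[1]]

# Separate each entry at its first "=" so a label containing "=" would stay intact
codes <- data.frame(
  code  = as.integer(sub("=.*$", "", citylist)),
  label = sub("^[^=]*=", "", citylist),
  stringsAsFactors = FALSE
)

# Hypothetical numeric region column standing in for the survey variable
region <- c(3, 1, 2, 4, 3, 5)
region <- factor(region, levels = codes$code, labels = codes$label)
```

Any code that appears in the data but not in the codebook would become NA here, so it is worth checking `sum(is.na(region))` afterwards.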

Upvotes: 1

shirewoman2

Reputation: 1950

If you have a text file with a vector like

 1=New York, 2=Paris, 3=London, 4=Moscow, 5=Boston, ..., 230=Tblisi

you're going to have to do some regex to separate the city names from the numbers. For example, you could do:

 library(stringr)
 List <- c("1=New York", "2=Paris", "3=London", "4=Moscow", "5=Boston")
 Cities <- data.frame(Orig = List)
 Cities$CityNum <- str_extract(Cities$Orig, "[0-9]{1,}") # match the number at least once
 Cities$City <- str_sub(Cities$Orig, 
                   start = str_locate(Cities$Orig, "[A-Z]")[, 1],
                   end = str_length(Cities$Orig))

Assuming that you have a column in MyData called "CityNum" that lists the number...

 MyData <- merge(MyData, Cities, by = "CityNum") # the column name must be quoted
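One thing to keep in mind is that merge() sorts the result by the key column by default, so the rows of MyData may come back in a different order. If the original order matters, a lookup with match() is one alternative. A minimal sketch, where the contents of `MyData` (including the `id` column) are invented for illustration and the lookup table is typed out directly with integer codes:

```r
# Lookup table of codes and names, with the code stored as an integer
Cities <- data.frame(
  CityNum = 1:5,
  City    = c("New York", "Paris", "London", "Moscow", "Boston"),
  stringsAsFactors = FALSE
)

# Hypothetical survey data; only the column name CityNum is taken from the answer
MyData <- data.frame(id = 1:6, CityNum = c(3, 1, 2, 4, 3, 5))

# match() gives, for each row of MyData, the position of its code in Cities,
# so the names are added in place and the row order is left untouched
MyData$City <- Cities$City[match(MyData$CityNum, Cities$CityNum)]
```

Codes with no entry in the lookup table end up as NA in the new column.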

And I must agree with jbaums about being concise. :-)

Upvotes: 2
