user1412
user1412

Reputation: 729

Attach Labels to numeric codes of a data frame in R - efficient use of factors

I have a data which is in excel and I read that to store it in a data frame by the name "data". Similarly I the value labels for each variable and its code in a separate sheet by the name map. I read this sheet as well and store it is data frame by the name "map".

Link to the file is - https://www.wetransfer.com/downloads/bf0c5bfa88be20e4037d7fdc828ca66320161018075428/7f82a4

Below is the code to read the sheets-

library("readxl")
data <- read_excel("data_v1.xlsx",sheet = "data")
map <- read_excel("data_v1.xlsx",sheet = "map")

If you will notice the cells with less number of codes are with NA.

Now I want to attach the value labels to each of the codes from the map sheet to the data. I was doing a search and I realized that one of the ways to do this is by using factors. Where we define the levels and labels. I can use the original variable name in map file for the levels and for Labels I can use those with "_desc" as suffix.

Can some one please suggest an efficient way of doing this? I mean instead of writing codes for individual variables if we can do this in a loop? In the data file that i am attaching this is just a sample and the original data file would have more than 100 variables for which one would need to do this task of attaching labels.

Lastly, how can I attach variable labels? I mean,

Q1 should have the label "Like the packaging of the brand" Q2 should have the label "Like the taste of the brand" Q3 should have the label "Like the smell of the brand" Q4 should have the label "value for money"

Should I think of creating a separate sheet with one column having the variable name and next one having the labels for the variables? How would we attach these labels to the variables as factors only works for values I believe.

Finally I need to generate Table / cross tables where these Labels should get displayed. Charts where these value + Variable labels should get displayed.

Thank you!!

Prasad

Upvotes: 1

Views: 1289

Answers (2)

user7311064
user7311064

Reputation:

I would write a for loop to do this repetitive task. One would need to check if the variable names in data and map are same. Your data is not available, however I feel below should work.

for (i in names(data)[which(is.element(names(data),names(map)))]){
  data[[i]] <- factor(data[[i]],
                                 levels = na.omit(map[[i]]),
                                 labels = na.omit(map[[paste0(i,"_desc")]]))
}

na.oimt is to omit the rows that would have NAs.

Upvotes: 0

roman
roman

Reputation: 1370

conceptually you need something like this

city_labels <- map$City_desc[match(data$City, map$City)]

where you go from there... is a matter of judgement. how repetitive is the task? do certain things vary, can you know them in advance etc. if you have 100 to do and in each case there is a corresponding variable in map with same name and _desc as a suffix then I would write a function that assumes that, like this:

getLabels <- function(var_name) {
  map[,paste0(var_name, "_desc")][match(data[,var_name], map[,var_name])]
}

getLabels("City")

once you have fixed the problem in my comment above, you can do this:

new_dat <- data.frame(lapply(names(data), getLabels))
names(new_dat) <- names(data)
new_dat

and then you have your translated dataset - store this somewhere safe :-)

Upvotes: 1

Related Questions