Reputation: 73
I am trying to analyze a large, sloppy, poorly coded data file about library reference interactions. Here's a small data set that captures what I am struggling to do:
# assemble data
record<-c(2883823,2883824,2883825,2883826,2883828,2884074,2884076,2884660,2885106,2885222,2885703,2885709)
desk<-c("RRSS","RRSS","RRSS","RRSS","RRSS","RRSS","RRSS","Virt","RRSS","Virt","Virt","RRSS")
inperson<-c("InPerson<5Minutes",NA,NA,"InPerson<5Minutes",NA,NA,"InPerson<5Minutes",NA,"InPerson5-15Minutes",NA,NA,"InPerson15-30minutes")
phone<-c(NA,"Phone5-15Minutes","Phone<5Minutes",NA,NA,"Phone<5Minutes",NA,NA,NA,NA,NA,NA)
chat<-c(NA,NA,NA,NA,"Chat<5Minutes",NA,NA,"Chat5-15Minutes",NA,"Chat5-15Minutes","Chat15-30minutes",NA)
reference<-data.frame(record,desk,inperson,phone,chat) #create data frame
I'd like to recode the levels of the variables inperson, phone, and chat from strings to numbers (perhaps under new names for clarity; I've used the prefix Num below to indicate this). I think this calls for some sort of if-then statements, but because the input data uses different wording for each variable, each one needs its own handling:
record  desk Numperson Numphone Numchat
2883823 RRSS 1         0        0
2883824 RRSS 0         2        0
2883825 RRSS 0         1        0
2883826 RRSS 1         0        0
2883828 RRSS 0         0        1
2884074 RRSS 0         1        0
2884076 RRSS 1         0        0
2884660 Virt 0         0        2
2885106 RRSS 2         0        0
2885222 Virt 0         0        2
2885703 Virt 0         0        3
2885709 RRSS 3         0        0
and then rearrange it so that it is more amenable to analysis, as follows:
record  desk type   Numlevel
2883823 RRSS person 1
2883824 RRSS phone  2
2883825 RRSS phone  1
2883826 RRSS person 1
2883828 RRSS chat   1
2884074 RRSS phone  1
2884076 RRSS person 1
2884660 Virt chat   2
2885106 RRSS person 2
2885222 Virt chat   2
2885703 Virt chat   3
2885709 RRSS person 3
As a beginner, I would appreciate any help or pointers to where I should be looking for answers.
Upvotes: 1
Views: 77
Reputation: 132576
Maybe like this:
# Clean up: strip the channel prefix and the "Minutes"/"minutes" suffix,
# leaving only the duration label ("<5", "5-15" or "15-30")
reference$inperson <- gsub("InPerson|[Mm]inutes", "", reference$inperson)
reference$phone <- gsub("Phone|[Mm]inutes", "", reference$phone)
reference$chat <- gsub("Chat|[Mm]inutes", "", reference$chat)
# Reshape to long format, dropping the NA (no interaction) cells
library(reshape2)
reference <- melt(reference, id.vars = c("record", "desk"),
                  variable.name = "type", value.name = "Numlevel",
                  na.rm = TRUE)
# Match each duration label to its position in the vector: 1, 2 or 3
reference$Numlevel <- match(reference$Numlevel, c("<5", "5-15", "15-30"))
#    record desk     type Numlevel
#1  2883823 RRSS inperson        1
#4  2883826 RRSS inperson        1
#7  2884076 RRSS inperson        1
#9  2885106 RRSS inperson        2
#12 2885709 RRSS inperson        3
#14 2883824 RRSS    phone        2
#15 2883825 RRSS    phone        1
#18 2884074 RRSS    phone        1
#29 2883828 RRSS     chat        1
#32 2884660 Virt     chat        2
#34 2885222 Virt     chat        2
#35 2885703 Virt     chat        3
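If you also want the intermediate wide table from the question (one Num column per channel, with 0 where there was no interaction), something along these lines might work in base R. This is only a sketch: `recode_duration` and `wide` are names I made up for illustration, and it assumes the only duration labels in the real file are the three that appear in the sample data.

```r
# Rebuild the question's sample data so this sketch runs on its own.
record <- c(2883823, 2883824, 2883825, 2883826, 2883828, 2884074,
            2884076, 2884660, 2885106, 2885222, 2885703, 2885709)
desk <- c("RRSS", "RRSS", "RRSS", "RRSS", "RRSS", "RRSS",
          "RRSS", "Virt", "RRSS", "Virt", "Virt", "RRSS")
inperson <- c("InPerson<5Minutes", NA, NA, "InPerson<5Minutes", NA, NA,
              "InPerson<5Minutes", NA, "InPerson5-15Minutes", NA, NA,
              "InPerson15-30minutes")
phone <- c(NA, "Phone5-15Minutes", "Phone<5Minutes", NA, NA,
           "Phone<5Minutes", NA, NA, NA, NA, NA, NA)
chat <- c(NA, NA, NA, NA, "Chat<5Minutes", NA, NA, "Chat5-15Minutes",
          NA, "Chat5-15Minutes", "Chat15-30minutes", NA)
reference <- data.frame(record, desk, inperson, phone, chat,
                        stringsAsFactors = FALSE)

# Hypothetical helper (not part of the code above): strip the channel
# prefix and the "Minutes"/"minutes" suffix, then map the remaining
# duration label to 1, 2 or 3. Missing input (no interaction) becomes 0;
# a label that is not one of the three expected ones stays NA.
recode_duration <- function(x, prefix) {
  label <- gsub(paste0(prefix, "|[Mm]inutes"), "", x)
  level <- match(label, c("<5", "5-15", "15-30"))
  level[is.na(x)] <- 0L
  level
}

# Wide table with one numeric column per channel.
wide <- data.frame(
  record    = reference$record,
  desk      = reference$desk,
  Numperson = recode_duration(reference$inperson, "InPerson"),
  Numphone  = recode_duration(reference$phone, "Phone"),
  Numchat   = recode_duration(reference$chat, "Chat")
)
```

Because unexpected labels are left as NA rather than 0, `anyNA(wide)` is a quick way to check whether the real file contains wording that the sample does not cover.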
Upvotes: 3