Reputation: 2995
I have one file named traffic
that looks like this:
city statenum casenum vnumber pnumber county accdate accday accmin accmon acctime
-1 6 23 1 1 47 1082010 8 50 1 150
0 6 25 1 1 73 1042010 4 0 1 2200
0 6 652 1 4 71 3282010 28 1 3 1901
1 6 1289 1 2 71 7152010 15 40 7 2140
4 6 1289 1 3 71 7152010 15 40 7 2140
1 6 1289 1 4 71 7152010 15 40 7 2140
and a set of separate files that tell me what the numerical code in each column refers to. For example, I have a file called city
that looks like this:
Code Definition
-1 Blank
0 NA
1 ACAMPO
2 ACTON
3 ADELANTO
4 ADIN
How do I use the codes in the city
file to replace the coded values in traffic
? The output would look like this:
city statenum casenum vnumber pnumber county accdate accday accmin accmon acctime
Blank 6 23 1 1 47 1082010 8 50 1 150
NA 6 25 1 1 73 1042010 4 0 1 2200
NA 6 652 1 4 71 3282010 28 1 3 1901
ACAMPO 6 1289 1 2 71 7152010 15 40 7 2140
ADIN 6 1289 1 3 71 7152010 15 40 7 2140
ACAMPO 6 1289 1 4 71 7152010 15 40 7 2140
All the solutions I've seen using recode or likewise involve explicitly stating which value corresponds to which as in the cars
packages example: recode(x, "c(1,2)='A'; else='B'")
What I'd like to do, instead, is to have the strings in city$Definition
replace the numerical codes in traffic$city
if city$Code
matches traffic$city
.
I could do traffic<-merge(traffic, city, by.x = "city", by.y = "Code")
and then traffic$city<-traffic$Definition
and then traffic$Definition<-NULL
, but it just seems like this would be a common enough operation that there would be a convenient function for doing this.
Bonus points for a solution which allows me to specify multiple columns to be replaced by values from multiple files without repeating myself too much.
Upvotes: 0
Views: 159
Reputation: 179448
Perhaps the easiest way is to rename the columns in your lookup tables so that the merge operation just "works":
names(city) <- c("city", "City Name")
merge(traffic, city)
city statenum casenum vnumber pnumber county accdate
1 -1 6 23 1 1 47 1082010
2 0 6 25 1 1 73 1042010
3 0 6 652 1 4 71 3282010
4 1 6 1289 1 2 71 7152010
5 1 6 1289 1 4 71 7152010
6 4 6 1289 1 3 71 7152010
accday accmin accmon acctime City Name
1 8 50 1 150 Blank
2 4 0 1 2200 <NA>
3 28 1 3 1901 <NA>
4 15 40 7 2140 ACAMPO
5 15 40 7 2140 ACAMPO
6 15 40 7 2140 ADIN
Since this is the structure that one would expect in relational databases, this should make it easy if you then wish use sqldf
or data.table
.
Upvotes: 2
Reputation: 7475
this maybe what you want
traffic<-read.table(header=T,text="city statenum casenum vnumber pnumber county accdate accday accmin accmon acctime
-1 6 23 1 1 47 1082010 8 50 1 150
0 6 25 1 1 73 1042010 4 0 1 2200
0 6 652 1 4 71 3282010 28 1 3 1901
1 6 1289 1 2 71 7152010 15 40 7 2140
4 6 1289 1 3 71 7152010 15 40 7 2140
1 6 1289 1 4 71 7152010 15 40 7 2140")
city<-read.table(header=T,text="Code Definition
-1 Blank
0 NA
1 ACAMPO
2 ACTON
3 ADELANTO
4 ADIN")
traffic$city<-city$Definition[match(traffic$city,city$Code)]
but I may have mistaken your meaning
or much more fun
library(sqldf)
sqldf("SELECT c.Definition,t.statenum,t.casenum,t.vnumber,t.pnumber,t.county,t.accdate,t.accday,t.accmin,t.accmon from traffic t, city c where t.city=c.Code")
I would advocate sqldf and SQL type SELECTS as maybe answering your last part. I cant comment on how it performs with large dataframes however.
EDIT:
I would like to have SELECT c.Definition as city.....
here but it throws an error
Upvotes: 3