William Gunn
William Gunn

Reputation: 2995

How do I use values from one dataframe to recode values in another?

I have one file named traffic that looks like this:

city statenum casenum vnumber pnumber county accdate accday accmin accmon acctime
-1       6      23       1       1     47 1082010      8     50      1     150
0        6      25       1       1     73 1042010      4      0      1    2200
0        6     652       1       4     71 3282010     28      1      3    1901
1        6    1289       1       2     71 7152010     15     40      7    2140
4        6    1289       1       3     71 7152010     15     40      7    2140
1        6    1289       1       4     71 7152010     15     40      7    2140

and a set of separate files that tell me what the numerical code in each column refers to. For example, I have a file called city that looks like this:

Code     Definition
-1       Blank
0        NA
1        ACAMPO
2        ACTON
3        ADELANTO
4        ADIN

How do I use the codes in the city file to replace the coded values in traffic? The output would look like this:

city statenum casenum vnumber pnumber county accdate accday accmin accmon acctime
Blank     6      23       1       1     47 1082010      8     50      1     150
NA        6      25       1       1     73 1042010      4      0      1    2200
NA        6     652       1       4     71 3282010     28      1      3    1901
ACAMPO    6    1289       1       2     71 7152010     15     40      7    2140
ADIN      6    1289       1       3     71 7152010     15     40      7    2140
ACAMPO    6    1289       1       4     71 7152010     15     40      7    2140

All the solutions I've seen using recode or likewise involve explicitly stating which value corresponds to which as in the cars packages example: recode(x, "c(1,2)='A'; else='B'") What I'd like to do, instead, is to have the strings in city$Definition replace the numerical codes in traffic$city if city$Code matches traffic$city.

I could do traffic<-merge(traffic, city, by.x = "city", by.y = "Code") and then traffic$city<-traffic$Definition and then traffic$Definition<-NULL, but it just seems like this would be a common enough operation that there would be a convenient function for doing this.

Bonus points for a solution which allows me to specify multiple columns to be replaced by values from multiple files without repeating myself too much.

Upvotes: 0

Views: 159

Answers (2)

Andrie
Andrie

Reputation: 179448

Perhaps the easiest way is to rename the columns in your lookup tables so that the merge operation just "works":

names(city) <- c("city", "City Name")
merge(traffic, city)

  city statenum casenum vnumber pnumber county accdate
1   -1        6      23       1       1     47 1082010
2    0        6      25       1       1     73 1042010
3    0        6     652       1       4     71 3282010
4    1        6    1289       1       2     71 7152010
5    1        6    1289       1       4     71 7152010
6    4        6    1289       1       3     71 7152010
  accday accmin accmon acctime City Name
1      8     50      1     150     Blank
2      4      0      1    2200      <NA>
3     28      1      3    1901      <NA>
4     15     40      7    2140    ACAMPO
5     15     40      7    2140    ACAMPO
6     15     40      7    2140      ADIN

Since this is the structure that one would expect in relational databases, this should make it easy if you then wish use sqldf or data.table.

Upvotes: 2

shhhhimhuntingrabbits
shhhhimhuntingrabbits

Reputation: 7475

this maybe what you want

traffic<-read.table(header=T,text="city statenum casenum vnumber pnumber county accdate accday accmin accmon acctime
-1       6      23       1       1     47 1082010      8     50      1     150
0        6      25       1       1     73 1042010      4      0      1    2200
0        6     652       1       4     71 3282010     28      1      3    1901
1        6    1289       1       2     71 7152010     15     40      7    2140
4        6    1289       1       3     71 7152010     15     40      7    2140
1        6    1289       1       4     71 7152010     15     40      7    2140")

city<-read.table(header=T,text="Code     Definition
-1       Blank
0        NA
1        ACAMPO
2        ACTON
3        ADELANTO
4        ADIN")

traffic$city<-city$Definition[match(traffic$city,city$Code)]

but I may have mistaken your meaning

or much more fun

library(sqldf)
sqldf("SELECT c.Definition,t.statenum,t.casenum,t.vnumber,t.pnumber,t.county,t.accdate,t.accday,t.accmin,t.accmon from traffic t, city c where t.city=c.Code")

I would advocate sqldf and SQL type SELECTS as maybe answering your last part. I cant comment on how it performs with large dataframes however.

EDIT: I would like to have SELECT c.Definition as city..... here but it throws an error

Upvotes: 3

Related Questions