mgc77

Reputation: 113

Filter a Table based on another Table

I have two tables. The first (df) contains the TMC codes of all the roadways of interest. The second (df2) contains the travel times recorded on every single roadway in a particular state. I want to use the first table as a filter so that only the records for those roadways of interest remain.

df

  id     link       tmc
1  1 23402444 122P06466
2  2 23402487 122P06476
3  3 23402488 122N06476
4  4 23402493 122N06477
5  5 23402555 122P06454
6  6 23402557 122N06453

df2

  id       tmc   epoch  tt
1  1 108N04625 1182014 163
2  2 108N04625 1182014 103
3  3 108N04625 1182014  73
4  4 108N04625 1172014 254
5  5 108N04625 1172014 224
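To make the examples runnable, the two sample tables can be recreated in R as below. This is a sketch: the column types are assumed, with tmc kept as character. Note that no TMC in this df2 sample appears in df, so a correct filter returns zero rows on it as shown. The answer below copies one TMC across to get a match.

```r
# Sample data from the question (column types are an assumption)
df <- data.frame(
  id   = 1:6,
  link = c(23402444, 23402487, 23402488, 23402493, 23402555, 23402557),
  tmc  = c("122P06466", "122P06476", "122N06476",
           "122N06477", "122P06454", "122N06453"),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  id    = 1:5,
  tmc   = rep("108N04625", 5),
  epoch = c(1182014, 1182014, 1182014, 1172014, 1172014),
  tt    = c(163, 103, 73, 254, 224),
  stringsAsFactors = FALSE
)

# No overlap between the two samples: this is FALSE
any(df2$tmc %in% df$tmc)
```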

I tried to filter it like this:

Data2Filter<-(Data2, TMC==Data2$TMC)

but I got either "object not found" errors for everything or an error saying my dimensions were mismatched. There are about 8,000 records in Data1 and 14,000,000 in Data2, because a single TMC can have many travel times (TT), but I'm only interested in the travel times recorded on the TMCs listed in Data1. I'm very familiar with MATLAB, but at the moment I only have RStudio available and know nothing about it. Also, these tables are loaded from CSV files, if that makes any difference.
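Loading the tables from CSV files should not change the filtering approach: read.csv() returns an ordinary data frame. One thing worth checking is the column names, because R is case-sensitive, so a column named TMC cannot be referred to as tmc (a common cause of "object not found"). A minimal sketch, assuming the CSV headers are id, link, TMC and id, TMC, epoch, TT. The file names in the comment are hypothetical, and the CSV text is inlined through read.csv()'s text argument so the sketch runs as-is:

```r
# In practice something like (hypothetical file names):
#   Data1 <- read.csv("Data1.csv", stringsAsFactors = FALSE)
#   Data2 <- read.csv("Data2.csv", stringsAsFactors = FALSE)
# Here the CSV content is inlined so the example is self-contained.
Data1 <- read.csv(text = "id,link,TMC
1,23402444,122P06466
2,23402487,122P06476", stringsAsFactors = FALSE)

Data2 <- read.csv(text = "id,TMC,epoch,TT
1,122P06466,1182014,163
2,108N04625,1182014,103", stringsAsFactors = FALSE)

# Inspect the exact (case-sensitive) column names and types
names(Data1)
str(Data2)
```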

Upvotes: 3

Views: 5708

Answers (1)

Paulo E. Cardoso

Reputation: 5856

# Copy the first TMC from df into df2 so the sample data has at least one match
df2$tmc[1] <- df$tmc[1]
df2

  id       tmc   epoch  tt
1  1 122P06466 1182014 163
2  2 108N04625 1182014 103
3  3 108N04625 1182014  73
4  4 108N04625 1172014 254
5  5 108N04625 1172014 224

There are many options:

# base R, option 1: subset()
subset(df2, tmc %in% df$tmc)

# base R, option 2: logical indexing
df2[df2$tmc %in% df$tmc, ]
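The key is %in% rather than ==. With ==, R compares the two vectors position by position, recycling the shorter one, so it does not test membership at all. When the lengths are not multiples of each other, R also warns "longer object length is not a multiple of shorter object length". %in% instead asks, for each element on the left, whether it occurs anywhere on the right. A small illustration with made-up vectors:

```r
x <- c("a", "b", "c", "d")  # stands in for the tmc column of the big table
y <- c("c", "a")            # stands in for the list of TMCs of interest

# `==` recycles y to c("c", "a", "c", "a") and compares element by element
x == y
#> [1] FALSE FALSE  TRUE FALSE

# `%in%` tests whether each element of x occurs anywhere in y
x %in% y
#> [1]  TRUE FALSE  TRUE FALSE
```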

library(dplyr)
fi <- filter(df2, tmc %in% df$tmc)

fi
  id       tmc   epoch  tt
1  1 122P06466 1182014 163
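Another dplyr option, which is a different technique from the filter() call above, is a filtering join. semi_join() keeps the rows of its first table that have a match in the second on the given key. Unlike inner_join(), it adds no columns from the second table and does not duplicate rows when a key appears more than once there. A sketch with small made-up tables:

```r
library(dplyr)

# Made-up tables standing in for df (TMCs of interest) and df2 (travel times)
df  <- data.frame(tmc = c("122P06466", "122P06476"), link = c(1, 2))
df2 <- data.frame(tmc = c("122P06466", "108N04625", "122P06466"),
                  tt  = c(163, 103, 73))

# Keep rows of df2 whose tmc appears in df; only df2's columns are returned
fi <- semi_join(df2, df, by = "tmc")
fi
```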

For large datasets like yours, data.table is usually much faster:

library(data.table)
dt <- data.table(df)
dt2 <- data.table(df2)
subset(dt2, tmc %in% dt$tmc)
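With a table of around 14 million rows it can also help to avoid copies. data.table(df) makes a copy, while setDT() converts a data frame to a data.table in place. The subset can then be written with data.table's own [ syntax, where column names are used directly. data.table's fread() is also a fast alternative to read.csv() for loading the large CSV. A sketch with made-up tables:

```r
library(data.table)

# Made-up tables standing in for df and df2
df  <- data.frame(tmc = c("122P06466", "122P06476"))
df2 <- data.frame(tmc = c("122P06466", "108N04625", "122P06476"),
                  tt  = c(163L, 103L, 73L))

# Convert by reference instead of copying
setDT(df)
setDT(df2)

# Inside [ ], tmc refers to df2's column; df$tmc is the list of TMCs of interest
fi <- df2[tmc %in% df$tmc]
fi
```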

This may be a useful topic on subset performance

Upvotes: 3
