Jivan

Reputation: 23098

Merge DataFrames with ordering criteria

In a previous question, I was asking how to match values from this DataFrame source:

     car_id     lat     lon
0    100        10.0    15.0
1    100        12.0    10.0
2    100        13.0    09.0
3    110        23.0    08.0
4    110        13.0    09.0
5    110        12.0    10.0
6    110        12.0    02.0
7    120        11.0    11.0
8    120        12.0    10.0
9    120        13.0    09.0
10   120        14.0    08.0
11   130        12.0    10.0

And keep only those whose coords are in this second DataFrame coords:

     lat     lon
0    12.0    10.0
1    13.0    09.0

But this time I'd like to keep only each car_id whose coordinates contain all the rows of coords, in the same order, so that the resulting DataFrame result would be:

     car_id
1    100
2    120

# 110 has all the values from coords, but not in the same order
# 130 doesn't have all the values from coords

Is there a way to achieve this result in a vectorized way, avoiding going through a lot of loops and conditionals?
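
For reference, the two frames above can be reproduced like this:

import pandas as pd

source = pd.DataFrame(
    {'car_id': [100, 100, 100, 110, 110, 110, 110, 120, 120, 120, 120, 130],
     'lat': [10.0, 12.0, 13.0, 23.0, 13.0, 12.0, 12.0, 11.0, 12.0, 13.0, 14.0, 12.0],
     'lon': [15.0, 10.0, 9.0, 8.0, 9.0, 10.0, 2.0, 11.0, 10.0, 9.0, 8.0, 10.0]})

coords = pd.DataFrame({'lat': [12.0, 13.0], 'lon': [10.0, 9.0]})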

Upvotes: 1

Views: 75

Answers (2)

piRSquared

Reputation: 294576

plan

  • we will groupby 'car_id' and evaluate each subset
  • after an inner merge with coords we should see two things
    1. the merged rows, once deduplicated, should line up with coords exactly (same values, same order)
    2. the merge should cover every row of coords



import pandas as pd

def duper(df):
    m = df.merge(coords)
    c = pd.concat([m, coords])
    # we put the merged rows first and those are
    # the ones we'll keep after `drop_duplicates(keep='first')`
    # `keep='first'` is the default, so I don't pass it
    c1 = (c.drop_duplicates().values == coords.values).all()

    # if `keep=False` then I drop all duplicates.  If I got
    # everything in `coords` this should be empty
    c2 = c.drop_duplicates(keep=False).empty
    return c1 & c2

source.set_index('car_id').groupby(level=0).filter(duper).index.unique().values

array([100, 120])
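
To see why 110 and 130 get filtered out, duper can be checked one group at a time (assuming source and coords as defined in the question):

g = source.set_index('car_id').groupby(level=0)

duper(g.get_group(100))  # True: (12, 10) then (13, 9), same order as coords
duper(g.get_group(110))  # False: both rows of coords appear, but in reverse order
duper(g.get_group(130))  # False: (13, 9) never appears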

slight alternative

def duper(df):
    m = df.drop('car_id', axis=1).merge(coords)
    c = pd.concat([m, coords])
    c1 = (c.drop_duplicates().values == coords.values).all()
    c2 = c.drop_duplicates(keep=False).empty
    return c1 & c2

source.groupby('car_id').filter(duper).car_id.unique()

Upvotes: 1

nbraun

Reputation: 33

This isn't pretty, but what if you did something like this:

df2 = pd.DataFrame(df, copy=True)
# pair each row with the coordinates of the row that follows it
df2[['lat2', 'lon2']] = df[['lat', 'lon']].shift(-1)
df2.set_index(['lat', 'lon', 'lat2', 'lon2'], inplace=True)
print(df2.loc[(12, 10, 13, 9)].reset_index(drop=True))

   car_id
0     100
1     120

And this would be the general case:

import pandas as pd

raw_data = {'car_id': [100, 100, 100, 110, 110, 110, 110, 120, 120, 120, 120, 130],
            'lat': [10, 12, 13, 23, 13, 12, 12, 11, 12, 13, 14, 12],
            'lon': [15, 10, 9, 8, 9, 10, 2, 11, 10, 9, 8, 10],
           }
df = pd.DataFrame(raw_data, columns=['car_id', 'lat', 'lon'])

raw_data = {
             'lat': [10, 12, 13],
             'lon': [15, 10, 9],
           }

coords = pd.DataFrame(raw_data, columns=['lat', 'lon'])

def submatch(df, match):
    df2 = pd.DataFrame(df['car_id'])
    # add the coordinates of the current row and of the next n-1 rows
    # as columns lat0/lon0, lat1/lon1, ...
    for x in range(match.shape[0]):
        df2[['lat{}'.format(x), 'lon{}'.format(x)]] = df[['lat', 'lon']].shift(-x)

    # flatten [['lat0', 'lon0'], ['lat1', 'lon1'], ...] into a single list
    n = match.shape[0]
    cols = [item for sublist in
        [['lat{}'.format(x), 'lon{}'.format(x)] for x in range(n)]
        for item in sublist]

    # index by those columns and look up the rows whose window of n
    # consecutive coordinates equals `match`
    df2.set_index(cols, inplace=True)
    return df2.loc[tuple(match.stack().values)].reset_index(drop=True)

print(submatch(df, coords))

   car_id
0     100

Upvotes: 1
