Reputation: 55
Currently I'm splitting the titles of cars for sales using the split() function. I'm then itterating through a csv file, read by pandas to look for matching strings. Here is my current script:
title = 'ABARTH Abarth 500 124 1.4l 22,200miles UK'
keywords = title.split()
for word in keywords:
for index, row in df.iterrows():
makes = row[1]
models = row[3]
if word in makes:
print('Found: ' + word + ' as a match with ' + makes + ' in pd data set.')
pass
else:
print('no')
if word in models:
print('Found: ' + word + ' as a match with ' + models + ' in pd data set.')
pass
Once I successfully match the 'make' how can I match the appropriate 'model'?
Upvotes: 0
Views: 567
Reputation: 9711
str.contains will help to you to find the matching strings, then do need to apply filter on the data frame using DataFrame.loc. You just need to repeat these 2 functions for make & model as you need. Since the data you provided wasn't sufficient, I created a sample data and here is the code.
import pandas as pd
import numpy as np
import re
from io import StringIO
words = ['amc', 'buick','volkswagen']
Models = "|".join(map(re.escape, words))
words = ['European','American']
Origins = "|".join(map(re.escape, words))
TESTDATA=StringIO("""
Model,MPG,Cylinders,Engine Disp,Horsepower,Weight,Accelerate,Year,Origin
amc ambassador dpl,15.0,8,390,190,3850,8.5,70,American
amc gremlin,21.0,6,199,90,2648,15.0,70,American
amc hornet,18.0,6,199,97,2774,15.5,70,American
amc rebel sst,16.0,8,304,150,3433,12.0,70,American
buick estate wagon (sw),14.0,8,455,225,3086,10.0,70,American
buick skylark 320,15.0,8,350,165,3693,11.5,70,American
chevrolet chevelle malibu,18.0,8,307,130,3504,12.0,70,American
chevrolet impala,14.0,8,454,220,4354,9.0,70,American
volkswagen 1131 deluxe sedan,26.0,4,97,46,1835,20.5,70,European
fiat 124b,30.0,4,88,76,2065,14.5,71,European
opel 1900,28.0,4,116,90,2123,14.0,71,European
peugeot 304,30.0,4,79,70,2074,19.5,71,European
volkswagen model 111,27.0,4,97,60,1834,19.0,71,European """)
df = pd.read_csv(TESTDATA, sep=",")
df=df.loc[df.Model.str.contains(Models)]
df=df.loc[df.Origin.str.contains(Origins)]
print (df)
Output
Model MPG Cylinders Engine Disp Horsepower Weight Accelerate Year Origin
0 amc ambassador dpl 15.0 8 390 190 3850 8.5 70 American
1 amc gremlin 21.0 6 199 90 2648 15.0 70 American
2 amc hornet 18.0 6 199 97 2774 15.5 70 American
3 amc rebel sst 16.0 8 304 150 3433 12.0 70 American
4 buick estate wagon (sw) 14.0 8 455 225 3086 10.0 70 American
5 buick skylark 320 15.0 8 350 165 3693 11.5 70 American
8 volkswagen 1131 deluxe sedan 26.0 4 97 46 1835 20.5 70 European
12 volkswagen model 111 27.0 4 97 60 1834 19.0 71 European
Upvotes: 1