Reputation: 23
Disclaimer: I'm learning to develop in Python and I know my way of coding is probably trash, but I plan to keep improving while creating programs.
So I'm trying to build a scraper with Selenium that checks the prices of specific flights daily, and that part of the code is already done. Origin, destination, first flight date, second flight date and price will be saved every day. I'm saving that data into a file and then checking whether the price has changed.
My aim is to detect whether a price has changed by more than X percent and, if so, to print a message for every compared flight.
import pandas as pd
import os.path
import numpy as np
# These are just sample data before integrating Selenium values
price = 230
departuredate = '20/02/2020'
returndate = '20/02/2020'
fromm = 'BOS'
to = 'JFK'
price2 = 630
departuredate2 = '20/02/2020'
returndate2 = '20/02/2020'
fromm2= 'CDG'
to2= 'JFK'
#End of sample data
flightdata = {'From': [fromm, fromm2], 'To': [to,to2], 'Departure date': [departuredate,departuredate2], 'Return date': [returndate,returndate2], 'Price': [price,price2]}
df = pd.DataFrame(flightdata, columns= ['From', 'To', 'Departure date', 'Return date', 'Price'])
#Check if the script is running for the first time
if os.path.exists('flightstoday.xls'):
    # Guard the removal: on the second run there is no yesterday file yet
    if os.path.exists('flightsyesterday.xls'):
        os.remove('flightsyesterday.xls')
    os.rename('flightstoday.xls', 'flightsyesterday.xls') #Rename the flights scraped yesterday
    df.to_csv('flightstoday.xls', mode='a', header=True, sep='\t')
else:
    df.to_csv('flightstoday.xls', mode='w', header=True, sep='\t')
#Work with two dataframes
flightsyesterday = pd.read_csv("flightsyesterday.xls",sep='\t')
flightstoday = pd.read_csv("flightstoday.xls",sep='\t')
What I'm missing is how to compare the 'Price' column between the two dataframes and print a message saying that, for the row with a given 'From', 'To', 'Departure date' and 'Return date', the price has changed by X percent.
I have tried this code, but it only adds a column with the absolute difference to the flightstoday dataframe, not the percentage, and of course it doesn't print whether there was any change in price.
flightstoday['PriceDiff'] = np.where(flightstoday['Price'] == flightsyesterday['Price'], 0, flightstoday['Price'] - flightsyesterday['Price'])
Any help for this newbie will be greatly appreciated. Thank you!
Upvotes: 2
Views: 619
Reputation: 879
From what I've gathered, I think this is what you're intending to do.
import pandas as pd
import os.path
import numpy as np
# These are just sample data before integrating Selenium values
price = 230
departuredate = '20/02/2020'
returndate = '20/02/2020'
fromm = 'BOS'
to = 'JFK'
price2 = 630
departuredate2 = '20/02/2020'
returndate2 = '20/02/2020'
fromm2 = 'CDG'
to2 = 'JFK'
# Create second set of prices
price3 = 250
price4 = 600
# Generate data to construct DataFrames
today_flightdata = {'From': [fromm, fromm2], 'To': [to, to2], 'Departure date': [
    departuredate, departuredate2], 'Return date': [returndate, returndate2], 'Price': [price, price2]}
yesterday_flightdata = {'From': [fromm, fromm2], 'To': [to, to2], 'Departure date': [
    departuredate, departuredate2], 'Return date': [returndate, returndate2], 'Price': [price3, price4]}
# Create dataframes for yesterday and today
today = pd.DataFrame(today_flightdata, columns=[
    'From', 'To', 'Departure date', 'Return date', 'Price'])
yesterday = pd.DataFrame(yesterday_flightdata, columns=[
    'From', 'To', 'Departure date', 'Return date', 'Price'])
# Determine changes
today['price_change'] = (
    today['Price'] - yesterday['Price']) / yesterday['Price'] * 100.
# Determine indices of all rows where |price_change| >= threshold (in percent)
threshold = 1.0
today['exceeds_threshold'] = abs(today['price_change']) >= threshold
exceed_indices = today['exceeds_threshold'][today['exceeds_threshold']].index
# Print out those entries that exceed threshold
for idx in exceed_indices:
    # exceed_indices holds index labels, so look rows up by label with .loc
    row = today.loc[idx]
    print('Flight from {} to {} leaving on {} and returning on {} has changed by {}%'.format(
        row['From'], row['To'], row['Departure date'], row['Return date'], row['price_change']))
Output:
Flight from BOS to JFK leaving on 20/02/2020 and returning on 20/02/2020 has changed by -8.0%
Flight from CDG to JFK leaving on 20/02/2020 and returning on 20/02/2020 has changed by 5.0%
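One caveat with the subtraction above: pandas pairs the rows by index, so it assumes both dataframes list the same flights in the same order. A minimal sketch of a more defensive variant, assuming the four descriptive columns together identify a flight, is to merge on those columns first. The function name price_changes and the reordered sample data are only for illustration.

```python
import pandas as pd

# Columns assumed to identify a flight uniquely (an assumption of this sketch).
KEYS = ['From', 'To', 'Departure date', 'Return date']

# Illustrative data: same flights on both days, but in a different row order.
yesterday = pd.DataFrame({
    'From': ['BOS', 'CDG'],
    'To': ['JFK', 'JFK'],
    'Departure date': ['20/02/2020', '20/02/2020'],
    'Return date': ['20/02/2020', '20/02/2020'],
    'Price': [250, 600],
})
today = pd.DataFrame({
    'From': ['CDG', 'BOS'],
    'To': ['JFK', 'JFK'],
    'Departure date': ['20/02/2020', '20/02/2020'],
    'Return date': ['20/02/2020', '20/02/2020'],
    'Price': [630, 230],
})


def price_changes(today, yesterday):
    """Pair flights by their key columns and add the percentage price change."""
    # Inner merge: only flights present on both days are compared.
    merged = today.merge(yesterday, on=KEYS, suffixes=('_today', '_yesterday'))
    merged['price_change'] = (
        (merged['Price_today'] - merged['Price_yesterday'])
        / merged['Price_yesterday'] * 100.0
    )
    return merged


merged = price_changes(today, yesterday)
for _, row in merged.iterrows():
    print('Flight from {} to {} changed by {:.1f}%'.format(
        row['From'], row['To'], row['price_change']))
```

Flights that appear on only one of the two days are dropped by the inner merge, so they are silently skipped rather than compared against the wrong row.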
I learned the syntax to calculate exceed_indices from this post
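If you don't need the index values themselves, a boolean mask should select the same rows a bit more directly. A small sketch with made-up price_change values (the numbers are not from a real run):

```python
import pandas as pd

# Made-up percentage changes, just to show the filtering step.
today = pd.DataFrame({
    'From': ['BOS', 'CDG'],
    'To': ['JFK', 'JFK'],
    'price_change': [-8.0, 0.5],
})

threshold = 1.0  # percent

# Keep only rows whose absolute change is at least the threshold.
exceeds = today[today['price_change'].abs() >= threshold]

for _, row in exceeds.iterrows():
    print('Flight from {} to {} has changed by {}%'.format(
        row['From'], row['To'], row['price_change']))
```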
Upvotes: 1