Reputation: 155
I once wrote a Python script that basically scrapes a webpage, searches for a particular piece of text, and returns the number of times that text appears on the page.
Now, I want to incorporate the same as a web app.
My app will take two string variables: a date (which I will split into day, month, and year) and a name.
The date will be used to generate the unique web URL (it's a date-based list), and that URL will be parsed to collect information using a text search (if possible with RegEx rather than just the simple Python search functions).
Now, I want to set up a webpage with two input elements (for the date and the name). These two variables should be passed to the script, and the output should then be shown on the web page (either on the same page or on a new one).
Simple.
With my limited knowledge, I think both Flask and Django will be too heavy for this.
How do you think I would be able to do it?
EDIT: Here's my code (which I essentially thought out myself and pieced together from different places).
# KATscrape is a script that Basil Ajith (https://twitter.com/basilajith) wrote way back
# in 2016-2017 in order to search and parse the KAT
# cause lists. Now, it is being re-written to be hosted
# as a web application online.
# Parsing web page learnt from:
# https://stackoverflow.com/questions/25067580/passing-web-data-into-beautiful-soup-empty-list#25068054
# Developed by https://stackoverflow.com/users/2141635/padraic-cunningham
# Printing List without quotes learnt from:
# https://stackoverflow.com/questions/11178061/print-list-without-brackets-in-a-single-row#11178075
# Developed by https://stackoverflow.com/users/1172428/fatalerror
# Edited by https://stackoverflow.com/users/6451573/jean-fran%c3%a7ois-fabre
# Finding occurrences of an advocate's name in the cause list learnt from:
# https://stackoverflow.com/questions/17268958/finding-occurrences-of-a-word-in-a-string-in-python-3#17268979
# Developed by https://stackoverflow.com/users/148870/amber
# Dependencies
from sys import argv
from bs4 import BeautifulSoup
import requests
import re  # I don't know pandas (nor much RegEx), but I think RegEx will serve our purpose.
filename, date, adv_name = argv
# Short Lists
court_numbers = ["1", "7", "8", "4"]
# The parser function
def katscrape():
    day = date[0:2]
    month = date[3:5]
    year = date[6:10]
    base_url = "http://keralaadministrativetribunal.gov.in/ciskat/pages/cause_list_home.php?type=search&dte=%s/%s/%s&court=%s"
    # Starting to parse
    for i in court_numbers:
        cl_current = base_url % (day, month, year, i)
        the_page = requests.get(cl_current)
        soup = BeautifulSoup(the_page.content, "lxml")
        da_stuff = str(soup)
        judges_list = ["Mr. Justice T.R. Ramachandran Nair",
                       "Mr. V. Somasundaran", "Mr. V.Rajendran",
                       "Mr. Rajesh Dewan", "Mr. Benny Gervacis"]
        sitting = []
        for x in judges_list:
            if x in da_stuff:
                sitting.append(x)
        # Printing court number and presiding members.
        print("Court No. %s:" % i)
        print("Presiding: ", ", ".join(sitting), "\n")
        # Checking for the advocate's name in the cause list:
        count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(adv_name), da_stuff))
        print("%s has %d matters in this court.\n" % (adv_name, count))

print("Matters for %s on %s:" % (adv_name, date) + "\n")
katscrape()
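To be usable from a web app, I guess this script would need to take the date and advocate name as function arguments and return data instead of printing it. A rough sketch of that refactor, keeping the same DD/MM/YYYY date format and page logic as above (the name run_katscrape is only illustrative):
import re
import requests
from bs4 import BeautifulSoup

COURT_NUMBERS = ["1", "7", "8", "4"]
JUDGES_LIST = ["Mr. Justice T.R. Ramachandran Nair", "Mr. V. Somasundaran",
               "Mr. V.Rajendran", "Mr. Rajesh Dewan", "Mr. Benny Gervacis"]
BASE_URL = ("http://keralaadministrativetribunal.gov.in/ciskat/pages/"
            "cause_list_home.php?type=search&dte=%s/%s/%s&court=%s")

def run_katscrape(date, adv_name):
    # Same scraping logic as katscrape() above, but parameterised and
    # returning a list of per-court dicts instead of printing.
    day, month, year = date[0:2], date[3:5], date[6:10]
    results = []
    for court in COURT_NUMBERS:
        page = requests.get(BASE_URL % (day, month, year, court))
        text = str(BeautifulSoup(page.content, "lxml"))
        sitting = [j for j in JUDGES_LIST if j in text]
        count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(adv_name), text))
        results.append({"court": court, "presiding": sitting, "matters": count})
    return results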
Upvotes: 0
Views: 2349
Reputation: 1128
Not sure if you're still looking for an answer here, but I want to give you a minimal working example.
Imagine your project directory:
my_project
|-- app.py (your flask server)
|
`-- api
    `-- services
        `-- my_script.py
my_script.py
from datetime import datetime

def split_date(date_string):
    dt = datetime.strptime(date_string, '%Y-%m-%d')
    date_object = {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day
    }
    return date_object
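For example, calling the helper on its own (assuming the YYYY-MM-DD format it expects) would give something like:
>>> from api.services.my_script import split_date
>>> split_date("2020-02-01")
{'year': 2020, 'month': 2, 'day': 1}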
app.py (the Flask backend)
from flask import Flask, request, jsonify
from api.services.my_script import split_date

app = Flask(__name__)

@app.route('/api/get-date-object', methods=['GET', 'POST'])
def get_date_object():
    if request.method == 'GET':
        return jsonify({"error": True, "msg": "Must perform POST request to this endpoint"})
    if request.method == 'POST':
        request_json = request.get_json()
        if "date_string" in request_json:
            print(request_json)
            date_object = split_date(request_json["date_string"])
            print(date_object)
            return jsonify(date_object)
        else:
            return jsonify({"error": True, "msg": "Must provide a date string in the format 'YYYY-MM-DD'."})

app.run(debug=True)
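The same pattern would extend to the scraping task in your question. A rough sketch (not tested against the real site), assuming a parameterised run_katscrape(date, adv_name) helper like the one sketched under the question's edit is saved as api/services/katscrape.py, with this extra route added to app.py above the app.run call:
from api.services.katscrape import run_katscrape  # hypothetical module path

@app.route('/api/search-cause-list', methods=['POST'])
def search_cause_list():
    # Hypothetical endpoint: forwards "date_string" and "name" from the
    # request body to the scraper and returns its results as JSON.
    data = request.get_json()
    if not data or "date_string" not in data or "name" not in data:
        return jsonify({"error": True,
                        "msg": "Must provide 'date_string' and 'name'."})
    # Note: run_katscrape as sketched expects DD/MM/YYYY, so convert here
    # if the client sends YYYY-MM-DD.
    results = run_katscrape(data["date_string"], data["name"])
    return jsonify({"name": data["name"], "results": results})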
Now we test these API endpoints via curl. This is the step you'll want to do from the browser, either via plain HTML or a JS framework like Angular, React, or Vue.
Expecting error response, because we are performing a GET request:
λ curl -s -X GET http://localhost:5000/api/get-date-object
{
  "error": true,
  "msg": "Must perform POST request to this endpoint"
}
Expecting an error, because we are not passing a "date_string" key in the json object:
λ curl -s -X POST -H "Content-Type: application/json" -d "{\"name\": \"Graced Lamb\"}" http://localhost:5000/api/get-date-object
{
  "error": true,
  "msg": "Must provide a date string in the format 'YYYY-MM-DD'."
}
Expecting a success, and returning an object where we return individual components of a date provided as "YYYY-MM-DD":
λ curl -s -X POST -H "Content-Type: application/json" -d "{\"date_string\": \"2020-02-01\", \"name\": \"Graced Lamb\"}" http://localhost:5000/api/get-date-object
{
  "day": 1,
  "month": 2,
  "year": 2020
}
I had to escape the individual keys within the -d parameter because I am on Windows this time. I hope this helps. If you'd like the code for this, let me know and I can create a public GitHub repo for you to clone/download.
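If the quote escaping in the -d payload gets tedious on Windows, the same POST can also be made from Python with requests; a quick sketch, assuming the server above is running locally on port 5000:
import requests

resp = requests.post(
    "http://localhost:5000/api/get-date-object",
    json={"date_string": "2020-02-01", "name": "Graced Lamb"},
)
print(resp.json())  # expected: {'day': 1, 'month': 2, 'year': 2020}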
Upvotes: 1