Reputation: 845
I am trying to automate a search which returns a table of information. I am able to print the results in .text but my question is how can I pass the results into a Pandas dataframe. The reason why I am asking is two fold; because I would want to print the results into a CSV file and I need the results in Pandas to do data analysis later on. Appreciate if anyone could help. My code as below:
import time
from selenium import webdriver
import pandas as pd
search = ['0501020210597400','0501020210597500','0501020210597600']
df = pd.DataFrame(search)
chrome_path = [Chrome Path]
driver = webdriver.Chrome(chrome_path)
driver.get('https://enquiry.mpsj.gov.my/v2/service/cuk_search/')
x = 0
while x <(len(df.index)):
search_box = driver.find_element_by_name('sel_value')
new_line = (df[0][x]).format(x)
search_box.send_keys(new_line)
search_box.submit()
time.sleep(5)
table = driver.find_elements_by_class_name('tr-body')
for data in table:
print(data.text)
driver.find_element_by_name('sel_value').clear()
x +=1
driver.close()
Upvotes: 1
Views: 2052
Reputation: 84465
You can use requests and do a POST to get the info rather than use selenium
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
search = ['0501020210597400','0501020210597500','0501020210597600']
headers = {'Referer' : 'https://enquiry.mpsj.gov.my/v2/service/cuk_search/1',
'User-Agent' : 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
}
output = []
dfHeaders = ['No.', 'No. Akaun', 'Nama Di Bil', 'Jumlah Perlu Dibayar', '']
with requests.Session() as s:
for item in search:
r = s.get('https://enquiry.mpsj.gov.my/v2/service/cuk_search/1', headers = headers)
soup = bs(r.content, 'lxml')
key = soup.select_one('[name=ACCESS_KEY]')['value']
body = {'sel_input': 'no_akaun', 'sel_value': item, 'ACCESS_KEY': key}
res = s.post('https://enquiry.mpsj.gov.my/v2/service/cuk_search_submit/', data = body)
soup = bs(res.content, 'lxml')
table = soup.select_one('.tbl-list')
rows = table.select('.tr-body')
for row in rows:
cols = row.find_all('td')
cols = [item.text.strip() for item in cols]
output.append([item for item in cols if item])
df = pd.DataFrame(output, columns = dfHeaders)
print(df)
df.to_csv(r'C:\Users\User\Desktop\Data.csv', sep=',', encoding='utf-8-sig',index = False )
Upvotes: 1
Reputation: 22952
To load a CSV file to a DataFrame, you can do:
df = pd.read_csv('example.csv')
See the online doc: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html#pandas.read_csv
To write the data to CSV, consult this article: Pandas writing dataframe to CSV file on SO.
The solution is:
df.to_csv(file_name, sep='\t')
Upvotes: 1