Aravindh Thirumaran

Reputation: 47

Need to scrape the data using BeautifulSoup

I need to scrape celebrity details from https://www.astrotheme.com/celestar/horoscope_celebrity_search_by_filters.php. The search filter I use is "time of birth known" only, across every profession except world events, which returns about 22,822 celebrities. I am able to get the first page of results using urllib2 and bs4:

import re
import urllib2
from bs4 import BeautifulSoup

url = "https://www.astrotheme.com/celestar/horoscope_celebrity_search_by_filters.php"
data = "sexe=M|F&categorie[0]=0|1|2|3|4|5|6|7|8|9|10|11|12&connue=1&pays=-1&tri=0&x=33&y=13"

# passing data makes urllib2.urlopen issue a POST with the search filters
fp = urllib2.urlopen(url, data)
soup = BeautifulSoup(fp, 'html.parser')
from_div = soup.find_all('div', attrs={'class': 'titreFiche'})

for major in from_div:
    name = re.findall(r'portrait">(.*?)<br/>', str(major))
    link = re.findall(r'<a href="(.*?)"', str(major))
    print name[0], link[0]

For the next 230 pages, I am unable to get the data. I tried changing the URL, appending the page number up to the last page, but the scrape returns nothing. Is there any way to get the remaining data from that page?

Upvotes: 0

Views: 78

Answers (1)

ewwink

Reputation: 19164

You need session cookies. Use requests, which makes it easy to keep the session across requests:

from bs4 import BeautifulSoup
import requests, re

url = "https://www.astrotheme.com/celestar/horoscope_celebrity_search_by_filters.php"
searchData = {
  "sexe": "M|F",
  "categorie[0]": "0|1|2|3|4|5|6|7|8|9|10|11|12",
  "connue": 1, "pays": -1, "tri": 0, "x": 33, "y": 13
}
session = requests.Session()  # persists cookies across requests

def doSearch(url, data=None):
  if data:
    fp = session.post(url, data=data).text
  else:
    fp = session.get(url).text
  soup = BeautifulSoup(fp, 'html.parser')
  from_div = soup.find_all('div', attrs={'class': 'titreFiche'})

  for major in from_div:
    name = re.findall(r'portrait">(.*?)<br/>', str(major))
    link = re.findall(r'<a href="(.*?)"', str(major))
    if name and link:
      print(name[0], link[0])

# do a POST search in the first request to set the session cookie
doSearch(url, searchData)

# the session cookie is now set, so a plain GET works for the next pages
for index in range(2, 4):  # get pages 2 to 3
  pageurl = '%s?page=%s' % (url, index)
  print('getting page: %s' % pageurl)
  doSearch(pageurl)
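As a side note, instead of running regexes over `str(major)`, you can let BeautifulSoup extract the name and link directly. A minimal sketch, run against a made-up snippet shaped like the markup the regexes imply (the real page's HTML may differ):

```python
from bs4 import BeautifulSoup

# Hypothetical result markup: a titreFiche div wrapping a "portrait" anchor,
# with the name before the <br/>. This is an assumption, not the live page.
sample = '''
<div class="titreFiche">
  <a class="portrait" href="/celebrities/horoscope_of_kate_moss.htm">Kate Moss<br/>born January 16, 1974</a>
</div>
'''

soup = BeautifulSoup(sample, 'html.parser')
for div in soup.find_all('div', class_='titreFiche'):
    a = div.find('a', class_='portrait')
    link = a['href']                  # href attribute of the anchor
    name = a.contents[0].strip()      # text node before the <br/>
    print(name, link)
```

This avoids breaking if attribute order changes, which would defeat the `<a href="(.*?)"` regex.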

Upvotes: 1
