Reputation: 29
What I'm trying to achieve is to shorten the time needed to complete the scraping process and store all the data in a dictionary (the dictionary is Untiters: keys are usernames, values are the number of times a user made a post with a specific name). I used this site as a tutorial, but I couldn't figure out how to apply what's explained there to my code. Here is the code; sorry if I provided an unnecessarily large portion of it.
from multiprocessing import Pool
import requests
from bs4 import BeautifulSoup

z = 0
Untitleds = ["Sin título","Untitled","Sans titre","İsimsiz","Ohne Titel","بلا عنوان",
             "Без названия","无标题","夕イトルなし"]
Untiters = {}
Untits = []
x = 138
for i in range(1,20):
    y = x + 1
    x = y
    Id = y
    link = "https://folioscope.co/blank/" + str(Id)
    Url = (link)
    R = requests.get(Url)
    Soup = BeautifulSoup(R.text,"html5lib")
    Pretitle = (Soup.find("div",{"class":"container_padding"}))
    Title = Pretitle.div.text
    if Title in (Untitleds):
        Prename = Soup.find("div",{"class":"padding_bottom_normal"})
        Name = Prename.a.text
        Untitled = z + 1
        z = Untitled
        if Name not in Untiters:
            Untiters.update({Name : 1})
        else:
            c0 = Untiters[Name]
            c1 = c0 + 1
            Untiters[Name] = c1
        Untits.append(Title)
        print (Title, Name)
Upvotes: 1
Views: 236
Reputation: 195408
To use multiprocessing.Pool to get data from the site, you can use the following example:
from multiprocessing import Pool
import requests
from bs4 import BeautifulSoup


def get_data(id_):
    url = "https://folioscope.co/blank/" + str(id_)
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    title = soup.select_one("#animation_container .title") or ""
    if title:
        title = title.text
    username = soup.select_one(".username") or ""
    if username:
        username = username.text
    return id_, title, username


if __name__ == "__main__":
    with Pool() as pool:
        for id_, title, username in pool.imap_unordered(
            get_data, range(138, 158)
        ):
            if title and username:
                print("{:<4} {:<40} {}".format(id_, title, username))
                # here you can add the result to list, filter duplicates etc.
Prints:
153  First attempt                            CyberAly
149  Minecraft Loop                           MisterD
142  An Idea!                                 Pyro
148  Untitled                                 szymun
152  Thunder                                  dpknyk1993
139  Untitled                                 WoopDeDoo
146  Untitled                                 szymun
144  Loop                                     pjrd
138  Blink                                    fairyfina
140  Test                                     sknob
154  Dragon Ball kameha                       piedicmolkok
157  Boom                                     animation33
156  Tree in wind                             CyberAly
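The comment in the loop above leaves the aggregation step open. A minimal sketch of how the dictionary from the question could be built from the returned tuples, assuming the list of "untitled" titles is the one from the question (copied here verbatim). The names UNTITLEDS and count_untitled are hypothetical helpers introduced for illustration, and the sample rows are taken from the printed output:

```python
from collections import Counter

# Titles treated as "untitled", copied from the question's Untitleds list.
UNTITLEDS = {
    "Sin título", "Untitled", "Sans titre", "İsimsiz", "Ohne Titel",
    "بلا عنوان", "Без названия", "无标题", "夕イトルなし",
}


def count_untitled(results, untitled_titles=UNTITLEDS):
    """Count untitled posts per username.

    `results` is any iterable of (id, title, username) tuples, i.e. the
    shape returned by get_data() in the answer above.
    """
    counts = Counter()
    for _id, title, username in results:
        # Skip pages where the title or the username could not be parsed.
        if username and title in untitled_titles:
            counts[username] += 1
    return counts


if __name__ == "__main__":
    # A few rows taken from the printed output above.
    sample = [
        (148, "Untitled", "szymun"),
        (139, "Untitled", "WoopDeDoo"),
        (146, "Untitled", "szymun"),
        (144, "Loop", "pjrd"),
    ]
    print(dict(count_untitled(sample)))
```

Since the counting runs in the parent process, the iterator returned by pool.imap_unordered(get_data, range(138, 158)) could be passed directly as results, so no dictionary needs to be shared between worker processes.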
Upvotes: 1