victorsionado

Reputation: 99

Obtaining a single line from a variable result in Python

I am teaching myself Python and am currently exploring web scraping. I have been working with Tumblr pictures, which behave oddly: each image has several links in the same string. I have managed to get one link per line, but I only want a single link.

I expected this to return something like:

source (blog_name)
link (the link which ends with 400w)

But no, I don't get any results from this. If I remove the if statement, I get something like this:

source (blog_name)
link (75w)
link (100w)
link (250w)
link (400w)
link (400w)

Here is the code:

import requests
from bs4 import BeautifulSoup


posts_scrape = requests.get('tumblr.com/search/thingtosearch')
soup = BeautifulSoup(posts_scrape.text, 'html.parser')

articles = soup.find_all('article', class_='_2DpMA')



def getdata(url):
    r = requests.get(url)
    return r.text



for article in articles:
    
    try:
        post_notes = article.find('span', class_='_22VV4').text
        
        if 'notes' in post_notes:
            source = article.find('div', class_='_3QBiZ').text
                     
                for imgvar in article.find_all('img', alt='Image'):
                    
                    url_results = imgvar['srcset']

                    r_urls = url_results.replace(',','\n')
                    for line in r_urls:
                        if line.find("400w"):
                            print(source)
                            print(r_urls)
                            break
   
    except AttributeError:
        continue

Upvotes: 0

Views: 84

Answers (1)

Ajay Singh Rana

Reputation: 573

Your code is somewhat tangled and uses more for loops than it needs.

The reason your code doesn't work as expected is that you use line.find('400w') as the if condition. str.find returns either the index of the match or -1 when there is no match, so the condition evaluates to True for every result except index 0.

Secondly, r_urls is a single string containing all the URLs, so for line in r_urls iterates over it one character at a time. A single character can never contain '400w', so find returns -1, which is truthy, and the condition passes on the very first iteration. Because r_urls is one string, print(r_urls) then prints all the URLs together.
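To make the truthiness problem concrete, here is a small sketch. The srcset string is a made-up sample (example.com is not a real Tumblr URL); only the behaviour of str.find and of iterating over a string is being illustrated.

```python
# Made-up sample standing in for an img tag's srcset attribute.
srcset = "https://example.com/a.jpg 75w, https://example.com/b.jpg 400w"

found_later = srcset.find("400w")     # positive index, so truthy
found_at_start = "400w".find("400w")  # 0, so falsy even though it matched
not_found = "h".find("400w")          # -1, so truthy even though nothing matched

# Iterating over a str yields single characters, not lines or URLs.
first_chars = [line for line in srcset][:5]

print(found_later, found_at_start, not_found)
print(first_chars)

# The membership test says what is actually meant.
print("400w" in srcset, "400w" in "h")
```

Using '400w' in line (or comparing the result of find against -1) avoids relying on the truthiness of an index.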

The following is a somewhat cleaner way to do what you want:

import requests
from bs4 import BeautifulSoup

search_term = 'dog'
# Build the search URL from search_term
posts_scrape = requests.get(f'https://www.tumblr.com/search/{search_term}')
soup = BeautifulSoup(posts_scrape.text, 'html.parser')

articles = soup.find_all('article', class_='_2DpMA')

for article in articles:
    try:
        source = article.find('div', class_='_3QBiZ').text
        urls = []
        for imgvar_avatar in article.find_all('img', alt='Image'):
            # srcset holds comma-separated "url width" entries; keep only the 400w ones
            url_list = [i for i in imgvar_avatar['srcset'].split(',') if '400w' in i]
            urls.extend(url_list)
        print(f'{source} : {urls}')
    except AttributeError:
        continue

This prints one line per post in the format blog name : [list of 400w links for every 'Image' in that post]. Sample output for search_term = 'dog':

everythingfox : []
liriusworldfaws : []
cuteness--overload : [' https://64.media.tumblr.com/bdbfcdf3fc0462eb3be656f0c8085792/e47c10ace1710c69-dc/s400x600/41420a481f8150b866eab574e56cc43e6d8181ef.jpg 400w']
everythingfox : []
delta-breezes : [' https://64.media.tumblr.com/6ed0b95f72eb90a88dd15cb546d913c8/1a24f512409f7700-c6/s400x600/456a6e3ac2f073fba90dbe885c5868063f3a1f39.jpg 400w']
fluffygif : []
scampthecorgi : [' https://64.media.tumblr.com/05c8f0b3345906fc7e6c04282cee9382/4b68a8516b31bce5-6c/s400x600/bd689d26dcb0d39e1c9c2aec494247da425f5f25.jpg 400w']
sirartwork : [' https://64.media.tumblr.com/7329e508e44714f33f90bd69a26fb08e/d998f5e61b3dfe95-b5/s400x600/1c40fbb26161cfac31e9fe72c89bc4305dc9820e.jpg 400w']
k-ayo : [' https://64.media.tumblr.com/36580aa2e20ce45761d4d76f0c9a502d/044c64a380aaef8e-74/s400x600/bdaa0c14a96c8d4bf5cd61620e0d7384a1c13b05.jpg 400w', ' https://64.media.tumblr.com/a42440c97a294c4f53b8a0747d5a009e/044c64a380aaef8e-b3/s400x600/d49ff2b41c44846e002ed96d45e1adfcd59b2753.jpg 400w', ' https://64.media.tumblr.com/44f559aaf1babb250e650cfc3fa94070/044c64a380aaef8e-95/s400x600/abf3c7e3e6ba0de41258a1e7424d961dcb19616b.jpg 400w', ' https://64.media.tumblr.com/44a2c5351564ab917e114672daa737ab/044c64a380aaef8e-8e/s400x600/181e0316dce57706532cbf1becc3509725fc7683.jpg 400w']
pugsandfrenchbulldogs : [' https://64.media.tumblr.com/c09c92db68fd0beee2ede0cffff896c2/658837adc5e2db43-7f/s400x600/ffe111d3133966be7c05507a6f75cf08d10afc59.jpg 400w']
cuteness--overload : []
everythingfox : []
fascinic : [' https://64.media.tumblr.com/55cebf41bb3c979fdc68d4c09e32d96c/9a6d6375418d7a14-0b/s400x600/5df22c0d01f39932b883585fcafc80944dd489c5.jpg 400w']
cuteness--overload : [' https://64.media.tumblr.com/dc6b9d9955244eefc8e6de0690f970d3/e6517da006b766fc-6a/s400x600/9a0a24db65fb8265e2b42d049f58ca376997743d.jpg 400w']
cuteness--overload : [' https://64.media.tumblr.com/2fcf4bfdf94bc7725fc1064cc5fb37bb/4a5db62ba4bbd86e-12/s400x600/61a9826f9eff0c2de1984c7f5fe0ef535560eece.jpg 400w']
catasters : []
male--wife : [' https://64.media.tumblr.com/b849ab48e135972b5a0566491bdcea93/1f844f181a482794-a1/s400x600/cc6605131bd901a0246e61448f7a9caca8112be9.png 400w', ' https://64.media.tumblr.com/97ec5949947d8aed0cec995c0a74e3c3/1f844f181a482794-65/s400x600/cc58a0efcbc621f7033e71166d310779faa2b400.png 400w', ' https://64.media.tumblr.com/099b77b5f560330b48757febabdfb314/1f844f181a482794-45/s400x600/1370ae9dc74d2f6e30a376703a8a0d09277cc5a8.png 400w', ' https://64.media.tumblr.com/322b69a71f34171cc05906b446b2615a/1f844f181a482794-49/s400x600/549108985e37b0c823be0b7438e1c7de834a9b2e.png 400w', ' https://64.media.tumblr.com/bfb082cdd44c2545e58424ec704cdb2c/1f844f181a482794-1b/s400x600/d5f1ba9af3aa3e930e329ac230980b73f6882faa.png 400w', ' https://64.media.tumblr.com/a210dbea09d5bfbb60c0d01b3d07825d/1f844f181a482794-c1/s400x600/b9d380228956b7493415b08fac6f2f7e22a1e484.png 400w', ' https://64.media.tumblr.com/2d904ff086e5fee6282aaa02d99f7045/1f844f181a482794-78/s400x600/82d26a959fa1f79072673c5a3a7ff52df93d574d.png 400w', ' https://64.media.tumblr.com/c478911a2ea835549fc3c94018086233/1f844f181a482794-46/s400x600/f307f706a186febd4c02723de297dbcd3524f1a4.png 400w', ' https://64.media.tumblr.com/2e091b16060114ab6a501556f1992be2/1f844f181a482794-71/s400x600/45ace560d7ef390d041fa4883c64a656280ebd66.png 400w', ' https://64.media.tumblr.com/6bf67260c9286b800af99c7d15ccfa42/1f844f181a482794-0d/s400x600/b5496b038d4c27824df53df6fa00e183d6fa9bd3.png 400w']
liriusworldfaws : []
puppy-esso : [' https://64.media.tumblr.com/825ca92b45b1810f9182ca9631bf1560/7ebd600d44780133-1f/s400x600/14ea5af4c9e149cb88863dfc0117f10a62c82e99.jpg 400w']
hitmewithcute : [' https://64.media.tumblr.com/db5993448e5d8bb8228e0f1b81142e7a/dcefa14e006d548b-77/s400x600/f4d1dadd5b6a49417c8d15573972775e9e707a53.jpg 400w']
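
If you want exactly one link per image rather than a list, a helper along these lines may be enough. This is only a sketch: pick_width is a hypothetical name, the sample srcset uses made-up example.com URLs, and it assumes the URLs themselves contain no commas or spaces.

```python
def pick_width(srcset, width="400w"):
    """Return the first URL whose width descriptor equals `width`, or None."""
    for candidate in srcset.split(","):
        parts = candidate.split()  # ["url", "400w"]; also drops surrounding spaces
        if len(parts) == 2 and parts[1] == width:
            return parts[0]
    return None


# Made-up sample: two entries share the 400w descriptor, as in the question.
sample = (
    "https://example.com/s75x75/a.jpg 75w, "
    "https://example.com/s400x600/a.jpg 400w, "
    "https://example.com/s500x750/a.jpg 400w"
)

print(pick_width(sample))
print(pick_width("https://example.com/s75x75/a.jpg 75w"))
```

Returning as soon as the first match is found is what limits the result to a single link, and comparing the descriptor exactly avoids matching '400w' somewhere inside the URL.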

Upvotes: 2
