Samuel Schuetz

Reputation: 11

for loop generated list of dictionaries not converting to a data frame well

I would really appreciate some help with this problem. I am extracting headings and the associated list of words under each heading from a website. I have ended up with a list of dictionaries, with a list of values for each dictionary key:

from bs4 import BeautifulSoup
import pandas as pd
import os.path
import requests  # needed for requests.get() in get_data
listscape_1 = ['iPhone SE 2nd generation',
'iPhone 12 mini',
'iPhone 12',
'iPhone 12 Pro',
'iPhone 12 Pro Max',
'iPhone 13 mini',
'iPhone 13',
'iPhone 13 Pro',
'iPhone 13 Pro Max',
'iPhone SE 3rd generation',
'iPhone 14',
'iPhone 14 Plus',
'iPhone 14 Pro',
'iPhone 14 Pro Max']
storage_capacities= ['+64gb', '+128gb', '+256gb', '+512gb']
listscape = []
for i in listscape_1:
    j=i.replace(' ', '+')
    listscape.append(j)
finallist=[]    
for j in listscape:
    for i in storage_capacities: 
        ji= j+i
        finallist.append(ji)


for searchterm in finallist:
    data = []
    def get_data (searchterm):
        url = f'https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2334524.m570.l1311&_nkw={searchterm}&_sacat=0&LH_TitleDesc=0&LH_Auction=1&LH_FS=1&rt=nc_osacat=0&LH_ItemCondition=3000&LH_Complete=1&LH_Sold=1'
        r=requests.get(url)
        soup= BeautifulSoup(r.text, 'html.parser')
        return soup

    def parse(soup):
        products_list = []
        results = soup.find_all('div',{'class':'s-item__info clearfix'})
        for item in results:   
            products = {
                'title': item.find('div', {'class': 's-item__title'}).text,
                'soldprice': float(item.find('span', {'class': 's-item__price'}).text.replace('$', '').replace(',', '')),
                # select_one takes a CSS selector, so the class goes in the selector string
                'solddate': item.select_one('span.POSITIVE').text.replace('Sold ', ''),
                #'_______':'____________________________________________________________________'
                #'link': item.find('a', {'class': 's-item__link'})['href']
            }
            products_list.append(products)
        return products_list

    


    soup = get_data(searchterm)
    products_list=parse(soup)
    #print(products_list)
    df=pd.DataFrame(products_list)
    df

products_list ends up as a list of lists of dicts that does not convert cleanly to a DataFrame.

I ran the code above and did not get a DataFrame.

Upvotes: 0

Views: 36

Answers (1)

T. Hall

Reputation: 302

I haven't run the searches on eBay myself, but the for loop in the last section redefines the dataframe df on every pass, so only the results of the final search survive. If that last search returns no results, you end up with an empty dataframe.

You're also redefining your get_data and parse functions each time you pass through the loop, which is unnecessary. Try moving those function definitions out of the for loop, and build up your list of results to pass in one final call to pd.DataFrame instead and see if that helps:

# build list of search terms
finallist = ...

# define functions
def get_data(searchterm): ...
def parse(soup): ...

# build up list of results
results = []
for searchterm in finallist:
    soup = get_data(searchterm)
    products_list=parse(soup)

    # Important to extend here not append 
    # as you want to pass a single list to the final dataframe constructor
    results.extend(products_list)

df = pd.DataFrame(results)

Perhaps try that and show what result you get back?
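
To make the extend-versus-append point concrete, here is a minimal sketch that needs no network access. The fake_pages data is made up for illustration; it only stands in for whatever parse(soup) returns on each search.

```python
# Hypothetical per-search results standing in for the output of parse(soup).
fake_pages = [
    [{'title': 'phone A', 'soldprice': 100.0}, {'title': 'phone B', 'soldprice': 120.0}],
    [],  # a search that found no sold listings
    [{'title': 'phone C', 'soldprice': 90.0}, {'title': 'phone D', 'soldprice': 95.0}],
]

appended = []
extended = []
for products_list in fake_pages:
    appended.append(products_list)  # nests each search: a list of lists of dicts
    extended.extend(products_list)  # flattens: a single list of dicts

print(len(appended))  # one entry per search
print(len(extended))  # one entry per listing
```

Passing the flat list (extended) to pd.DataFrame should give one row per listing with title and soldprice columns, whereas the nested list (appended) would give one row per search with whole dicts as cell values, which is probably not what you want.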

Edit: as an extra, you can simplify the construction of the search terms with a list comprehension and itertools.product from the standard library:

import itertools
finallist = [phone.replace(' ', '+') + storage for phone, storage in itertools.product(listscape_1, storage_capacities)]
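
As a quick check that this gives the same terms as the two nested loops in the question, here is a small sketch using a shortened phone list (phones is just a stand-in for listscape_1):

```python
import itertools

phones = ['iPhone 12', 'iPhone 12 Pro']  # shortened stand-in for listscape_1
storage_capacities = ['+64gb', '+128gb']

# Original approach: nested loops, phone outermost, storage innermost
loop_terms = []
for phone in phones:
    for storage in storage_capacities:
        loop_terms.append(phone.replace(' ', '+') + storage)

# itertools.product yields (phone, storage) pairs in the same order
product_terms = [
    phone.replace(' ', '+') + storage
    for phone, storage in itertools.product(phones, storage_capacities)
]

print(product_terms)
# ['iPhone+12+64gb', 'iPhone+12+128gb', 'iPhone+12+Pro+64gb', 'iPhone+12+Pro+128gb']
print(product_terms == loop_terms)  # True
```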

Upvotes: 1
