Scott Bellefeuille

Reputation: 3

NameError: Name 'Item' is not defined

I took the advice and was able to get past the original error, thank you all so much :) I'm almost where I want to be. It seems I still have a massive knowledge gap when it comes to indenting. You guys are truly a gem to the coding community.

Here is the current code, which gets past those errors. It is down to a warning now, but it is not extracting anything.

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://dc.urbanturf.com/pipeline'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

pipeline_items = soup.find_all('div', attrs={'class': 'pipeline-item'})
rows = []
columns = ['Listing Title', 'Listing url', 'listing image url', 'location', 'Project type', 'Status', 'Size']
for item in pipeline_items:
    # title, image url, listing url
    listing_title = item.a['title']
    listing_url = item.a['href']
    listing_image_url = item.a.img['src']
    for p_tag in item.find_all('p'):
        if not p_tag.h2:
            if p_tag.text == 'Location:':
                p_tag.span.extract()
                property_location = p_tag.text.strip()
            elif p_tag.span.text == 'Project type:':
                p_tag.span.extract()
                property_type = p_tag.text.strip()
            elif p_tag.span.text == 'Status:':
                p_tag.span.extract()
                property_status = p_tag.text.strip()
            elif p_tag.span.text == 'Size:':
                p_tag.span.extract()
                property_size = p_tag.text.strip()
  
    row = [listing_title, listing_url, listing_image_url, property_location, property_type, property_status, property_size]
    rows.append(row)
    df = pd.Dataframe(rows, columns=columns)
    df.to_excel('DC Pipeline Properties.xlsx', index=False)
print('File Saved')

The error I get is the following. I'm using PyCharm 2020.2; maybe that's a bad choice?

    row = [listing_title, listing_url, listing_image_url, property_location, property_type, property_status, property_size]
NameError: name 'property_location' is not defined

Upvotes: 0

Views: 1717

Answers (4)

Scott Bellefeuille

Reputation: 3

Mission accomplished, thanks to everyone here. Cheers! A few things I was missing:

  1. Indentation, for sure.
  2. A .span in the first comparison: if p_tag.span.text == 'Location:':
  3. The openpyxl package, which is needed by the to_excel call at the bottom to write the Excel file.
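To illustrate the second point, here is a small sketch of why comparing p_tag.text against the label never matched. The one-line HTML fragment is a made-up stand-in for the real page markup, so treat the exact strings as assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: a <p> holding a label <span> followed by the value.
html = "<p><span>Location:</span> 100 Main St</p>"
p_tag = BeautifulSoup(html, "html.parser").p

whole_text = p_tag.text        # label and value together
label_text = p_tag.span.text   # label only

matches_whole = whole_text == "Location:"   # False: the value is included
matches_label = label_text == "Location:"   # True: only the span is compared

# Removing the span leaves just the value behind.
p_tag.span.extract()
value = p_tag.text.strip()
```

Because the first branch never matched, property_location was never assigned, which would explain the NameError in the question.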

Fully working code is below, along with my promise to get better and help out when I can :)

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://dc.urbanturf.com/pipeline'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

pipeline_items = soup.find_all('div', attrs={'class': 'pipeline-item'})

rows = []
columns = ['listing title', 'listing url', 'listing image url', 'location', 'Project type', 'Status', 'Size']

for item in pipeline_items:
    # title, image url, listing url
    listing_title = item.a['title']
    listing_url = item.a['href']
    listing_image_url = item.a.img['src']

    for p_tag in item.find_all('p'):
        if not p_tag.h2:
            if p_tag.span.text == 'Location:':
                p_tag.span.extract()
                property_location = p_tag.text.strip()
            elif p_tag.span.text == 'Project type:':
                p_tag.span.extract()
                property_type = p_tag.text.strip()
            elif p_tag.span.text == 'Status:':
                p_tag.span.extract()
                property_status = p_tag.text.strip()
            elif p_tag.span.text == 'Size:':
                p_tag.span.extract()
                property_size = p_tag.text.strip()

    row = [listing_title, listing_url, listing_image_url, property_location, property_type, property_status, property_size]
    rows.append(row)
df = pd.DataFrame(rows, columns=columns)
df.to_excel('DC Pipeline Properties.xlsx', index=False)
print('File Saved')
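One thing the loop above leaves open: if an item lacks one of the labelled paragraphs, the matching property_* variable keeps its value from the previous item, or is undefined on the first item, which is the NameError from the question. A possible way around that, sketched here against made-up sample HTML rather than the real page, is to start each item with fresh defaults. The SAMPLE_HTML markup, the LABELS mapping, and the parse_items helper are all hypothetical names introduced for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, only loosely modelled on the selectors used above.
SAMPLE_HTML = """
<div class="pipeline-item">
  <a title="First Project" href="/pipeline/1"><img src="/img/1.jpg"></a>
  <p><span>Location:</span> 100 Main St</p>
  <p><span>Project type:</span> Condo</p>
  <p><span>Status:</span> Planned</p>
  <p><span>Size:</span> 50 units</p>
</div>
<div class="pipeline-item">
  <a title="Second Project" href="/pipeline/2"><img src="/img/2.jpg"></a>
  <p>No label here</p>
  <p><span>Location:</span> 200 Oak Ave</p>
</div>
"""

# Maps the label text found in each <span> to a field name.
LABELS = {
    "Location:": "location",
    "Project type:": "project_type",
    "Status:": "status",
    "Size:": "size",
}


def parse_items(html):
    """Return one dict per pipeline item, with '' for any missing field."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("div", class_="pipeline-item"):
        # Fresh defaults for every item, so nothing leaks from the last one.
        fields = dict.fromkeys(LABELS.values(), "")
        for p_tag in item.find_all("p"):
            span = p_tag.span
            if span is None:
                continue  # paragraph without a label span
            key = LABELS.get(span.get_text(strip=True))
            if key is None:
                continue  # label we do not track
            span.extract()
            fields[key] = p_tag.get_text(strip=True)
        rows.append({
            "title": item.a["title"],
            "url": item.a["href"],
            "image_url": item.a.img["src"],
            **fields,
        })
    return rows


rows = parse_items(SAMPLE_HTML)
```

A list of dicts like this can be passed straight to pd.DataFrame(rows), with the dict keys becoming the column names.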

Upvotes: 0

Kate

Reputation: 36

Line 17 and below need to be inside the for loop so that 'item' is visible to them.

for item in pipeline_items:
    # title, image url, listing url
        listing_title = item.a['title']
        listing_url = item.a['href']
        listing_image_url = item.a.img['src']
for p_tag in item.find_all('p'):   # <-- indent this for loop so it sits inside the previous for loop
    if not p_tag.h2:
        if p_tag.text == 'Location:':

Upvotes: 0

Nathan

Reputation: 3648

The problem is that

pipeline_items = soup.find_all('div', attrs={'class': 'pipline-item'}) 

returns an empty list. As a result, the body of

for item in pipeline_items:

never runs, so item is never assigned.

I'm not sure exactly what you're trying to do, but I see two solutions:

  1. Indent for p_tag in item.find_all('p'): so that it runs once for every item. That way, if there are no items, it is never called. (I think this is what you originally intended.)
  2. Add an if statement before the loop to check whether item exists, and skip the loop if it doesn't. This most closely matches what your code currently does, but I don't think it's what you want.
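Here is a minimal sketch of the underlying behaviour, using plain lists instead of the scraped results. The function names unguarded and last_seen are made up for illustration. Inside a function the failure surfaces as UnboundLocalError, which is a subclass of NameError; at module level, as in the question, it is a plain NameError.

```python
def unguarded(items):
    """Use the loop variable after the loop, without any guard."""
    for item in items:
        pass
    return item  # fails when items is empty: item was never assigned


def last_seen(items):
    """Return the last loop value, or None when the loop body never ran."""
    last = None  # assigned up front, so it exists even for an empty list
    for item in items:
        last = item
    return last


try:
    unguarded([])
    outcome = "no error"
except NameError as exc:
    outcome = type(exc).__name__
```

Checking the list first (for example with if not pipeline_items:) is also a cheap way to notice that a selector matched nothing.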

Upvotes: 1

dreyus95

Reputation: 164

Seems to me that your second for loop, for p_tag in item.find_all('p'):, is outside the scope of the first for loop that iterates over items. Add to that the fact that the first loop might have 0 items, in which case item is never defined, and you get the NameError.

Just put the for loop and its content inside the for loop that iterates over items in pipeline_items.
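The difference nesting makes can be sketched with stand-in data. The items list below is invented for illustration and has nothing to do with the real page structure.

```python
# Hypothetical stand-in data: each "item" owns a list of paragraph strings.
items = [
    {"name": "first", "paragraphs": ["a", "b"]},
    {"name": "second", "paragraphs": ["c"]},
]

# Inner loop dedented: it runs once, after the outer loop has finished,
# so it only ever sees the last value of `item`.
flat = []
for item in items:
    pass
for paragraph in item["paragraphs"]:
    flat.append((item["name"], paragraph))

# Inner loop nested: it runs once for every item.
nested = []
for item in items:
    for paragraph in item["paragraphs"]:
        nested.append((item["name"], paragraph))
```

With the dedented version only the last item's paragraphs are collected, and if items were empty the second loop would fail because item was never assigned.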

Upvotes: 1
