Beautiful soup and bottlenose, how to parse correctly

Question

I am trying to extract strings from the response of a Bottlenose Amazon API request. Since I don't want Russian hackers to pwn my web app, I am trying to use Beautiful Soup, following this small webpage as a guide.

My current code:

import bottlenose as BN
import lxml
from bs4 import BeautifulSoup

amazon = BN.Amazon('MyAmznID','MyAmznSK','MyAmznAssTag',Region='UK', Parser=BeautifulSoup)
rank = amazon.ItemLookup(ItemId="0198596790",ResponseGroup="SalesRank")

soup = BeautifulSoup(rank)

print rank
print soup.find('SalesRank').string

The current output from Bottlenose looks like this:

53f15ff4-3588-4e63-af6f-279bddc7c2430.0234130000000000TrueASIN0198596790SalesRankAll0198596790124435

So the Bottlenose part works, but the Beautiful Soup part raises an error:

Traceback (most recent call last):
  File "/Users/Fuck/Documents/Amazon/Bottlenose_amzn_prog/test.py", line 12, in <module>
    print soup.find(Rank).string
NameError: name 'soup' is not defined

I am trying to extract the digits between the 'SalesRank' tags, but failing.

Brendan Quinn · Accepted Answer

From looking at the code, it seems that the Bottlenose Parser option is very simple: it takes a function, which is called with the response text, and the API call then returns whatever that function returns.

So you can just make a very simple Python function and pass it to the constructor, which makes your code look like this:

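import bottlenose as BN
from bs4 import BeautifulSoup

# Bottlenose calls this with the response text; the 'xml' parser needs lxml installed.
def parse_xml(text):
    return BeautifulSoup(text, 'xml')

# AWS_ACCESS_KEY_ID and friends are placeholders for your own credentials.
amazon = BN.Amazon(
    AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
    AWS_ASSOCIATE_TAG, Region='UK', Parser=parse_xml
)
# The result is already a BeautifulSoup object, so there is no need to parse it again.
results = amazon.ItemLookup(ItemId="0198596790", ResponseGroup="SalesRank")

print(results.find('SalesRank').string)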

Or you can use an inline lambda function instead:

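import bottlenose as BN
from bs4 import BeautifulSoup

# AWS_ACCESS_KEY_ID and friends are placeholders for your own credentials.
amazon = BN.Amazon(
    AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_ASSOCIATE_TAG,
    Region='UK', Parser=lambda text: BeautifulSoup(text, 'xml')
)
# The result is already a BeautifulSoup object, so there is no need to parse it again.
results = amazon.ItemLookup(ItemId="0198596790", ResponseGroup="SalesRank")

print(results.find('SalesRank').string)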
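
If you want to check the parsing step without calling Amazon at all, a minimal offline sketch could look like the one below. SAMPLE_RESPONSE is a hand-written, trimmed stand-in for a real response (the real one has more elements and attributes), and extract_sales_rank is a hypothetical helper name, not part of Bottlenose or Beautiful Soup. It assumes bs4 and lxml are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed stand-in for an ItemLookup response (not real API output).
SAMPLE_RESPONSE = """
<ItemLookupResponse>
  <Items>
    <Item>
      <ASIN>0198596790</ASIN>
      <SalesRank>124435</SalesRank>
    </Item>
  </Items>
</ItemLookupResponse>
"""


def parse_xml(text):
    # Same parser function as passed to Bottlenose; 'xml' requires lxml.
    return BeautifulSoup(text, "xml")


def extract_sales_rank(soup):
    # Return the sales rank as an int, or None if the tag is missing or empty.
    tag = soup.find("SalesRank")
    if tag is None or tag.string is None:
        return None
    return int(tag.string)


soup = parse_xml(SAMPLE_RESPONSE)
rank = extract_sales_rank(soup)
print(rank)
```

Checking for None before reading .string avoids an AttributeError when a response has no SalesRank element, for example when an item has no rank.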
