user1205632

Reputation: 43

Rearranging parsed HTML data in Python

I have little experience programming so please excuse my ignorance.

I'm trying to parse the 'Key Statistics' page from Yahoo! Finance, to be specific, this page. I've been experimenting with BeautifulSoup and was able to extract the data I wanted, but have since run into a mental block. I would like the data to appear like this:

measure[i]: value[i]
.
.  
measure[n]: value[n]

but the result I'm getting from my script is:

measure[i]  
.
.    
measure[n]  
value[i]
.
.
value[n]

Here is my attempt at joining the two data fields together, which throws an error:

measure = soup.findAll('td', {'class':'yfnc_tablehead1'}, width='74%')  
value = soup.findAll('td', {'class':'yfnc_tabledata1'}) 

for incident in measure:
    x = incident.contents

for incident2 in value:
    y = incident2.contents

data = x + y

print ': '.join(data)

Moreover, there are unwanted characters in these values that I would like to remove, but I will read up on the re.compile and re.sub documentation for that.

Thank you for any input.

Upvotes: 1

Views: 185

Answers (2)

Misha Akovantsev

Reputation: 1825

measures = ['1', '2', '3', '4']
values = ['a', 'b', 'c', 'd']

for pair in zip(measures, values):
    print ': '.join(pair)

# 1: a
# 2: b
# 3: c
# 4: d

About zip:

Type:       builtin_function_or_method
Base Class: <type 'builtin_function_or_method'>
String Form:<built-in function zip>
Namespace:  Python builtin
Docstring:
zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]

Return a list of tuples, where each tuple contains the i-th element
from each of the argument sequences.  The returned list is truncated
in length to the length of the shortest argument sequence.
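The truncation mentioned at the end of the docstring matters here, because the two scraped lists may not have the same length. A small sketch of that behaviour follows; the sample strings are made up for illustration and are not taken from the actual page. It is written so that it should run on both Python 2 and Python 3:

```python
# Hypothetical sample data: one more measure than there are values.
measures = ['Market Cap', 'Trailing P/E', 'Beta']
values = ['1.2B', '15.3']

# zip stops at the shortest input, so 'Beta' is silently dropped.
# list() makes the result a list on Python 3, where zip returns an iterator.
pairs = list(zip(measures, values))

lines = [': '.join(pair) for pair in pairs]
for line in lines:
    print(line)
```

If the two lists differ in length, it may be worth comparing len(measures) and len(values) before pairing them, since zip itself gives no warning.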

Upvotes: 0

yurib

Reputation: 8147

data = x + y

The + operator concatenates lists. If you want to pair up corresponding items of the two lists, try the zip() function:

data = zip(x,y)
for m,v in data:
  print m,v

Also,

for incident in measure:
  x = incident.contents

This overwrites x in every iteration of the loop, so in the end x contains only the last value assigned and not the aggregate of them all. Here you probably do want to use the + operator, like so:

x = []  # start with an empty list, otherwise += raises a NameError
for incident in measure:
  x += incident.contents # x += y is the same as x = x + y

Of course, the same goes for the other loop.
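Putting both points together, a minimal sketch might look like the following. To keep it self-contained it does not call BeautifulSoup: measure_contents and value_contents are hypothetical stand-ins for the .contents of each matched cell, and the sample strings are invented. It also assumes every cell holds exactly one plain string, which may not be true for the real page:

```python
# Stand-ins for the .contents list of each matched <td>. In the real script
# these would come from the soup.findAll(...) results; the data is made up.
measure_contents = [['Market Cap'], ['Trailing P/E'], ['Beta']]
value_contents = [['1.2B'], ['15.3'], ['0.9']]

x = []
for contents in measure_contents:
    x += contents  # accumulate instead of overwriting x each time

y = []
for contents in value_contents:
    y += contents

# Pair each measure with the value at the same position.
lines = ['%s: %s' % (m, v) for m, v in zip(x, y)]
for line in lines:
    print(line)
```

If a cell contains nested tags, its .contents would hold more than one item and the two lists would drift out of step, so the one-string-per-cell assumption is worth checking against the real data.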

Upvotes: 2
