Reputation: 149
I am trying to take some information I got from a webpage and write one of the variables to a file, but I am having no luck. It is probably very easy, but I'm lost. Here is an example of one of the rows; there are 1253 rows.
<div class='entry qual-5 used-demoman slot-head bestprice custom' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">
I am after the field called data-name; it is not at the same position in each row. I tried this, but it did not work:
mfile=open('itemlist.txt','r')
mfile2=open('output.txt','a')
for row in mfile:
    if char =='data-name':
        mfile2.write(char)
Edit 1:
I made an example file containing 'hello hi peanut'. If I did:
for row in mfile:
    print row.index('hello')
it would print 0 as expected. However, when I changed hello to hi, it didn't return 1; it returned nothing.
Upvotes: 1
Views: 138
Reputation: 388413
Let’s try to find the value using common string manipulation methods:
>>> line = '''<div class='entry qual-5 used-demoman slot-head bestprice custom' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">'''
We can use str.index to find the position of a string within a string:
>>> line.index('data-name')
87
So now we know we need to start looking at index 87 for the attribute we are interested in:
>>> line[87:]
'data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">'
Now, we need to remove the data-name=" part too:
>>> start = line.index('data-name') + len('data-name="')
>>> start
98
>>> line[start:]
'Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">'
Now, we just need to find the index of the closing quotation mark too, and then we can extract just the attribute value:
>>> end = line.index('"', start)
>>> end
118
>>> line[start:end]
'Kill-a-Watt Allbrero'
And then we have our solution:
start = line.index('data-name') + len('data-name="')
end = line.index('"', start)
print(line[start:end])
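The same steps can be wrapped in a small helper so they work for any attribute. This is only a sketch: the function name attribute_value is made up for illustration, and it assumes the attribute is written as name=, immediately followed by a quoted value. It uses str.find, which returns -1 instead of raising when the text is missing, and it accepts either quote character, since data-price in the sample row uses single quotes.

```python
def attribute_value(line, name):
    """Return the value of attribute `name` in `line`, or None if not found.

    Hypothetical helper: plain string search, not a real HTML parser.
    """
    marker = name + '='
    pos = line.find(marker)          # -1 when the attribute is absent
    if pos == -1:
        return None
    quote_pos = pos + len(marker)
    # The value must open with a single or double quote.
    if quote_pos >= len(line) or line[quote_pos] not in ('"', "'"):
        return None
    quote = line[quote_pos]
    start = quote_pos + 1
    end = line.find(quote, start)    # matching closing quote
    if end == -1:
        return None
    return line[start:end]


line = '''<div class='entry qual-5 used-demoman slot-head bestprice custom' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">'''

print(attribute_value(line, 'data-name'))   # Kill-a-Watt Allbrero
print(attribute_value(line, 'data-price'))  # 3280000
```

Because it searches for plain text, it would also match an attribute whose name merely ends in the requested one, so it is only suitable for predictable input like these rows.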
We can put those three lines in a loop over the file:
with open('itemlist.txt','r') as mfile, open('output.txt','a') as mfile2:
    for line in mfile:
        start = line.index('data-name') + len('data-name="')
        end = line.index('"', start)
        mfile2.write(line[start:end])
        mfile2.write('\n')
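One thing to watch for: str.index raises ValueError when the text is not found, so a single row without data-name (a blank line, for example) would stop the loop. A hedged variant that skips such rows might look like this; the function name write_names and its return value are my own additions, not part of the original code.

```python
def write_names(in_path, out_path):
    """Append every data-name value found in in_path to out_path, one per line.

    Returns the number of names written. Hypothetical helper for illustration.
    """
    prefix = 'data-name="'
    written = 0
    with open(in_path, 'r') as src, open(out_path, 'a') as dst:
        for line in src:
            try:
                start = line.index(prefix) + len(prefix)
                end = line.index('"', start)
            except ValueError:
                # Raised by str.index when the prefix or the closing
                # quote is missing: skip this row instead of crashing.
                continue
            dst.write(line[start:end] + '\n')
            written += 1
    return written
```

Calling write_names('itemlist.txt', 'output.txt') would then do the same job as the loop above, minus the crash on rows that lack the attribute.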
Upvotes: 3
Reputation: 19855
You can also use beautifulsoup:
a.html:
<html>
<head>
<title> Asdf </title>
</head>
<body>
<div class='entry qual-5 used-demoman slot-head bestprice custom' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">
</body>
</html>
a.py:
from bs4 import BeautifulSoup
with open('a.html') as f:
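    lines = f.readlines()
soup = BeautifulSoup(''.join(lines), 'html.parser')
# Take the first div and read one of its attributes;
# use ['data-name'] instead to get the name asked about.
result = soup.find_all('div')[0]['data-price']
print(result)
# prints 3280000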
In my opinion, if your task is as simple as in your example, there is no real need for beautifulsoup. However, if it is more complicated, or is likely to become so, consider giving beautifulsoup a try.
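If installing a third-party package is not an option, a similar parser-based approach is possible with Python 3's built-in html.parser module. To be clear, this swaps beautifulsoup for the standard library, and the class name DataNameCollector is made up for this sketch. It collects the data-name value of every div it sees, wherever the attribute sits in the tag.

```python
from html.parser import HTMLParser


class DataNameCollector(HTMLParser):
    """Collect the data-name attribute of every <div> start tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with quotes removed.
        if tag != 'div':
            return
        for key, value in attrs:
            if key == 'data-name':
                self.names.append(value)


html = '''<html>
<body>
<div class='entry qual-5 used-demoman slot-head bestprice custom' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">
</body>
</html>'''

collector = DataNameCollector()
collector.feed(html)
print(collector.names)  # ['Kill-a-Watt Allbrero']
```

The resulting list can be written to a file one name per line, as in the string-based answer.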
Upvotes: 1