Daniel Prinsloo

Reputation: 149

Writing from one file to another python

I am trying to take some information I got from a webpage and write one of its fields to a file. However, I am having no luck. It is probably very easy, but I'm lost. Here is an example of one of the rows; there are 1253 rows in total.

<div class='entry qual-5 used-demoman slot-head bestprice custom' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">

I am after the attribute called data-name; it is not at the same position in each row. I tried this, but it did not work:

mfile=open('itemlist.txt','r')
mfile2=open('output.txt','a')
for row in mfile:
    if char =='data-name':
        mfile2.write(char)

Edit 1:

I made an example file containing 'hello hi peanut'. If I did:

for row in mfile:
    print row.index('hello')

it would print 0 as expected. However, when I changed hello to hi, it didn't return 1; it returned nothing.

Upvotes: 1

Views: 138

Answers (2)

poke

Reputation: 388413

Let’s try to find the value using common string manipulation methods:

>>> line = '''<div class='entry qual-5 used-demoman slot-head bestprice custom' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">'''

We can use str.index to find the position of a string within a string:

>>> line.index('data-name')
87

So now we know we need to start looking at index 87 for the attribute we are interested in:

>>> line[87:]
'data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">'

Now, we need to remove the data-name=" part too:

>>> start = line.index('data-name') + len('data-name="')
>>> start
98
>>> line[start:]
'Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">'

Now, we just need to find the index of the closing quotation mark too, and then we can extract just the attribute value:

>>> end = line.index('"', start)
>>> end
118
>>> line[start:end]
'Kill-a-Watt Allbrero'

And then we have our solution:

start = line.index('data-name') + len('data-name="')
end = line.index('"', start)
print(line[start:end])

We can put that in the loop:

with open('itemlist.txt','r') as mfile, open('output.txt','a') as mfile2:
    for line in mfile:
        start = line.index('data-name') + len('data-name="')
        end = line.index('"', start)
        mfile2.write(line[start:end])
        mfile2.write('\n')
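
One caveat with the loop above: str.index raises ValueError when the substring is missing, so a single row without data-name would stop the whole run. Here is a minimal sketch of a more forgiving variant, assuming such rows should simply be skipped and that the value is always wrapped in double quotes. The names extract_attribute and copy_names are hypothetical helpers, not part of the original code.

```python
def extract_attribute(line, name='data-name'):
    """Return the double-quoted value of `name` in `line`, or None if absent."""
    marker = name + '="'
    pos = line.find(marker)        # str.find returns -1 instead of raising
    if pos == -1:
        return None
    start = pos + len(marker)
    end = line.find('"', start)    # closing quotation mark of the value
    if end == -1:
        return None
    return line[start:end]


def copy_names(src_path, dst_path):
    """Append one extracted value per line of dst_path, skipping rows without it."""
    with open(src_path) as src, open(dst_path, 'a') as dst:
        for line in src:
            value = extract_attribute(line)
            if value is not None:
                dst.write(value + '\n')
```

Calling copy_names('itemlist.txt', 'output.txt') would then behave like the loop above, except that rows without the attribute are ignored rather than raising an error.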

Upvotes: 3

Sait

Reputation: 19855

You can also use BeautifulSoup:

a.html:

<html>
    <head>
        <title> Asdf </title>
    </head>
    <body>

        <div class='entry qual-5 used-demoman slot-head bestprice custom' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-quality="5" data-australium="normal" data-class="demoman" data-particle_effect="56" data-paint="" data-slot="cosmetic" data-consignment="consignment">

    </body>
</html>

a.py:

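from bs4 import BeautifulSoup

# Read the whole file and parse it with the built-in HTML parser.
with open('a.html') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')

# Take the first <div> and look up one of its attributes.
# Use 'data-name' instead to get the item name.
result = soup.find_all('div')[0]['data-price']
print(result)
# prints 3280000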

In my opinion, if your task is as simple as in your example, there is no need to use BeautifulSoup. However, if it is more complicated, or is likely to become so, consider giving BeautifulSoup a try.
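
If installing a third-party package is not an option, a similar result can be sketched with the standard library's html.parser module (Python 3). This is only an illustration under that assumption, not a replacement for the BeautifulSoup code above; the class name DataNameCollector and the shortened sample markup are made up for the example.

```python
from html.parser import HTMLParser


class DataNameCollector(HTMLParser):
    """Collects the data-name attribute of every <div> that has one."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag != 'div':
            return
        for name, value in attrs:
            if name == 'data-name':
                self.names.append(value)


# Shortened sample rows; the second one has no data-name and is ignored.
html = '''<div class='entry' data-price='3280000' data-name="Kill-a-Watt Allbrero" data-paint="">
<div class='entry' data-price='100'>'''

collector = DataNameCollector()
collector.feed(html)
collector.close()
print(collector.names)
```

Because the parser reads attributes by name, it does not matter where data-name appears within the tag or whether the value uses single or double quotes.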

Upvotes: 1
