Scraping plain text from a webpage in PHP

Question

I have tried most of the regular expressions I could find, but none of them work for me. I need a regular expression that removes all HTML tags and returns the values. My HTML file contains the following form elements: text inputs and a select.

    <?php
    $file_string = file_get_contents('page_to_scrape.html');

    preg_match('/<title>(.*)<\/title>/i', $file_string, $title);
    $title_out = $title[1];

    preg_match('/<option value="ELIT">(.*)<\/option>/i', $file_string, $keywords);
    $keywords_out = $keywords[1];

    preg_match('/<option value="MAS" selected="selected">(.*)<\/option>/i', $file_string, $ash);
    $ash_s = $ash[1];

    preg_match('/<input type="text" value="(.*)"/>/i', $file_string, $description);
    $description_out = $description[1];

    preg_match_all('/<li><a href="(.*)">(.*)<\/a><\/li>/i', $file_string, $links);
    ?>

    <p><strong>Title:</strong> <?php echo $title_out; ?></p>
    <p><strong>Name:</strong> <?php echo $keywords_out; ?></p>
    <p><strong>TExtbox:</strong> <?php echo $description_out; ?></p>
    <p><strong>Event:</strong> <?php echo $ash_s; ?></p>
    <p><strong>Links:</strong> <em>(Name - Link)</em><br />
    <?php
    echo '<ol>';
    for ($i = 0; $i < count($links[1]); $i++) {
        echo '<li>' . $links[2][$i] . ' - ' . $links[1][$i] . '</li>';
    }
    echo '</ol>';
    ?>
    </p>

HTML file:

    This is the Title

    <li>Link 1</li>
    <li>Link 2</li>
    <li>Link 3</li>
    <li>Link 4</li>
    <li>Link 5</li>
    </ul>
    <div class="field">
        <label>Event:</label>
        <select name="event" class="event">
            <option value="MAS" selected="selected">Same</option>
            <option value="ELIT">Same4</option>
            <option value="IPC">Same3</option>
            <option value="VLMW">Same2</option>
        </select>
    </div>

    <div class="field">
        <label class="sub">Surname:</label>
        <input name="search[name]" value="Smith" type="text">
        <br>
        <label class="sub">First Name:</label>
        <input name="search[firstname]" value="Alex" type="text">
        <br>
    </div>

    </div>
    </body>
    </html>


Answers (1)
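
One possible approach, sketched under assumptions: regular expressions tend to break on HTML when the attribute order or whitespace differs (in the question's file, `value` comes before `type` and the `input` tags are not self-closing, so the `input` pattern cannot match). PHP's built-in DOM extension parses the markup instead. In the sketch below, `$html` is a shortened, hypothetical stand-in for `page_to_scrape.html`, and the variable names are illustrative only.

```php
<?php
// Sketch: read form values with DOMDocument/DOMXPath instead of regex.
// $html is a trimmed, hypothetical stand-in for page_to_scrape.html.
$html = <<<'HTML'
<html><head><title>This is the Title</title></head><body>
<select name="event" class="event">
  <option value="MAS" selected="selected">Same</option>
  <option value="ELIT">Same4</option>
</select>
<input name="search[name]" value="Smith" type="text">
<input name="search[firstname]" value="Alex" type="text">
</body></html>
HTML;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Page title as plain text.
$title = trim($xpath->evaluate('string(//title)'));

// Text of the currently selected option in the "event" select.
$event = trim($xpath->evaluate('string(//select[@name="event"]/option[@selected])'));

// name => value for every text input, regardless of attribute order.
$inputs = [];
foreach ($xpath->query('//input[@type="text"]') as $input) {
    $inputs[$input->getAttribute('name')] = $input->getAttribute('value');
}

// All text nodes of the page with the tags removed.
$plainText = trim($doc->documentElement->textContent);

echo $title, "\n";   // This is the Title
echo $event, "\n";   // Same
print_r($inputs);    // search[name] => Smith, search[firstname] => Alex
```

For the real page, `$html` would come from `file_get_contents('page_to_scrape.html')`. Note that `textContent` (like `strip_tags`) only returns text between tags, so values stored in attributes, such as the `value` of an `input`, have to be read with `getAttribute` as shown.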
