I'm trying to scrape the example sentences for a specific French word using Python, but the page that comes back doesn't contain any results.
I inspected the search box and the search button in the browser and passed their names as form parameters. Am I missing something?
http://www.online-languages.info/french/examples.php
import requests
from bs4 import BeautifulSoup
word = 'manger'
url = 'http://www.online-languages.info/french/examples.php'
params = {'word': word, 'go': ''}
response = requests.post(url, data=params)
soup = BeautifulSoup(response.text, 'html5lib')
print(soup.prettify())
Edit: Here is the output I get. It looks like the page may be using JavaScript. If that's the case, can anyone suggest a different library I could use?
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html dir="ltr" lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>
French example sentences :: Online-languages.info
</title>
<meta content="text/css" http-equiv="Content-Style-Type"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="Database containing thousands of example sentences. Sentences are important for learning correct use of words." name="Description"/>
<meta content="French language. French grammar. French vocabulary. Tests. Language certificate. Verbs. French phrases. French pronunciation. E-learning. Conversation." name="Subject"/>
<meta content="French, French grammar, French dictionary, French vocabulary, French language, tests, French test, exam, fce, verbs, exercise, certificate, course, games" name="keywords"/>
<link href="../style.css" rel="stylesheet" type="text/css"/>
</head>
<body style="background-image:url(./img/bg2.jpg);">
<div align="center">
<table bgcolor="white" border="0" cellpadding="6" cellspacing="0" style="-moz-border-radius:20px;" width="1000">
<tbody>
<tr>
<td align="center" colspan="4">
<table border="0" cellspacing="0" width="100%">
<tbody>
<tr>
<td align="center" width="180">
<a href="../">
<img alt="Online-languages.info" border="0" src="img/logo.png"/>
</a>
</td>
<td align="left" style="background: url('img/bg.png'); -moz-border-radius:20px; padding: 20px 20px 20px 20px; ">
<h1 style="color:#fff; font-size:20pt;">
French words in example sentences
</h1>
<h3 style="color:#fff; font-size:8pt; font-weight:normal;">
French language resources at
<a href="http://www.online-languages.info" style="color:white;">
Online-languages.info
</a>
</h3>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td align="left" valign="top" width="180">
<table cellpadding="0" cellspacing="0" class="t2" width="180">
<tbody>
<tr>
<td>
<a class="arect" href="index.php">
Home
</a>
</td>
</tr>
<tr>
<td>
<a class="arect" href="grammar.php">
French grammar
</a>
</td>
</tr>
<tr>
<td>
<a class="arect" href="phrases.php">
French phrases
</a>
</td>
</tr>
<tr>
<td>
<a class="arect" href="vocabulary.php">
French vocabulary
</a>
</td>
</tr>
<tr>
<td>
<a class="arect" href="trainer.php">
Vocabulary trainer
</a>
</td>
</tr>
<tr>
<td>
<a class="arect" href="picture-dictionary.php">
Picture dictionary
</a>
</td>
</tr>
<tr>
<td>
<a class="arect" href="dictionary.php">
French dictionary
</a>
</td>
</tr>
<tr>
<td>
<a class="arect" href="flashcards.php">
Flashcards
</a>
</td>
</tr>
<tr>
<td>
<a class="arect" href="audio.php">
Audio
</a>
</td>
</tr>
<tr>
<td>
<a class="arect" href="video.php">
Video
</a>
</td>
</tr>
<tr>
<td>
<a class="arect" href="translator.php">
French translator
</a>
</td>
</tr>
<tr>
<td>
<a class="arect" href="tests.php">
French quizzes
</a>
</td>
</tr>
<tr>
<td>
<a class="arect" href="examples.php">
Examples of use
</a>
</td>
</tr>
<tr>
<td>
<a class="arect" href="pronunciation.php">
French pronunciation
</a>
</td>
</tr>
<tr>
<td>
<a class="arect" href="news.php">
News in French
</a>
</td>
</tr>
<tr>
<td>
<a class="arect" href="applications.php">
Language software
</a>
</td>
</tr>
<tr>
<td>
<a class="arect" href="mobile.php">
Mobile phones
</a>
</td>
</tr>
</tbody>
</table>
<img alt="" border="0" height="0" src="http://whos.amung.us/swidget/fnhahzdo0ncz.gif" style="display:none;" width="0"/>
</td>
<td align="left" bgcolor="#ffffff" valign="top" width="90%">
<script type="text/javascript">
<!--
google_ad_client = "ca-pub-7058441231119392";
/* online-languages */
google_ad_slot = "3704078504";
google_ad_width = 728;
google_ad_height = 90;
//-->
</script>
<script src="http://pagead2.googlesyndication.com/pagead/show_ads.js" type="text/javascript">
</script>
<br/>
<br/>
<div align="justify">
<div id="content">
<iframe frameborder="0" height="650" src="http://www.dicts.info/examples.php?lang=French&disa=1" width="95%">
</iframe>
</div>
</div>
<!-- cookieconsent2 by Silktide -->
<script type="text/javascript">
window.cookieconsent_options = {
learnMore: 'More info',
message: 'This website uses cookies to personalize content and to improve your experience on our website.',
link: 'https://www.google.com/policies/technologies/cookies/',
theme: 'light-bottom'
};
</script>
<script src="https://s3.amazonaws.com/cc.silktide.com/cookieconsent.latest.min.js" type="text/javascript">
</script>
<noscript>
<p>We recommend you enable JavaScript to take full advantage of this website.</p>
</noscript>
</td>
</tr>
</tbody>
</table>
<br/>
<table width="700">
<tbody>
<tr>
<td align="center">
<a href="../english">
<img alt="" border="0" height="60" src="http://fimg.seznam.cz/?spec=ft100x75&url=http://www.jazyky-online.info/anglictina"/>
<br/>
English
</a>
</td>
<td align="center">
<a href="../german">
<img alt="" border="0" height="60" src="http://fimg.seznam.cz/?spec=ft100x75&url=http://www.jazyky-online.info/spanelstina"/>
<br/>
German
</a>
</td>
<td align="center">
<a href="../french">
<img alt="" border="0" height="60" src="http://fimg.seznam.cz/?spec=ft100x75&url=http://www.jazyky-online.info/francouzstina"/>
<br/>
French
</a>
</td>
<td align="center">
<a href="../spanish">
<img alt="" border="0" height="60" src="http://fimg.seznam.cz/?spec=ft100x75&url=http://www.jazyky-online.info/spanelstina"/>
<br/>
Spanish
</a>
</td>
<td align="center">
<a href="../russian">
<img alt="" border="0" height="60" src="http://fimg.seznam.cz/?spec=ft100x75&url=http://www.jazyky-online.info/rustina"/>
<br/>
Russian
</a>
</td>
<td align="center">
<a href="../chinese">
<img alt="" border="0" height="60" src="http://fimg.seznam.cz/?spec=ft100x75&url=http://www.jazyky-online.info/cinstina"/>
<br/>
Chinese
</a>
</td>
</tr>
</tbody>
</table>
<br/>
<br/>
<table cellpadding="10" style="background:url(img/bgfoot.jpg);" width="100%">
<tbody>
<tr>
<td align="center">
<font color="#0000aa">
<a href="../licence.html">
Licence
</a>
|
<a href="../licence.html">
Terms of use
</a>
|
<a href="../licence.html#disclaimer">
Disclaimer
</a>
|
<a href="../licence.html#privacy">
Privacy policy
</a>
|
<a href="http://www.dicts.info/contact.php?s=Online-languages">
Contact
</a>
</font>
<br/>
Copyright © 2007-2017, Online-languages.info
</td>
</tr>
</tbody>
</table>
</div>
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
try {
var pageTracker = _gat._getTracker("UA-8795372-1");
pageTracker._trackPageview();
} catch(err) {}
</script>
</body>
</html>
Upvotes: 1
Views: 138
Reputation: 7441
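This works for me. Note that I used the GET method and the URL referenced by the actual form on that page: the search form lives inside the iframe, which points at http://www.dicts.info/examples.php rather than at the online-languages.info page.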
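import requests
word = 'manger'
# Query the page that the iframe loads, not the outer page
url = 'http://www.dicts.info/examples.php'
# The request is sent with a Referer header pointing at the same page
headers = {'Referer': 'http://www.dicts.info/examples.php?disa=1&lang2=french&word=bon&go=Search'}
params = {'word': word, 'disa': '1', 'lang2': 'french'}
# params= puts the values in the query string (GET), unlike data= (POST body)
response = requests.get(url, params=params, headers=headers)
print(response.text)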
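If you want to find that inner URL programmatically instead of reading it from the browser, here is a minimal sketch that pulls the src of every iframe out of an HTML string. It uses only the standard library so it runs without extra installs; the names IframeSrcCollector and find_iframe_sources are mine, and the sample string is just the iframe tag copied from the output in the question.

```python
from html.parser import HTMLParser


class IframeSrcCollector(HTMLParser):
    """Collects the src attribute of every <iframe> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_iframe_sources(html):
    """Return the src URLs of all iframes found in the given HTML text."""
    collector = IframeSrcCollector()
    collector.feed(html)
    return collector.sources


# The iframe tag from the question's output, used here as sample input
sample = (
    '<div id="content">'
    '<iframe frameborder="0" height="650" '
    'src="http://www.dicts.info/examples.php?lang=French&disa=1" width="95%">'
    '</iframe></div>'
)
sources = find_iframe_sources(sample)
print(sources)
```

In practice you would feed it response.text from the outer page and then request whichever URL it returns.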
UPDATE
It appears that the PHP page checks that the request carries an appropriate Referer header, so send one, as in the code above (I have edited the original answer to include it).
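To check what is actually being sent without hitting the site, here is a rough sketch that builds the same GET request with the standard library and prints the final URL and Referer. The helper name build_examples_request is hypothetical, the Referer value is simply the one from my snippet above, and UTF-8 percent-encoding of accented words is an assumption about what the server expects.

```python
from urllib.parse import urlencode
from urllib.request import Request

EXAMPLES_URL = "http://www.dicts.info/examples.php"
# Referer value copied from the answer; whether other values work is untested
REFERER = "http://www.dicts.info/examples.php?disa=1&lang2=french&word=bon&go=Search"


def build_examples_request(word):
    """Build (but do not send) a GET request for one word's example sentences."""
    # urlencode percent-encodes non-ASCII characters as UTF-8 by default
    query = urlencode({"word": word, "disa": "1", "lang2": "french"})
    return Request(EXAMPLES_URL + "?" + query, headers={"Referer": REFERER})


request = build_examples_request("été")
print(request.get_method())
print(request.full_url)
print(request.get_header("Referer"))
```

Passing the returned object to urllib.request.urlopen would send it; that should be equivalent to the requests.get call above.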
Upvotes: 1