Reputation: 21
I am scraping a website built with classic ASP.NET (Web Forms). It has two fields with IDs: one is a text input and the other is a button of type submit. I need to fill in the input box, click the button, and read the response.
I was using Html Agility Pack, but it is not sufficient for filling in the input box and clicking the button.
An example of the markup:
<table class="MainTable">
<tbody>
<tr>
<td class="styleIndent"> </td>
<td class="Labels"><span id="ctl00_MainContent_lblLastName" class="fieldHeader" for="ctl00_MainContent_txtLastName">Name:</span></td>
<td class="styleColumnBody">
<input name="ctl00$MainContent$txtLastName" type="text" value="sberbank" maxlength="250" id="ctl00_MainContent_txtLastName" tabindex="2" title="Enter name as search criteria." style="width:200px;">
</td>
<td class="Labels"><span id="ctl00_MainContent_lblCity" class="fieldHeader" for="ctl00_MainContent_txtCity">City:</span></td>
<td class="styleColumnBody">
<input name="ctl00$MainContent$txtCity" type="text" maxlength="250" id="ctl00_MainContent_txtCity" tabindex="6" title="Enter city name as search criteria." style="width:200px;">
</td>
</tr>
<tr>
<td class="Labels"></td>
<td style="text-align: left">
<input type="submit" name="ctl00$MainContent$btnSearch" value="Search" id="ctl00_MainContent_btnSearch" tabindex="9" style="font-weight:normal;height:22px;width:96px;">
<input type="submit" name="ctl00$MainContent$btnReset" value="Reset" id="ctl00_MainContent_btnReset" tabindex="10" style="font-weight:normal;height:22px;width:96px;">
</td>
</tr>
</tbody></table>
It's a classic ASP.NET page, so the whole page is reloaded when the button (ctl00_MainContent_btnSearch) is clicked, which makes it hard to learn anything by inspecting the page.
Upvotes: 0
Views: 8852
Reputation: 1
First, you need to install the Selenium WebDriver NuGet package in your project. You can do this from the NuGet console with the following command:
Install-Package Selenium.WebDriver
In your controller, you can define an action that receives the ID number to search and uses Selenium WebDriver to navigate to the search page, fill out the form, and get the results. Here is an example of what this action might look like:
public IActionResult Index()
{
    var userAgent = HttpContext.Request.Headers["User-Agent"];
    return View();
}

public IActionResult Search(string dni)
{
    var options = new ChromeOptions();
    options.AddArgument("headless");
    options.AddArgument("disable-gpu");
    IWebDriver driver = new ChromeDriver(options);
    try
    {
        // Navigate to the search page
        driver.Navigate().GoToUrl("https://eldni.com/pe/buscar-por-dni");
        // Fill in the form with the DNI number
        var inputElement = driver.FindElement(By.Name("dni"));
        inputElement.SendKeys(dni);
        // Click the search button
        var buttonElement = driver.FindElement(By.XPath("//button[contains(@class, 'btn-success')]"));
        buttonElement.Click();
        // Wait for the results page to finish loading.
        // Note: with Selenium 4, ExpectedConditions comes from the separate
        // DotNetSeleniumExtras.WaitHelpers package (namespace SeleniumExtras.WaitHelpers).
        WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
        IWebElement resultsElement = wait.Until(ExpectedConditions.ElementIsVisible(By.Id("div-copy")));
        // Read the results from the input fields
        string nombre = resultsElement.FindElement(By.Id("nombres")).GetAttribute("value");
        string apellidop = resultsElement.FindElement(By.Id("apellidop")).GetAttribute("value");
        string apellidom = resultsElement.FindElement(By.Id("apellidom")).GetAttribute("value");
        // Return the results to the view
        return Json(new { Nombre = nombre, ApellidoP = apellidop, ApellidoM = apellidom });
    }
    finally
    {
        // Close the browser
        driver.Quit();
    }
}
In your view, you can display the results obtained in the previous step:
$(document).ready(function () {
    $("#searchButton").click(function () {
        var dni = $("#dni").val();
        $.ajax({
            type: "POST",
            // Encode the value so it is safe to place in the query string
            url: "/Controller/Search?dni=" + encodeURIComponent(dni),
            success: function (data) {
                $("#resultado").html(
                    "<br><br>" +
                    "<h3>RESULTADO</h3>" +
                    "<table class='table-bordered table-striped' style='width: 100%' >" +
                    "<thead><tr><th>NOMBRES</th><th>A. PATERNO</th><th>A. MATERNO</th></tr></thead>" +
                    "<tbody><tr><td>" + data.Nombre + "</td><td>" + data.ApellidoP + "</td><td>" + data.ApellidoM + "</td></tr></tbody>" +
                    "</table>"
                );
            }
        });
    });
});
<div class="form-group">
<label for="dni">DNI: </label>
<input type="number" class="form-control" id="dni" name="dni" maxlength="8" value="@Model" oninput="javascript: if (this.value.length > this.maxLength) this.value = this.value.slice(0, this.maxLength);">
</div>
<button type="button" class="btn btn-primary" id="searchButton">Consultar Datos</button>
<div id="resultado"></div>
I hope this helps.
Upvotes: 0
Reputation: 196
How about using headless Chrome? You can navigate to the web page and perform any operation you like.
https://github.com/kblok/puppeteer-sharp
// Launch the browser and keep a reference to it
var _browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true,
    ExecutablePath = _config.ChromePath, // path to the Chrome executable
});
// Go to the page
var _page = await _browser.NewPageAsync();
await _page.GoToAsync("http://www.example.com");
// Click the form input to focus it
await _page.ClickAsync("#name");
// Type the value into the focused input
await _page.Keyboard.SendCharacterAsync("John");
// Submit the form
await _page.ClickAsync("#SubmitButton");
Upvotes: 0
Reputation: 17879
Html Agility Pack is designed to parse, query and manipulate the HTML DOM. Some kinds of crawlers would be a use case for it. But you want to actually run the HTTP request, JavaScript event or whatever is behind that button. The easiest method with the most features is to remote-control a web browser.
First install Selenium and a browser driver. I'm using Firefox here since it's free, open source and respects privacy:
Install-Package Selenium.WebDriver
Install-Package Selenium.Firefox.WebDriver
Download the driver executable for your browser. The Firefox geckodriver can be found on GitHub here: https://github.com/mozilla/geckodriver/releases/download/v0.24.0/geckodriver-v0.24.0-win64.zip If this post gets older, see the version overview: https://github.com/mozilla/geckodriver/releases
Now extract the archive and copy its path to a variable:
string geckoDriverPath = @"D:\Downloads\geckodriver-v0.24.0-win64";
We're ready to start using Firefox. Here is a simple example that enters a query in the search field of Stack Overflow and clicks the search button on the right:
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;
using OpenQA.Selenium.Support.UI;
using System;

class Program {
    static void Main(string[] args) {
        string geckoDriverPath = @"D:\Downloads\geckodriver-v0.24.0-win64";
        using (var driver = new FirefoxDriver(geckoDriverPath)) {
            driver.Navigate().GoToUrl("https://stackoverflow.com");
            // FindElement(By.CssSelector(...)) also works on newer Selenium versions,
            // where the FindElementByCssSelector shortcut is no longer available.
            var searchBox = driver.FindElement(By.CssSelector("#search .js-search-field"));
            searchBox.SendKeys("Selenium");
            var searchButton = driver.FindElement(By.CssSelector("#search .js-search-submit"));
            searchButton.Click();
            // Keep the browser open until a key is pressed
            Console.Read();
        }
    }
}
Please be patient, it can take a few seconds to initialize the browser.
Depending on what your button click is doing, there may be other ways. If it is some kind of HTTP request (form post or AJAX call), you could send it manually. This is faster, saves resources, and is easy to run headless. But it's harder to implement, especially on complex pages where you need to extract data such as IDs from the page source. You may want to consider this if you care about performance and resources.
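As a rough illustration of the "extract data from the page source" step: ASP.NET Web Forms pages typically carry hidden inputs such as __VIEWSTATE and __EVENTVALIDATION that the server expects to get back with the post. The sketch below collects every hidden input from an HTML string. The class name HiddenFieldScraper is made up for this example, and the regex approach is an assumption that only holds for simple, double-quoted attributes; a real parser such as Html Agility Pack would be more robust.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class HiddenFieldScraper
{
    // Matches a whole <input ...> tag; attributes are read separately,
    // so their order inside the tag does not matter.
    private static readonly Regex InputTag =
        new Regex(@"<input\b[^>]*>", RegexOptions.IgnoreCase);

    // Returns the HTML-decoded value of a double-quoted attribute, or null if absent.
    private static string Attr(string tag, string name)
    {
        var m = Regex.Match(tag, @"\b" + name + @"\s*=\s*""([^""]*)""",
                            RegexOptions.IgnoreCase);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : null;
    }

    // Collects the name/value pairs of every <input type="hidden"> in the page source.
    public static Dictionary<string, string> Extract(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in InputTag.Matches(html))
        {
            var tag = m.Value;
            var name = Attr(tag, "name");
            if (name == null) continue;
            var type = Attr(tag, "type");
            if (!string.Equals(type, "hidden", StringComparison.OrdinalIgnoreCase)) continue;
            fields[name] = Attr(tag, "value") ?? "";
        }
        return fields;
    }
}
```

The returned dictionary can then be merged with the visible form values before sending the request manually.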
Upvotes: 2
Reputation: 724
If the form is a standard HTML form, you can obtain the post-back url and then post the form data yourself. In essence, you are performing the action that the button would normally do instead of filling out the form itself.
To get this to work you need the URL that is being posted to, and the name of the elements that are getting posted back to the server. You can easily obtain this through any web inspector tools. Once you have it, you can do the below:
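var request = (HttpWebRequest)WebRequest.Create(uri);
request.Method = HttpMethod.Post.ToString();
// Form posts are URL-encoded, not JSON.
request.ContentType = "application/x-www-form-urlencoded";
// Replace name1, name2, value1, value2 with the
// key/value pairs that need to be posted.
var content = $"{Uri.EscapeDataString(name1)}={Uri.EscapeDataString(value1)}&{Uri.EscapeDataString(name2)}={Uri.EscapeDataString(value2)}";
var bytes = Encoding.UTF8.GetBytes(content);
// ContentLength must be set before the request stream is opened.
request.ContentLength = bytes.Length;
using (var stream = request.GetRequestStream())
{
    stream.Write(bytes, 0, bytes.Length);
}
using (var response = (HttpWebResponse)request.GetResponse())
{
    // Fall back to UTF-8 when the server does not declare a charset.
    var encoding = string.IsNullOrEmpty(response.CharacterSet)
        ? Encoding.UTF8
        : Encoding.GetEncoding(response.CharacterSet);
    using (var responseStream = response.GetResponseStream())
    using (var reader = new StreamReader(responseStream, encoding))
    {
        return reader.ReadToEnd();
    }
}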
If you are using .NET 4.5 or above you can use the HttpClient class which makes this a lot simpler:
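var httpClient = new HttpClient();
// FormUrlEncodedContent encodes the pairs and sets the Content-Type header.
var data = new FormUrlEncodedContent(new Dictionary<string, string>
{
    { name1, value1 },
    { name2, value2 }
});
var response = await httpClient.PostAsync(uri, data);
response.EnsureSuccessStatusCode();
string content = await response.Content.ReadAsStringAsync();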
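Applied to the form in the question, the posted body might be assembled as in the sketch below. The field names are taken from the markup in the question; the class name SearchFormBuilder and the hiddenFields parameter are assumptions for illustration. Web Forms pages usually also expect hidden values such as __VIEWSTATE, copied from a prior GET of the same page, and a submit button is only included in the post when it is the one that was clicked.

```csharp
using System.Collections.Generic;
using System.Net.Http;

// Hypothetical helper, not part of any library.
public static class SearchFormBuilder
{
    // hiddenFields: hidden input values (e.g. __VIEWSTATE) copied from a prior GET of the page.
    public static FormUrlEncodedContent Build(
        IDictionary<string, string> hiddenFields, string lastName)
    {
        var pairs = new List<KeyValuePair<string, string>>(hiddenFields);
        // Text inputs are posted under their name attribute, not their id.
        pairs.Add(new KeyValuePair<string, string>("ctl00$MainContent$txtLastName", lastName));
        pairs.Add(new KeyValuePair<string, string>("ctl00$MainContent$txtCity", ""));
        // Including the Search button's name/value tells the server which button was "clicked".
        pairs.Add(new KeyValuePair<string, string>("ctl00$MainContent$btnSearch", "Search"));
        return new FormUrlEncodedContent(pairs);
    }
}
```

The result can be passed straight to httpClient.PostAsync(uri, SearchFormBuilder.Build(hidden, "sberbank")), where uri is the page's own address, since Web Forms pages normally post back to themselves.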
Upvotes: 0