Luca Tarasco

Reputation: 11

Scrape website behind a form with dynamic action link

I'm trying to scrape a page that is hidden behind a form, using PHP and the simple_html_dom.php library. Unfortunately, the form's action link appears to be dynamically generated, because I am only able to scrape the initial part of the link.

I used the following code:

<?php
require 'simple_html_dom.php';

$formPageUrl = "https://example.com/form-page";

$html = file_get_html($formPageUrl);

$form = $html->find('form', 0);

if (!$form) {
    die("Form not found.");
}

$actionUrl = $form->action;

if (!parse_url($actionUrl, PHP_URL_SCHEME)) {
    $actionUrl = rtrim($formPageUrl, '/') . '/' . ltrim($actionUrl, '/');
}

$formData = [];
foreach ($form->find('input') as $input) {
    $name = $input->name;
    $value = $input->value ?? '';

 
    if ($name === 'username') {
        $value = 'my_username';
    } elseif ($name === 'password') {
        $value = 'my_password';
    }

    if ($name) {
        $formData[$name] = $value;
    }
}


// Send the form data using cURL
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $actionUrl);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($formData));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$response = curl_exec($ch);

if (curl_errno($ch)) {
    die("Error cURL: " . curl_error($ch));
}

curl_close($ch);

echo $response;
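
A side note on the URL-joining step in the code above: an action that starts with / is relative to the site root, so it replaces the path of the form page URL rather than being appended to it. A minimal sketch of that resolution follows. resolveAction is a hypothetical helper name (not part of simple_html_dom), and the sketch deliberately ignores ../ segments, ports, and credentials.

```php
<?php
// Hypothetical helper: resolve a form's action attribute against the URL of
// the page that contains the form. Simplified on purpose: no "../" handling,
// no ports, no user:password parts.
function resolveAction(string $pageUrl, string $action): string
{
    // Empty action: the form submits to the page itself.
    if ($action === '') {
        return $pageUrl;
    }

    // Already absolute (has a scheme): use as-is.
    if (parse_url($action, PHP_URL_SCHEME)) {
        return $action;
    }

    $parts  = parse_url($pageUrl);
    $origin = $parts['scheme'] . '://' . $parts['host'];

    // Protocol-relative ("//host/path"): reuse the page's scheme.
    if (substr($action, 0, 2) === '//') {
        return $parts['scheme'] . ':' . $action;
    }

    // Root-relative ("/it/it/page/"): replace the whole path.
    if ($action[0] === '/') {
        return $origin . $action;
    }

    // Path-relative ("submit"): replace the last path segment of the page.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $action;
}

echo resolveAction('https://example.com/form-page', '/it/it/page/'), PHP_EOL;
// prints https://example.com/it/it/page/
```

With the original rtrim/ltrim join, the same inputs would produce https://example.com/form-page/it/it/page/, which is probably not the intended target.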

The request returned nothing at all. By echoing the action link, I found that it doesn't match the one on the page. For example:

I get /it/it/page/, but the actual action link contains a random string: /it/it/page/Aihrkjrnjfvijkregv1

Inspecting the Network tab in the browser's developer tools shows that the string is indeed used as a payload to get the page, and that it changes every time you start a new session, which prevents me from replicating the request. I'm kinda new to web scraping, so any useful advice is appreciated.
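
One way to check where the string comes from is to search the raw page source, including inline scripts, for anything that extends the known action prefix. This is only a sketch under assumptions the question does not confirm: that the token is present somewhere in the HTML the server returns, and that it consists of letters, digits, _ or -. findActionCandidates is a hypothetical helper name and the sample markup is made up for illustration.

```php
<?php
// Hypothetical helper: scan raw page source for URLs that extend a known
// action prefix with a token. Assumes the token only contains
// letters, digits, "_" or "-".
function findActionCandidates(string $rawHtml, string $actionPrefix): array
{
    $pattern = '~' . preg_quote($actionPrefix, '~') . '[A-Za-z0-9_-]+~';
    preg_match_all($pattern, $rawHtml, $matches);

    // De-duplicate while keeping a plain 0-based list.
    return array_values(array_unique($matches[0]));
}

// Made-up page source for illustration; the real site may look different.
$sample = <<<'HTML'
<form action="/it/it/page/" method="post"></form>
<script>document.forms[0].action = "/it/it/page/Aihrkjrnjfvijkregv1";</script>
HTML;

print_r(findActionCandidates($sample, '/it/it/page/'));
```

If a candidate turns up, the GET that fetched the page and the POST that submits the form most likely need to share a session, for example by setting CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE to the same file on both requests, because the string changes per session. If nothing turns up, the string is presumably produced at runtime by JavaScript, and a plain HTTP client will not see it.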

Upvotes: 0

Views: 55

Answers (0)