alex2005

Reputation: 31

DOMDocument, editing elements

So, what I want to do is make a script that automatically adds my login info (which I will have in my database) to whatever form I want.

To do this, I get the HTML source from the website (using cURL), then with DOMDocument I set the form's username and password inputs to my username and password values. Then I output the result and click login.

All should be alright, right? Yeah, in theory, but it isn't.

This is the code that does just that:

$dom = new DOMDocument();
$dom->formatOutput = true;
// @ suppresses the warnings libxml emits for malformed real-world HTML
@$dom->loadHTML( mb_convert_encoding($html, 'HTML-ENTITIES', $encoding) );

$inputs = $dom->getElementsByTagName('input');
foreach ($inputs as $input)
{
    $name = $input->getAttribute('name');

    if ($name == $id_nameValue)
    {
        // replace the username input with one that carries my value
        $new_input = $dom->createElement('input');

        $new_input->setAttribute('name', $id_nameValue);
        $new_input->setAttribute('value', $id_value);

        $input->parentNode->replaceChild($new_input, $input);
    }
    elseif ($name == $password_nameValue)
    {
        // same for the password input, keeping it masked
        $new_input = $dom->createElement('input');

        $new_input->setAttribute('name', $password_nameValue);
        $new_input->setAttribute('value', $password_value);
        $new_input->setAttribute('type', 'password');

        $input->parentNode->replaceChild($new_input, $input);
    }
}

echo $dom->saveHTML();

The problem I'm having is that the JavaScript and CSS don't load, and the form doesn't redirect correctly...

Let's take reddit for example: https://ssl.reddit.com/login. They have this for the CSS:

<link rel="stylesheet" href="/static/reddit.cYdhnJIJSZ0.css" type="text/css" />

instead of having https://ssl.reddit.com/login/static/reddit.cYdhnJIJSZ0.css, so I can't load it correctly, because the browser uses my URL, like

MY_URL.com/static/reddit.cYdhnJIJSZ0.css to find it...

The same applies to JavaScript, like

<script type="text/javascript" src="/static/jquery.js">

Or with

<form id="login_login" method="post" action="/post/login" class="user-form login-form">

This would redirect me to MY_URL.com/post/login

My question is: how can I make this work? How can I edit the links to include the website's URL? Since this is the first time I'm using DOMDocument, I don't know how I would go about editing the form action or the script src...

So my end result would be

<link rel="stylesheet" href="https://ssl.reddit.com/login/static/reddit.cYdhnJIJSZ0.css" type="text/css" />
<script type="text/javascript" src="https://ssl.reddit.com/login/static/jquery.js">
<form id="login_login" method="post" action="https://ssl.reddit.com/login/post/login" class="user-form login-form">

Upvotes: 0

Views: 578

Answers (1)

Decent Dabbler

Reputation: 22783

I think the easiest way to do this is to inject a base tag with an href attribute set to the base URL of the last effective URL (the URL that cURL ultimately fetched, in case of redirects). This last effective URL can be retrieved from cURL with:

$url = curl_getinfo( $ch, CURLINFO_EFFECTIVE_URL );

I've explained how to set the base tag with DOMDocument in this answer. It also accounts for situations where there is already a base tag. Admittedly, my example doesn't yet check whether that base tag has an href attribute. It should be trivial to add this check, though, with DOMElement::hasAttribute().
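A minimal sketch of that check, assuming the page was loaded into a DOMDocument and $url holds the value you want as the base. The function name ensureBaseHref is hypothetical, and keeping an href that the page already declares is my assumption about the desired behaviour, not something the linked answer prescribes:

```php
<?php
// Hypothetical helper: make sure the document has a <base> element with an href.
function ensureBaseHref(DOMDocument $doc, string $url): void
{
    $bases = $doc->getElementsByTagName('base');
    if ($bases->length > 0) {
        $base = $bases->item(0);
        // Assumption: an href the page already declares is kept untouched;
        // only a missing one is filled in.
        if (!$base->hasAttribute('href')) {
            $base->setAttribute('href', $url);
        }
        return;
    }

    $head = $doc->getElementsByTagName('head')->item(0);
    if ($head === null) {
        $root = $doc->documentElement;
        if ($root === null) {
            return; // empty document, nothing to attach to
        }
        // No <head> was parsed: create one as the first child of the root element.
        $head = $doc->createElement('head');
        $root->insertBefore($head, $root->firstChild);
    }

    $base = $doc->createElement('base');
    $base->setAttribute('href', $url);
    // insertBefore() appends when the reference node is null.
    $head->insertBefore($base, $head->firstChild);
}
```

Call it right before saveHTML(), e.g. ensureBaseHref($dom, $url). Note that an existing base href which is itself relative would still resolve against your own URL, so you may prefer to overwrite it in that case.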

edit
In response to alex2005's comment:

You could alter it a bit, and do this:

$baseElement = $doc->createElement( 'base' );
$baseElement->setAttribute( 'href', $url );
$headElement = $doc->getElementsByTagName( 'head' )->item( 0 );

// insertBefore() appends when the reference node is null, i.e. when $headElement has no firstChild
$headElement->insertBefore( $baseElement, $headElement->firstChild );

edit 2
A little warning though: I've overlooked something.

$url = curl_getinfo( $ch, CURLINFO_EFFECTIVE_URL );

... could effectively return a URL like:

http://example.com/some/path/to/a/file.html

I'm not sure how browsers deal with file names in a base tag. I'd assume they extract the directory path, but I'm not sure about this.
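For what it's worth, standard URL reference resolution (RFC 3986, which browsers follow as far as I know) drops everything after the last slash of the base path before appending a relative path, so a file name in the base should not get in the way. Below is a simplified sketch of that rule. The function name resolveAgainstBase is hypothetical, and the sketch deliberately ignores "../" and "./" segments as well as query-only and fragment-only references:

```php
<?php
// Simplified sketch of how a reference resolves against a base URL.
// Assumptions: no "../" or "./" normalisation, no query-only or
// fragment-only references, and $base is an absolute http(s) URL.
function resolveAgainstBase(string $base, string $ref): string
{
    // Reference already has a scheme: it is absolute, nothing to do.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $ref)) {
        return $ref;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'];
    $origin = $scheme . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Protocol-relative reference: only the scheme of the base is reused.
    if (strpos($ref, '//') === 0) {
        return $scheme . ':' . $ref;
    }
    // Reference starting with "/": the base path is ignored entirely.
    if (strpos($ref, '/') === 0) {
        return $origin . $ref;
    }
    // Relative path: keep the base path up to and including its last "/".
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $ref;
}
```

With the base http://example.com/some/path/to/a/file.html, the reference css/some.css resolves to http://example.com/some/path/to/a/css/some.css, while /css/some.css resolves to http://example.com/css/some.css. By the same rule, /static/jquery.js against https://ssl.reddit.com/login gives https://ssl.reddit.com/static/jquery.js, without the /login part.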

But apart from that possible caveat, in most cases you probably only want the scheme and domain name of the last redirected URL in the base tag.

At least this is true for resolving root-relative URIs (those starting with a slash) such as

/css/some.css
/js/some.js
/some/file.html

For resolving relative URIs such as:

css/some.css
js/some.js
some/file.html

... you'd probably want to extract the directory part of the URL as well:

http://example.com/some/path/to/a/
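Both variants can be derived from the effective URL with parse_url(). This is only a sketch: originOf and directoryOf are hypothetical helper names, and the code assumes an absolute http(s) URL without user info:

```php
<?php
// Hypothetical helpers for picking a <base href> value from the effective URL.
// Assumption: $url is an absolute http(s) URL without user:password info.

// Scheme, host and optional port only, e.g. "http://example.com/".
function originOf(string $url): string
{
    $p = parse_url($url);
    return $p['scheme'] . '://' . $p['host']
         . (isset($p['port']) ? ':' . $p['port'] : '') . '/';
}

// Origin plus the directory part of the path; file name, query string
// and fragment are dropped.
function directoryOf(string $url): string
{
    $p    = parse_url($url);
    $path = $p['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return rtrim(originOf($url), '/') . $dir;
}
```

For http://example.com/some/path/to/a/file.html, originOf() gives http://example.com/ and directoryOf() gives http://example.com/some/path/to/a/. Either result can be passed as the href of the base tag.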

So, after giving it a little more thought, it's probably not so trivial to account for all possible scenarios. Be aware of this.

Upvotes: 1
