Brian Powell

Reputation: 3411

Communicate with a separate Chrome tab

I understand the security risks of this and why I get errors like this:

Uncaught DOMException: Blocked a frame with origin "http://myurl.com" from accessing a cross-origin frame.

so I'm wondering if there is a safe way for me TO do this.

There are two websites internal to our company, mine and another one, and they are not on the same domain.

Within my page, I want to load the second page in a way that lets me access elements on it by ID, pull the data those elements contain, and return that data to my page so I can display it to my users. No API exists for getting this data from the second source. Ultimately, I'd also love a way to input data BACK into the source page, but there is so much risk of injections and attacks in general that I doubt there's any way to do this, even though my intentions are not malicious.

I've tried a few things:

/* Literally load the page within my own and pull data once it's loaded */
$('#test').load('url.com/site2');

/* load the second page as a variable, then try to access an id on 
that page through the variable */
var win = window.open('url.com/site2');
var test = win.getElementByID('#id_element_i_want_to_pull');

/* I can fetch the page with PHP, but that just gives me the whole page as
one string and doesn't let me access any of the ID elements on it, which
doesn't really help me: */
$temp = file_get_contents('url.com/site2');

Is there any way to go about this? I do not have access to the code on the second server, so there probably wouldn't be a way for me to put any code there that would grant me access, if that were required. If that were the only way, though, I'd at least like to know it and know how this type of request would be done, were it possible in the first place.

Upvotes: 0

Views: 55

Answers (2)

JDev518

Reputation: 778

If it's a site that you otherwise do not have direct access to, it sounds like you could do some DOM "hoovering" or "scraping" using the DOMDocument class as mentioned already.

With DOMDocument, you can grab the contents of an entire page and then filter it by the tags / attributes you're looking for. I've written something like this in PHP 7 in the past; this may help:

 class HooverDom {
      public $content;

      public static function checkContentUrl($url) {
         if (stripos($url, 'http') !== 0) {
           return 'http://' . $url;
         }
         return $url;
      }

      public function getContent($url) {
        if (!$this->content) {
           $url = self::checkContentUrl($url);

           if ($url) {
              $this->content = new \DOMDocument( '1.0', 'utf-8' );
              $this->content->preserveWhiteSpace = false;
              // suppress warnings from invalid code
              @$this->content->loadHTMLFile($url);
           }
        }
        return $this->content;
     }

     /**
      * Extract the tags that are of interest.
      *
      * @param string $url
      * @param string $tag
      *
      * @return array
      */
     public function getTags($url, $tag) {
        $count = 0;
        $result = array();
        $url = self::checkContentUrl($url);
        // Return an empty array (not false) so the return type stays consistent
        if (!$url) return $result;

        $elements = $this->getContent($url)->getElementsByTagName($tag);

        foreach ($elements as $node) {
           $result[$count]['value'] = trim(preg_replace('/\s+/', ' ', $node->nodeValue));
           if ($node->hasAttributes()) {
               foreach ($node->attributes as $name => $attr) {
                  $result[$count]['attributes'][$name] = $attr->value;
               }
           }
           $count++;
        }
        return $result;
     }

      /**
       * Extract specific attributes rather than tags: fetch every element
       * with '*' and collect the requested attribute from each. The optional
       * $domain value keeps only results that contain the supplied domain name.
       *
       * @param string $url
       * @param string $attr
       * @param string|null $domain
       *
       * @return array
       */
       public function getAttributes($url, $attr, $domain = null) {
         $result = array();
         $elements = $this->getContent($url)->getElementsByTagName('*');

         foreach ($elements as $node) {
            if ($node->hasAttribute($attr)) {
               $value = $node->getAttribute($attr);
               if ($domain) {
                  if (stripos($value, $domain) !== FALSE) {
                     $result[] = trim($value);
                  }
               } else {
                 $result[] = trim($value);
               }
            }
         }
         return $result;
       }
 }

 define('DEFAULT_URL', 'https://developer.mozilla.org/en-US');
 define('DEFAULT_TAG', 'div');

 $vac = new HooverDom();

 $url = strip_tags($_GET['url'] ?? DEFAULT_URL);
 $tag = strip_tags($_GET['tag'] ?? DEFAULT_TAG);

 echo 'Dump of tags: ' . PHP_EOL;
 var_dump($vac->getTags($url, $tag));

This will grab every element with the requested tag on the page (div by default) and spit out a list of their text and attributes. This way you have some structure to work with instead of a massive string from file_get_contents().

The output would look something like this using https://developer.mozilla.org/en-US/ as an example:

 array (size=56)
   0 =>
     array (size=2)
       'value' => string 'Mozilla is working on a new program for developers and other web builders like you. Help shape that program by taking our 10 minute survey: https://googl/forms/Ync2VuTWwAkQFvJx2' (length=178)
       'attributes' =>
         array (size=1)
           'class' => string 'global-notice' (length=13)
   1 =>
     array (size=2)
       'value' => string 'Mozilla is working on a new program for developers and other web builders like you. Help shape that program by taking our 10 minute survey: ' (length=178)
       'attributes' =>
         array (size=1)
           'class' => string 'wrap center' (length=11)

..........

Sorry about some of the formatting mishaps; let me know if you need anything clarified. You could loop through the results, isolate the specific element IDs, classes, or other attributes you're looking for, and grab the content in "value".
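
A minimal sketch of that loop, assuming results shaped like the getTags() output above. The sample data and the findValueByAttribute() helper are hypothetical and only there for illustration; the nullable return type needs PHP 7.1 or later.

```php
<?php
// Sample data in the shape returned by getTags(); the ids and values are made up.
$tags = [
    ['value' => 'Hello', 'attributes' => ['id' => 'greeting', 'class' => 'wrap']],
    ['value' => 'No attributes here'],
    ['value' => '42', 'attributes' => ['id' => 'count']],
];

// Return the 'value' of the first entry whose attribute $name equals $wanted,
// or null when nothing matches. Entries without attributes are skipped.
function findValueByAttribute(array $tags, string $name, string $wanted): ?string
{
    foreach ($tags as $tag) {
        if (($tag['attributes'][$name] ?? null) === $wanted) {
            return $tag['value'];
        }
    }
    return null;
}

echo findValueByAttribute($tags, 'id', 'count'), PHP_EOL;
```

With real data you would pass the array from $vac->getTags($url, $tag) instead of the sample.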

Note that the null coalescing operator (??) is only available in PHP 7 and later, in case you're running PHP 5.

Upvotes: 1

nathan gonzalez

Reputation: 12017

I think you're on the right track with loading it on the server; you just need to parse it into something that you can use to get things by id. It's been a while since I've done much in PHP, but you should be able to use the DOMDocument class to do this. Basically you load the text, toss it into one of these guys, and then get the elements by their id.
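
Roughly, that could look like the sketch below. It is only an illustration under a few assumptions: the $html string stands in for whatever file_get_contents() returns from the second site, the ids are invented, and getTextById() is a hypothetical helper name. It also looks the element up with a DOMXPath query on the id attribute rather than DOMDocument::getElementById(); that is a substitution on my part, chosen because it does not depend on how the parser registers ids.

```php
<?php
// Hypothetical markup standing in for the HTML fetched from the second site.
$html = '<html><body><div id="order_total">42.50</div>'
      . '<span id="status"> Shipped </span></body></html>';

// Return the trimmed text of the element with the given id, or null if absent.
function getTextById(string $html, string $id): ?string
{
    // Only accept simple id values so the XPath expression stays well-formed.
    if (!preg_match('/^[A-Za-z0-9_:.-]+$/', $id)) {
        return null;
    }

    // Collect parser warnings internally instead of printing them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

echo getTextById($html, 'order_total'), PHP_EOL;
```

The returned string can then be echoed into your own page or sent back as JSON to your front end.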

Upvotes: 1
