Vishal Varshney

Reputation: 1

Scraping a website with Goutte, but only one array item ends up in the XML file

I am trying to scrape this webpage to get each job title and its location. My code extracts both correctly, but when I write the results to XML, only one entry from the array ends up in the file.

I am using the Goutte library with CSS selectors. Please also tell me how to scrape paginated results with Goutte.

Here is my code:

$httpClient = new \Goutte\Client();
$response = $httpClient->request('GET', 'https://www.simplyhired.com/search?q=pharmacy+technician&l=American+Canyon%2C+CA&job=X5clbvspTaqzIHlgOPNXJARu8o4ejpaOtgTprLm2CpPuoeOFjioGdQ');



$job_posting_location = [];
$response->filter('.LeftPane article .SerpJob-jobCard.card .jobposting-subtitle span.JobPosting-labelWithIcon.jobposting-location span.jobposting-location')
    ->each(function ($node) use (&$job_posting_location) {
        $job_posting_location[] = $node->text() . PHP_EOL;
    });

$joblocation = 0;
$response->filter('.LeftPane article .SerpJob-jobCard.card .jobposting-title-container h3 a')
    ->each(function ($node) use ($job_posting_location, &$joblocation, $httpClient) {
        $job_title = $node->text() . PHP_EOL; //job title
        $job_posting_location = $job_posting_location[$joblocation]; //job posting location

        // display the result
        $items = "{$job_title} @ {$job_posting_location}\n\n";
        global $results;
        $result = explode('@', $items);
        $results['job_title'] = $result[0];
        $results['job_posting_location'] = $result[1];

        $joblocation++;
    });

function convertToXML($results, &$xml_user_info){
    foreach($results as $key => $value){
        if(is_array($value)){
            $subnode = $xml_user_info->addChild($key);
            foreach ($value as $k=>$v) {
                $xml_user_info->addChild("$k",htmlspecialchars("$v"));
            }
        }else{
            $xml_user_info->addChild("$key",htmlspecialchars("$value"));
        }
    }
    return $xml_user_info->asXML();
}

$xml_user_info = new SimpleXMLElement('<root/>');
$xml_content = convertToXML($results,$xml_user_info);

$xmlFile = 'details.xml';
$handle = fopen($xmlFile, 'w') or die('Unable to open the file: '.$xmlFile);

if(fwrite($handle, $xml_content)) {
    echo 'Successfully written to an XML file.';
}
else{
    echo 'Error in file generating'; 
}

What I got in the XML file:

<?xml version="1.0"?>
<root><job_title>Pharmacy Technician
 </job_title><job_posting_location> Vallejo, CA
 </job_posting_location></root>

What I want in the XML file:

<?xml version="1.0"?>
<root>
<job_title>Pharmacy Technician</job_title>
<job_posting_location> Vallejo, CA</job_posting_location>
<job_title>Pharmacy Technician 1</job_title>
<job_posting_location> Vallejo, CA</job_posting_location>
<job_title>Pharmacy Technician New</job_title>
<job_posting_location> Vallejo, CA</job_posting_location> 
and so on...
</root>

Upvotes: 0

Views: 90

Answers (1)

ThW

Reputation: 19512

You overwrite the values in the $results variable on every iteration, so only the last job survives. You would need to do something like this to append a new entry instead:

$results[] = [
  'job_title' => $result[0],
  'job_posting_location' => $result[1]
];
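
To illustrate the difference, here is a minimal sketch of the append approach followed by a SimpleXML conversion. The sample rows are hypothetical stand-ins for the scraped values, and the wrapping 'job' element is my assumption for grouping each title with its location; it is not part of the structure requested in the question.

```php
<?php
// Hypothetical sample rows standing in for the scraped titles and locations.
$scraped = [
    ['Pharmacy Technician', 'Vallejo, CA'],
    ['Pharmacy Technician 1', 'Napa, CA'],
];

$results = [];
foreach ($scraped as [$title, $location]) {
    // $results[] appends a new row; $results['job_title'] = ... would overwrite it.
    $results[] = [
        'job_title' => trim($title),
        'job_posting_location' => trim($location),
    ];
}

$root = new SimpleXMLElement('<root/>');
foreach ($results as $row) {
    // One 'job' element per row (assumed grouping element).
    $job = $root->addChild('job');
    foreach ($row as $key => $value) {
        // htmlspecialchars() as in the question's convertToXML().
        $job->addChild($key, htmlspecialchars($value));
    }
}

$xml = $root->asXML();
echo $xml;
```

Each scraped job then becomes its own element instead of replacing the previous one.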

However, there is no need to put the data into an array at all; just create the XML directly with DOM.

Both of your selectors share the same prefix, so iterate over the job cards and then fetch the title and location relative to each card.

$httpClient = new \Goutte\Client();
$response = $httpClient->request('GET', $url);

$document = new DOMDocument();
// append document element node
$postings = $document->appendChild($document->createElement('jobs'));

// iterate job posting cards
$response->filter('.LeftPane article .SerpJob-jobCard.card')->each(
    function($jobCard) use ($document, $postings) {
        // fetch data
        $location = $jobCard
          ->filter(
              '.jobposting-subtitle span.JobPosting-labelWithIcon.jobposting-location span.jobposting-location'
          )
          ->text();
        $title = $jobCard->filter('.jobposting-title-container h3 a')->text();
        // append 'job' node to group data in result
        $job = $postings->appendChild($document->createElement('job'));
        // append data nodes
        $job->appendChild($document->createElement('job_title'))->textContent = $title;
        $job->appendChild($document->createElement('job_posting_location'))->textContent = $location;
    }
);

echo $document->saveXML();
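
If you want the result in a file, as in the question, rather than echoed, something along these lines should work. This is only a sketch: the sample data replaces the scraped cards, the path in the temp directory is an assumption, and the trim() calls are my addition to drop the stray whitespace visible in the question's output.

```php
<?php
// Hypothetical sample data standing in for the scraped job cards.
$jobs = [
    ['title' => "Pharmacy Technician\n", 'location' => ' Vallejo, CA '],
];

$document = new DOMDocument('1.0', 'UTF-8');
$document->formatOutput = true; // indent child nodes when serializing
$postings = $document->appendChild($document->createElement('jobs'));

foreach ($jobs as $data) {
    $job = $postings->appendChild($document->createElement('job'));
    // trim() removes surrounding whitespace and line breaks from the scraped text.
    $job->appendChild($document->createElement('job_title'))->textContent = trim($data['title']);
    $job->appendChild($document->createElement('job_posting_location'))->textContent = trim($data['location']);
}

// Assumed output path; save() returns the number of bytes written or false.
$xmlFile = sys_get_temp_dir() . '/details.xml';
$bytes = $document->save($xmlFile);
echo $bytes === false ? 'Error writing the file' : "Wrote {$bytes} bytes to {$xmlFile}";
```

DOMDocument::save() replaces the fopen()/fwrite() pair, and formatOutput puts each element on its own line.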

Upvotes: 0
