Amna Arshad
Amna Arshad

Reputation: 887

How to read text inside a tag

I need to scrape this page: https://www.arabam.com/ilan/galeriden-satilik-lamborghini-gallardo-lp-560-4/mini-motors-dan-2009-gallardo-lp560-4-seramik-lift-bayi-boyasiz/14934711

If you scroll down you'll see this enter image description here

I scroll down the page and then get xpath for this. This is the xpath: //div[@id="js-hook-description"]//p/text

And this is the code

const results = xpathT.fromPageSource(data).findElements(rest);
    
    //console.log("The href value is:", results[0].getAttribute("href"));
    console.log(`Your full text is "${results[0].getText()}"`);
    if (results.length > 0) {
      let _results = [];
      if (path.includes("href", 0)){
          
          for (let r of results) {
              
            _results.push(r.getAttribute("href"));
          }
      }
      if (path.includes("text", 0)){
          //console.log("inside");
          //console.log(results);
          for (let r of results) {
             console.log(r.getText());
            _results.push(r.getText());
          }

When I simply print results it gives me this:

Your full text is "<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5" color="#ff0000">LAMBORGHİNİ GALLARDO LP560-4</font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5" color="#ff0000">2009 MODEL - 38.000 KM</font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4">DOĞUŞ OTO <font color="#ff0000">BAYİİ</font> ÇIKIŞLI</font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4"><br/></font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">AİRMATİC (LİFT)</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">SERAMİK FREN</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">GERİ GÖRÜŞ KAMERASI</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">PADDLESHİFT (F1)</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">2 BÖLGE KLİMA</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">DERİ KOLTUK</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">Bİ-ZENON FAR</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">YAĞMUR SENSÖRÜ</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">CD-USB-AUX-MP3</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4"><br/></font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">?</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4">BOYA - HATA - TRAMER - HASAR KAYDI </font><font color="#ff0000"><font size="4"> </font><font size="5">YOKTUR</font></font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5"><br/></font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5">ARACIMIZIN TAMPONLARI DAHİL <font color="#ff0000">BOYASIZ</font></font></b><br/></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4"><br/></font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4">YEDEK ANAHTARI <font color="#ff0000">MEVCUTTUR</font></font></b><br/></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b>?</b></span><br/></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="5">DETAYLI BİLGİ İÇİN LÜTFEN ARAYINIZ</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="5" color="#ff0000"><br/></font></b></span></p>,<p style="text-align: cente...

But when I call .getText() it returns undefined. What's the possible solution for this?

Upvotes: 2

Views: 7871

Answers (1)

theDavidBarton
theDavidBarton

Reputation: 8851

You can use page.evaluate to get the innerText property of any DOM elements. In case you need the text by paragraphs you should use the proper CSS selector for the <p> elements, in this case it is: #js-hook-description > div > p. The matching elements can be collected with the page.$$ method (it is the same as document.querySelectorAll() in the Page's context), then these elements can be iterated over (see a for..of and an Array.map variation below), in each iteration the innerText is retrieved and also a String.trim() is applied to clean the paragraphs from linebreaks (e.g.: \n).

// full text content into one string
  const fullText = await page.evaluate(el => el.innerText, await page.$('#js-hook-description'))
  console.log(fullText)

// each paragraph into an array element I.
  const textArray = []
  const paragraphs = await page.$$('#js-hook-description > div > p')
  for (const p of paragraphs) {
     const actualPara = await page.evaluate(el => el.innerText.trim(), p)
     textArray.push(actualPara)
  }
  console.log(JSON.stringify(textArray))

An alternative solution can be done with page.$$eval and Array.map:

// each paragraph into an array element II.
  const alternativeSolution = await page.$$eval('#js-hook-description > div > p', paragraphs => paragraphs.map(p => p.innerText.trim()))
  console.log(JSON.stringify(alternativeSolution))

Full text output:

LAMBORGHİNİ GALLARDO LP560-4 2009 MODEL - 38.000 KM DOĞUŞ OTO BAYİİ ÇIKIŞLI AİRMATİC (LİFT) SERAMİK FREN GERİ GÖRÜŞ KAMERASI PADDLESHİFT (F1) 2 BÖLGE KLİMA DERİ KOLTUK Bİ-ZENON FAR YAĞMUR SENSÖRÜ CD-USB-AUX-MP3 ? BOYA - HATA - TRAMER - HASAR KAYDI  YOKTUR ARACIMIZIN TAMPONLARI DAHİL BOYASIZ YEDEK ANAHTARI MEVCUTTUR ? DETAYLI BİLGİ İÇİN LÜTFEN ARAYINIZ 0533 239 22 77

Array output line-by-line:

["LAMBORGHİNİ GALLARDO LP560-4","2009 MODEL - 38.000 KM","DOĞUŞ OTO BAYİİ ÇIKIŞLI","","AİRMATİC (LİFT)","SERAMİK FREN","GERİ GÖRÜŞ KAMERASI","PADDLESHİFT (F1)","2 BÖLGE KLİMA","DERİ KOLTUK","Bİ-ZENON FAR","YAĞMUR SENSÖRÜ","CD-USB-AUX-MP3","","?","BOYA - HATA - TRAMER - HASAR KAYDI  YOKTUR","","ARACIMIZIN TAMPONLARI DAHİL BOYASIZ","","YEDEK ANAHTARI MEVCUTTUR","?","DETAYLI BİLGİ İÇİN LÜTFEN ARAYINIZ","","0533 239 22 77"]

Upvotes: 2

Related Questions