Zahra Hnn
Zahra Hnn

Reputation: 149

<div> tags inside <div> using importXML Xpath query, in Google Spreadsheet

I'm using Xpath in Google docs to get the text inside <div>. I want to save the text inside <div id="job_description"> in one cell of Google doc spreadsheet, but it shows each <div> in separate cell.

<div id="job_description">
    <div>
        <strong>
            Basic Purpose:
        </strong>
        <br></br>
    </div>
    <div>
        Work closely with developers, product owners and Q…
        <br></br>
    </div>
    <div>
        The Test Analyst is accountable for the developmen…
        <br></br>
    </div>
    <div>
        <strong>
            Duties and Responsibilities:
        </strong>
    </div>
    <ul>
        <li></li>
        <li></li>
    </ul>
    <div>
        <strong>
            Requirements:
        </strong>
        <br></br>
    </div>
    <ul>
        <li></li>
        <li></li>
    </ul>
</div>

Image: https://i.sstatic.net/K0mAY.png

and this is the code I wrote:

=IMPORTXML(E4,"//div[@id='job_description']")

May you help me to put all of the text (including <div> <ul> ...) inside the <div id="job_description"> in only one cell ?

Upvotes: 1

Views: 24600

Answers (3)

Aurielle Perlmann
Aurielle Perlmann

Reputation: 5509

you can also use regexreplace to remove the line breaks, with

=REGEXREPLACE(IMPORTXML(E4,"//div[@id='job_description']"),"\n","")

this should wrap it all into one cell for you.

Upvotes: 1

Mathias M&#252;ller
Mathias M&#252;ller

Reputation: 22617

Using JOIN is a good start, but you can make it a single operation.

You did not show the URL to the page you're importing, so I can only give you an example with another page. For instance, if you are importing www.w3.org and looking for a div where @class='event closed expand_block', use

=JOIN(CHAR(10),IMPORTXML("http://www.w3.org/","//div[@class='event closed expand_block']//text()"))

Notice that I also modified the XPath expression: //text() makes sure only descendant text nodes are retrieved, that is, all the text.


EDIT: Responding to your comment:

May I know what is CHAR(10) referring to?

Yes, of course. CHAR returns a character and takes a number as input. In the case of CHAR(10), a newline character is returned (I assume because of &#10;).

In the formula, CHAR(10) is used as the first argument of JOIN, which is the delimiter of the objects that are to be joined.

Upvotes: 6

Zahra Hnn
Zahra Hnn

Reputation: 149

For now I found a solution , I'll put it here so that others can know my answer, but if there is any other solution please let us know

I used JOIN to put the separate cells (L3:X3) into one single cell

=Trim(JOIN(" ",L3:X3)) 

Upvotes: 1

Related Questions