john reid

Reputation: 1

iMacros: extract all link text without the href

I need help extracting text1, text2, text3 (that is, all of the link text; a category sometimes goes up to text9).

<h4>Category:</h4>
<p><a href="">text1</a>, <a href="">text2</a>, <a href="">text3</a></p>

My iMacros code extracts only text1:

TAG POS=R1 TYPE=A ATTR=TXT:* EXTRACT=TXT

Q: How do I extract all the text in the category?

Thanks

Upvotes: 0

Views: 1137

Answers (2)

Naren Murali

Reputation: 57696

This code extracts the text of every A tag inside the P tag, but it needs a small setup first. I use XPath to locate the A tags.

Please install:

XPath Checker By Brian Slesinsky

or

how to find the xpath of an element (I would recommend the chrome console method)

With either tool, right-click the A tag and choose View XPATH. This gives you an XPath like:

/x:html/x:body/x:p/x:a[2]

Once you have this XPath, paste it into the XPATH value of the TAG command, as in the code below, where I have done the same with the XPath above. Remove the x: prefixes before pasting. The number in [] is the child index; since {{!LOOP}} supplies that index on each pass, drop the [2].
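The clean-up described above can be sketched in JavaScript. This is only an illustration: `toLoopXPath` is a hypothetical helper name, and the parentheses around the path are an assumption that mirrors the form used in the macro in this answer. Note that the macro below uses `//p` rather than `/p`, which this helper does not reproduce.

```javascript
// Hypothetical helper: turn an XPath copied from a checker tool into the
// loop-ready form used in an iMacros TAG XPATH command.
function toLoopXPath(copiedXPath) {
  // Drop the "x:" namespace prefix that the tool adds to each step.
  var withoutPrefix = copiedXPath.replace(/(^|\/)x:/g, "$1");
  // Drop a trailing positional index such as [2]; {{!LOOP}} supplies it instead.
  var withoutIndex = withoutPrefix.replace(/\[\d+\]$/, "");
  // Wrap in parentheses so the index counts across all matches in the document.
  return "(" + withoutIndex + ")[{{!LOOP}}]";
}

var loopXPath = toLoopXPath("/x:html/x:body/x:p/x:a[2]");
```

Here `loopXPath` holds the string to paste after `XPATH=` in the TAG line.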

Notes:
1. Loop the iMacros code as many times as there are A tags to extract.
2. Update the FOLDER attribute of the SAVEAS line to your desktop path.

Code:

SET !LOOP 1
SET !ERRORIGNORE YES
TAG XPATH=(/html/body//p/a)[{{!LOOP}}] EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=C:/Users/Test/Desktop/ FILE=output.csv
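
If looping the macro by hand is inconvenient, one possible alternative is to generate a macro that lists every index explicitly. The sketch below only builds the macro text; `buildExtractMacro` is a hypothetical helper, and the count of 9 is taken from the question's "sometimes until text9".

```javascript
// Hypothetical helper: build macro text that extracts the first `count` links
// in a single run, using explicit indices instead of {{!LOOP}}.
function buildExtractMacro(count, folder, file) {
  // Keep going when an index has no matching link (fewer than `count` links).
  var lines = ["SET !ERRORIGNORE YES"];
  for (var i = 1; i <= count; i++) {
    lines.push("TAG XPATH=(/html/body//p/a)[" + i + "] EXTRACT=TXT");
  }
  // Same SAVEAS line as in the macro above, with the folder and file passed in.
  lines.push("SAVEAS TYPE=EXTRACT FOLDER=" + folder + " FILE=" + file);
  return lines.join("\n");
}

var macro = buildExtractMacro(9, "C:/Users/Test/Desktop/", "output.csv");
```

The generated text can be saved as an .iim file. Playing it directly from a script with `iimPlay("CODE:" + macro)` should also be possible, but check that against the iMacros scripting documentation before relying on it.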

Upvotes: 1

Pinkie Pie

Reputation: 728

To expand on the JavaScript comment, this is how you could go about it:

ExtractCategory.js content

// Play the macro that reads the category data
iimPlay("foo.iim");
// Get the last extracted value, i.e. the HTML of the p element
var pContent = iimGetExtract();
// Find each <a ...>...</a> pair, then strip the opening and closing tags to keep only the link text
var result = (pContent.match(/<a\b[^>]*>.*?<\/a>/g) || []).map(function (val) {
   return val.replace(/<\/a>/g, '').replace(/<a\b[^>]*>/g, '');
});
// Pass the texts to another macro as one comma-separated string
iimSet("passed_var", result.join(","));
iimPlay("bar.iim");

Next to ExtractCategory.js, foo.iim content

'Your previous code goes here; line #2 only anchors on the h4 so that line #3 finds the right p in a mock-up HTML page
TAG POS=1 TYPE=H4 ATTR=* 
TAG POS=R1 TYPE=P ATTR=* EXTRACT=HTM

Next to ExtractCategory.js, bar.iim content

'Do whatever with the passed variable containing your formatted String
'This is just an output to show it
PROMPT {{passed_var}}

When you run ExtractCategory.js, it runs foo.iim to extract the p content, parses it with a regex, and then passes the resulting string to another macro to use as you please. Be careful with the regex step: depending on the texts you expect, it might break.

Running this gives text1,text2,text3, as desired.
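
To try the parsing step outside iMacros, here is a minimal standalone sketch. `extractAnchorTexts` is a hypothetical helper, and the sample string stands in for what `iimGetExtract()` would return for the markup in the question. It is still regex-based, so it assumes no attribute value contains a `>` character.

```javascript
// Hypothetical helper: pull the visible text out of every <a>...</a> pair.
function extractAnchorTexts(html) {
  // [^>]* stops at the end of the opening tag; [\s\S]*? lazily spans the
  // link text, including any newlines inside it.
  var pattern = /<a\b[^>]*>([\s\S]*?)<\/a>/gi;
  var texts = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    texts.push(match[1].trim());
  }
  return texts;
}

// Stand-in for the extracted p content from the question's HTML.
var sample = '<a href="">text1</a>, <a href="">text2</a>, <a href="">text3</a>';
var joined = extractAnchorTexts(sample).join(",");
// joined is "text1,text2,text3"
```

Input with no links yields an empty array, so the `join` call produces an empty string rather than throwing an error.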

Read up on http://wiki.imacros.net/iimSet() and http://wiki.imacros.net/iimPlay() if you need further information on how to use them.

Upvotes: 2
