Reputation: 1
I need help extracting text1, text2, text3 (I mean all of the link texts; sometimes the category goes up to text9).
<h4>Category:</h4>
<p><a href="">text1</a>, <a href="">text2</a>, <a href="">text3</a></p>
My iMacros code extracts only text1:
TAG POS=R1 TYPE=A ATTR=TXT:* EXTRACT=TXT
Q: How can I extract all of the text in the category?
Thanks
Upvotes: 0
Views: 1137
Reputation: 57696
This code extracts the text of every A tag inside the P tag, but it needs a little setup first, because I use XPath to address the A tags.
Please install:
XPath Checker By Brian Slesinsky
or
how to find the xpath of an element (I would recommend the chrome console method)
With either of these, right-click the A tag and choose View XPath. This gives you an XPath like
/x:html/x:body/x:p/x:a[2]
Once you have this XPath, paste it as the XPATH value of the TAG command; the macro under "Code:" shows the XPath above used this way. Before pasting, remove every x: prefix. Also note that the number in [] is the child index: since !LOOP supplies that index, drop the [2] and use [{{!LOOP}}] instead.
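The cleanup described above can also be done mechanically. The following is only a sketch: `toLoopXPath` is a hypothetical helper name, not part of iMacros, and it assumes the copied XPath uses the `x:` prefix and ends in a numeric index as in the example.

```javascript
// Hypothetical helper (not an iMacros API): convert an XPath copied from an
// XPath tool into the looped form used in a TAG XPATH= line.
function toLoopXPath(copiedXPath) {
    // Remove the "x:" namespace prefix from every step of the path.
    var withoutPrefix = copiedXPath.replace(/(^|\/)x:/g, '$1');
    // Remove the trailing child index such as "[2]"; !LOOP supplies it instead.
    var withoutIndex = withoutPrefix.replace(/\[\d+\]$/, '');
    // Parenthesise the path so the index applies to the whole set of matches.
    return '(' + withoutIndex + ')[{{!LOOP}}]';
}
```

For example, `toLoopXPath('/x:html/x:body/x:p/x:a[2]')` produces `(/html/body/p/a)[{{!LOOP}}]`, which can be pasted after `TAG XPATH=`.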
Note: 1. Loop the iMacros code as many times as there are A tags you want to extract. 2. Update the FOLDER attribute of the SAVEAS line to your own desktop path.
Code:
SET !LOOP 1
SET !ERRORIGNORE YES
TAG XPATH=(/html/body//p/a)[{{!LOOP}}] EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=C:/Users/Test/Desktop/ FILE=output.csv
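If you would rather not set the loop count by hand, the same extraction could be driven from the JavaScript scripting interface. This is a rough sketch under assumptions: `collectLinkTexts` is a hypothetical name, the `play` and `getExtract` parameters are meant to be `iimPlay` and `iimGetLastExtract` when run inside iMacros, and it assumes a missing element yields an empty extract or the `#EANF#` marker.

```javascript
// Sketch: repeat a one-line extraction macro until no further link is found.
// `play` and `getExtract` are passed in so the loop can be tried outside the
// browser; inside iMacros, pass iimPlay and iimGetLastExtract (assumption).
function collectLinkTexts(play, getExtract, maxLinks) {
    var texts = [];
    for (var i = 1; i <= maxLinks; i++) {
        // Index the whole node set, mirroring the macro above.
        play('CODE:TAG XPATH=(/html/body//p/a)[' + i + '] EXTRACT=TXT');
        var value = getExtract();
        // Stop on an empty extract or on "#EANF#" (assumed "not found" marker).
        if (!value || value === '#EANF#') {
            break;
        }
        texts.push(value);
    }
    return texts;
}
```

Inside iMacros this would be called as `collectLinkTexts(iimPlay, iimGetLastExtract, 9)`, where 9 is the largest number of links the question expects.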
Upvotes: 1
Reputation: 728
To expand on the JavaScript comment, this is how you could go about it:
ExtractCategory.js content
// Play the macro reading the category data
iimPlay("foo.iim");
// Get the last extracted value, i.e. the p content
var pContent = iimGetExtract();
// Parse the p with a regex: find each <a ...>...</a> pair, then strip the a tags
// (fall back to an empty list when the p contains no links)
var anchors = pContent.match(/<a\b[^>]*>.*?<\/a>/g) || [];
var result = anchors.map(function (val) {
    return val.replace(/<\/?a\b[^>]*>/g, '');
});
// Join the texts into one comma-separated String and pass it to another macro
iimSet("passed_var", result.join(","));
iimPlay("bar.iim");
Next to ExtractCategory.js, foo.iim content
'Your previous code here; line #2 only anchors on the h4 so that line #3 finds the right p in a mockup HTML page
TAG POS=1 TYPE=H4 ATTR=*
TAG POS=R1 TYPE=P ATTR=* EXTRACT=HTM
Next to ExtractCategory.js, bar.iim content
'Do whatever with the passed variable containing your formatted String
'This is just an output to show it
PROMPT {{passed_var}}
When you run ExtractCategory.js, it plays foo.iim to extract the p content, parses that content with a regex, and then passes the resulting String to another macro so you can do with it what you please. Be careful with the regex: depending on the link texts you expect, it might break.
Running this gives text1,text2,text3, as desired.
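Since the regex step is the fragile part, here is a slightly more defensive variant as a sketch. `extractAnchorTexts` is a hypothetical helper name; it assumes the extracted HTML is a small fragment with well-formed, non-nested a tags, and it is still a regex rather than a real HTML parser.

```javascript
// Hypothetical helper: return the visible text of every <a> element in an
// HTML fragment, tolerating attributes, line breaks and nested inline tags.
function extractAnchorTexts(html) {
    var texts = [];
    // [^>]* skips attributes; [\s\S]*? lazily spans the link body, newlines included.
    var anchorPattern = /<a\b[^>]*>([\s\S]*?)<\/a>/gi;
    var match;
    while ((match = anchorPattern.exec(html)) !== null) {
        // Strip any nested tags (for example <b>) and surrounding whitespace.
        texts.push(match[1].replace(/<[^>]+>/g, '').trim());
    }
    return texts;
}
```

In ExtractCategory.js this could stand in for the match/map step, for example `iimSet("passed_var", extractAnchorTexts(pContent).join(","));`. It returns an empty array when the p contains no links, so the script does not fail on such pages.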
Read up on http://wiki.imacros.net/iimSet() and http://wiki.imacros.net/iimPlay() if you need further information on how to use them.
Upvotes: 2