Kevin

Reputation: 1106

AppleScript Text delimiters "Can’t get text items"

I'm trying to get AppleScript to find some data on a web page and copy it as text.

However, sometimes I get this error: error "Can’t get text items 2 thru -1 of \"[email protected]\"." number -1728 from text items 2 thru -1 of "[email protected]"

Any idea?

to getInputByClass(theClass, num) -- defines a function with two inputs, theClass and num
    tell application "Safari" --tells AS that we are going to use Safari
        set input to do JavaScript "
document.getElementsByClassName('" & theClass & "')[" & num & "].innerHTML;" in document 1 -- uses JavaScript to set the variable input to the information we want
    end tell
    return input --tells the function to return the value of the variable input
end getInputByClass


-- start here
getInputByClass("spacebefore", 0)

set theText to getInputByClass("spacebefore", 0)

-- clear text
set DATA1ID to extractText(theText, "\">", "</td>")
to extractText(searchText, startText2, endText)
    set tid to AppleScript's text item delimiters
    set startText1 to "x"
    set searchText to ("x" & searchText)
    set AppleScript's text item delimiters to startText1
    set endItems to text item -1 of searchText
    set AppleScript's text item delimiters to endText
    set beginningToEnd to text item 1 of endItems
    set AppleScript's text item delimiters to startText2
    set finalText to (text items 2 thru -1 of beginningToEnd)
    set AppleScript's text item delimiters to tid
    return finalText
end extractText

-- DATA1ID Found

to getInputByClass2(theClass, num) -- defines a function with two inputs, theClass and num
    tell application "Safari" --tells AS that we are going to use Safari
        set input to do JavaScript "
document.getElementsByClassName('" & theClass & "')[" & num & "].innerHTML;" in document 1 -- uses JavaScript to set the variable input to the information we want
    end tell
    return input --tells the function to return the value of the variable input
end getInputByClass2


-- start here
getInputByClass2("inspectDSInInspector", 0)

set theText to getInputByClass2("inspectDSInInspector", 0)

-- clear text
set DATA2DSID to extractText2(theText, "</a>", "</td>")
to extractText2(searchText, startText2, endText)
    set tid to AppleScript's text item delimiters
    set startText1 to "x"
    set searchText to ("x" & searchText)
    set AppleScript's text item delimiters to startText1
    set endItems to text item -1 of searchText
    set AppleScript's text item delimiters to endText
    set beginningToEnd to text item 1 of endItems
    set AppleScript's text item delimiters to startText2
    set finalText to (text items 2 thru -1 of beginningToEnd)
    set AppleScript's text item delimiters to tid
    return finalText
end extractText2


set finalResult to "DATA2DSID: " & DATA2DSID & "
DATA1ID: " & DATA1ID

set the clipboard to finalResult


tell application "System Events" to keystroke "v" using command down

UPDATE:

<td class="inspectDATAInInspector"><a href="/WebObjects/DATA.DATA/DT/DDDDTTTAAAADDTA/0.1.0"></a>48784745</td>

"href="/WebObjects/DATA.DATA/DT/DDDDTTTAAAADDTA/0.1.0">48784745" is not fixed data. What I need is the random number at the end, in this case 48784745.

The script I posted works most of the time, but from time to time I get the error mentioned above. I think it could be because I have to convert the data to plain text instead of HTML, or something like that.

Upvotes: 0

Views: 1109

Answers (1)

vadian

Reputation: 285082

A common solution, which also checks whether the source text contains both tags:

set sourceText to "<td class=\"inspectDATAInInspector\"><a href=\"/WebObjects/DATA.DATA/DT/DDDDTTTAAAADDTA/0.1.0\"></a>48784745</td>"

set startTextAfterTag to "</a>"
set endTextBeforeTag to "</td>"

set startOffset to offset of startTextAfterTag in sourceText
set endOffset to offset of endTextBeforeTag in sourceText
if startOffset = 0 or endOffset = 0 or endOffset < startOffset then
    display dialog "The source text does not contain the specified tags."
    return
end if

set extractedText to extractTextBetweenTags(sourceText, startTextAfterTag, endTextBeforeTag)

on extractTextBetweenTags(theText, startTag, endTag)
    set saveTID to text item delimiters
    set text item delimiters to startTag
    set secondPart to text item 2 of theText
    set text item delimiters to endTag
    set firstPart to text item 1 of secondPart
    set text item delimiters to saveTID
    return firstPart
end extractTextBetweenTags
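
As for why the error in the question shows up only sometimes: when the start delimiter does not occur in the text, there is only one text item, so text items 2 thru -1 cannot be resolved and AppleScript raises error -1728. The following is a minimal sketch that counts the text items before indexing into them. The handler name safeExtractBetween and the choice to return missing value are assumptions made for illustration; they are not part of the original script.

```applescript
-- Hypothetical helper (name and return convention are assumptions):
-- returns the text between startTag and endTag,
-- or missing value when either tag is absent.
on safeExtractBetween(theText, startTag, endTag)
    set saveTID to AppleScript's text item delimiters
    try
        set AppleScript's text item delimiters to startTag
        set parts to text items of theText
        -- A single text item means startTag was not found
        if (count of parts) < 2 then error "start tag not found"
        -- Rejoin everything after the first startTag
        set afterStart to (items 2 thru -1 of parts) as text
        set AppleScript's text item delimiters to endTag
        set endParts to text items of afterStart
        -- Fewer than two items means endTag was not found
        if (count of endParts) < 2 then error "end tag not found"
        set extracted to item 1 of endParts
    on error
        set extracted to missing value
    end try
    -- Always restore the previous delimiters
    set AppleScript's text item delimiters to saveTID
    return extracted
end safeExtractBetween

set sample to "<td class=\"inspectDATAInInspector\"><a href=\"/WebObjects/DATA.DATA/DT/DDDDTTTAAAADDTA/0.1.0\"></a>48784745</td>"
set goodResult to safeExtractBetween(sample, "</a>", "</td>") -- "48784745"
set badResult to safeExtractBetween("no tags here", "</a>", "</td>") -- missing value
```

The caller can then test for missing value and skip or retry instead of letting the script stop with an error.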

Edit:

Suggestion #2: This captures everything between the second-to-last > and the following </td

set sourceText to "<td class=\"inspectDATAInInspector\"><a href=\"/WebObjects/DATA.DATA/DT/DDDDTTTAAAADDTA/0.1.0\"></a>48784745</td>"

set startTextAfterTag to ">"
set endTextBeforeTag to "</td"

set extractedText to extractTextBetweenTags(sourceText, startTextAfterTag, endTextBeforeTag)

on extractTextBetweenTags(theText, startTag, endTag)
    set saveTID to text item delimiters
    set text item delimiters to startTag
    set secondPart to text item -2 of theText
    set text item delimiters to endTag
    set firstPart to text item 1 of secondPart
    set text item delimiters to saveTID
    return firstPart
end extractTextBetweenTags

Suggestion #3: If you have SatImage.OSAX installed, you could use a regular expression

set sourceText to "<td class=\"inspectDATAInInspector\"><a href=\"/WebObjects/DATA.DATA/DT/DDDDTTTAAAADDTA/0.1.0\"></a>48784745</td>"

try
    set foundText to find text ">(\\d+)</td>$" in sourceText using 1 with regexp
    set extractedText to foundText's matchResult
on error
    display dialog "The source text does not match the regex."
end try

Upvotes: 1
