So, I'm in quite a dilemma with this Google Apps Script. Coming from traditional JavaScript, I'm finding it quite a challenge. I'm trying to pull values from Zillow, and I've succeeded with the first few items (Rent Value, Zestimate, School Ratings), but now I need the school names. This has become such a hassle that I'm honestly stuck: I can't get a .match() to return what I need. I'll post some code and see if anyone else can get a grasp on this.
The Zillow code I'm parsing:
<ul class="nearby-schools-list">
<li class="nearby-schools-header">
<h4 class="nearby-schools-rating"> </h4>
<h4 class="nearby-schools-name"> </h4>
<h4 class="nearby-schools-grades">Grades</h4>
<h4 class="nearby-schools-distance">Distance</h4>
</li>
<li class="nearby-school assigned-school">
<span class="gs-rating-badge">
<div class="gs-rating gs-rating-8">
<span class="gs-rating-number">8</span>
<span class="gs-rating-subtext">out of 10</span>
</div>
</span>
<span class="nearby-schools-name"> <a href="/seattle-wa/schools/salmon-bay-school-93956/" class="ga-tracked-link track-ga-event school-name notranslate" data-ga-action="School details click" data-ga-label="HDP AB Module" data-ga-category="Homes" data-ga-standard-href="true">Salmon Bay School</a>
<span class="assigned-label de-emph">(assigned)</span>
</span>
<span class="nearby-schools-grades">K-8</span>
<span class="nearby-schools-distance">0.3 mi</span>
</li>
<li class="nearby-school assigned-school">
<span class="gs-rating-badge">
<div class="gs-rating gs-rating-8">
<span class="gs-rating-number">8</span>
<span class="gs-rating-subtext">out of 10</span>
</div>
</span>
<span class="nearby-schools-name"> <a href="/seattle-wa/schools/whitman-middle-school-93939/" class="ga-tracked-link track-ga-event school-name notranslate" data-ga-action="School details click" data-ga-label="HDP AB Module" data-ga-category="Homes" data-ga-standard-href="true">Whitman Middle</a>
<span class="assigned-label de-emph">(assigned)</span>
</span>
<span class="nearby-schools-grades">6-8</span>
<span class="nearby-schools-distance">1.4 mi</span>
</li>
<li class="nearby-school assigned-school">
<span class="gs-rating-badge">
<div class="gs-rating gs-rating-9">
<span class="gs-rating-number">9</span>
<span class="gs-rating-subtext">out of 10</span>
</div>
</span>
<span class="nearby-schools-name"> <a href="/seattle-wa/schools/ballard-high-school-92363/" class="ga-tracked-link track-ga-event school-name notranslate" data-ga-action="School details click" data-ga-label="HDP AB Module" data-ga-category="Homes" data-ga-standard-href="true">Ballard High</a>
<span class="assigned-label de-emph">(assigned)</span>
</span>
<span class="nearby-schools-grades">9-12</span>
<span class="nearby-schools-distance">0.2 mi</span>
</li>
That is a large chunk, but essentially I'm trying to grab the text of the school-name link, which sits at ul > li > span.nearby-schools-name > a.school-name.
Here is my attempt; whatever I try, I get a blank result.
// get School Names
var match = contentText.match(/<a href="([^<]*)" class="ga-tracked-link track-ga-event school-name notranslate" /g);
Browser.msgBox(match);
var schoolNameArray = [];
// match is null when nothing matches; pop() shrinks the array so the loop ends
while (match && match.length > 0) {
  var thisSchoolName = String(match.pop());
  Browser.msgBox(thisSchoolName);
  schoolNameArray.push(thisSchoolName);
}
var schoolNames = schoolNameArray.join(" _ ");
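A minimal sketch of one way to get the names straight from the HTML string, assuming the markup matches the snippet above. Two things differ from the attempt: the capture group wraps the link text rather than the href, and the loop uses RegExp.prototype.exec, because String.prototype.match with the g flag returns only whole matches and drops capture groups. extractSchoolNames and sample are hypothetical names, and regex-scraping HTML is brittle: it will break if Zillow changes its markup.

```javascript
// Sketch only: assumes the page HTML is already a string (like contentText)
// and that each school link has "school-name" in its class attribute.
// extractSchoolNames is a hypothetical helper, not part of any API.
function extractSchoolNames(html) {
  // Group 1 captures the anchor's text: everything between ">" and "</a>".
  var re = /<a\b[^>]*class="[^"]*\bschool-name\b[^"]*"[^>]*>([^<]*)<\/a>/g;
  var names = [];
  var m;
  // match() with the g flag ignores groups, so read them via exec() in a loop.
  while ((m = re.exec(html)) !== null) {
    names.push(m[1].trim());
  }
  return names;
}

// Hypothetical sample trimmed from the Zillow snippet above.
var sample =
  '<span class="nearby-schools-name"> <a href="/seattle-wa/schools/salmon-bay-school-93956/" ' +
  'class="ga-tracked-link track-ga-event school-name notranslate">Salmon Bay School</a>' +
  '<span class="assigned-label de-emph">(assigned)</span></span>' +
  '<span class="nearby-schools-name"> <a href="/seattle-wa/schools/ballard-high-school-92363/" ' +
  'class="ga-tracked-link track-ga-event school-name notranslate">Ballard High</a></span>';

console.log(extractSchoolNames(sample).join(" _ "));
// Salmon Bay School _ Ballard High
```

In the script itself you would call extractSchoolNames(contentText) and pass the joined result to Browser.msgBox.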
A quick FAQ: I have tried the getElementsByClassName replacement function that's floating around the web, with no luck. I also tried grabbing the href.
Here is one way to do it. First, get all the elements by class name:
var elSchoolNames = document.getElementsByClassName("nearby-schools-name");
What gets returned is an object. If you log the variable elSchoolNames to the console:
console.log('elSchoolNames: ' + elSchoolNames);
it will look like this:
[object HTMLCollection]
Inside the [object HTMLCollection] are more objects, one for each element with that class:
[object HTMLHeadingElement]
[object HTMLSpanElement]
[object HTMLSpanElement]
[object HTMLSpanElement]
It's important to understand that an HTMLCollection is array-like: besides properties such as length, it holds its elements under numeric indexes (0, 1, 2, ...) rather than descriptive property names. So to get an element out of the collection, refer to it by number.
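Outside a browser there is no document or HTMLCollection, so this sketch uses a plain array-like object as a hypothetical stand-in to show the same idea: items are reached by numeric index, length says how many there are, and Array.from copies them into a real array when you want array methods such as slice. In a browser, Array.from(elSchoolNames) works the same way on the real collection.

```javascript
// Hypothetical stand-in for what getElementsByClassName returns:
// numeric keys plus a length, but none of the Array methods.
var fakeCollection = {
  0: { tagName: "H4", textContent: " " },
  1: { tagName: "SPAN", textContent: "Salmon Bay School (assigned)" },
  2: { tagName: "SPAN", textContent: "Whitman Middle (assigned)" },
  length: 3
};

// Access by index, just like elSchoolNames[1].
console.log(fakeCollection[1].tagName); // SPAN

// Array.from copies items 0..length-1 into a real array,
// so slice(1) can drop the h4 header at index 0.
var spans = Array.from(fakeCollection).slice(1);
console.log(spans.length); // 2
```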
You need the span elements, which are at indexes 1, 2 and 3 (index 0 is the h4 header):
var theSpanEl = elSchoolNames[1];
var theSpanEl2 = elSchoolNames[2];
var theSpanEl3 = elSchoolNames[3];
console.log('textContent: ' + theSpanEl.textContent);
The name of the school is in the textContent property of the object.
How do I know what objects are inside the collection, and what the first span element contains? I looped through all the properties of the objects:
var elSchoolNames = document.getElementsByClassName("nearby-schools-name");
console.log('namesOfSchools: ' + elSchoolNames);
// print every property name and value of the collection
for (var theProperty in elSchoolNames) {
  console.log('theProperties: ' + theProperty);
  console.log('each value: ' + elSchoolNames[theProperty]);
}
var theSpanEl = elSchoolNames[1];
// print every property name and value of the first span element
for (var spanProperty in theSpanEl) {
  console.log('theProperties: ' + spanProperty);
  console.log('each value: ' + theSpanEl[spanProperty]);
}
console.log('textContent: ' + theSpanEl.textContent);
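The property-dumping loops above work on any object. This small sketch runs the same for...in pattern on a plain array-like object (a hypothetical stand-in, since Node has no DOM) to show what the loop hands you: property names as strings, which is why the loop reads each value back with bracket notation. On a real HTMLCollection in a browser, the loop also turns up non-index members such as length and item.

```javascript
// Hypothetical array-like object standing in for an HTMLCollection.
var collection = {
  0: { textContent: "Salmon Bay School (assigned)" },
  1: { textContent: "Whitman Middle (assigned)" },
  length: 2
};

var keys = [];
for (var key in collection) {
  keys.push(key); // each key is a string, e.g. "0", not the number 0
}
console.log(keys.join(", ")); // 0, 1, length

// Bracket notation turns a key back into its value,
// as elSchoolNames[theProperty] does in the loop above.
console.log(collection[keys[1]].textContent); // Whitman Middle (assigned)
console.log(collection[keys[2]]); // 2
```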
The elements you need are every one after the first. Because the collection is zero-indexed, the second element is number 1.
var theSpanEl = elSchoolNames[1];
Now, to see what you have, print it to the console:
console.log('textContent: ' + theSpanEl.textContent);
That gives you:
textContent: Salmon Bay School
(assigned)
Of course, you'll want to strip the (assigned) off the end with a string method. You don't need .match() or a regex for any of this.
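Putting those steps together: a sketch that walks every element after the header at index 0 and strips the (assigned) label with plain string methods (indexOf, slice, trim), no regex. schoolNamesFrom is a hypothetical helper and fakeElSchoolNames a hypothetical stand-in for elSchoolNames; in a browser you would pass the real collection. It assumes the label text is exactly "(assigned)".

```javascript
// Hypothetical helper: collect cleaned school names from a collection whose
// first item (index 0) is the header, as with elSchoolNames above.
function schoolNamesFrom(collection) {
  var names = [];
  for (var i = 1; i < collection.length; i++) { // start at 1 to skip the h4 header
    var text = collection[i].textContent;
    var cut = text.indexOf("(assigned)");
    if (cut !== -1) {
      text = text.slice(0, cut); // drop the label and anything after it
    }
    names.push(text.trim()); // remove leading spaces and line breaks
  }
  return names;
}

// Stand-in for elSchoolNames; real textContent carries extra whitespace.
var fakeElSchoolNames = [
  { textContent: " " },
  { textContent: " Salmon Bay School\n(assigned)\n" },
  { textContent: " Whitman Middle\n(assigned)\n" },
  { textContent: " Ballard High\n(assigned)\n" }
];

console.log(schoolNamesFrom(fakeElSchoolNames).join(" _ "));
// Salmon Bay School _ Whitman Middle _ Ballard High
```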
I just realized that if you are getting the HTML content from a website that isn't yours, and that content is a string, none of this will work, unless you inject the HTML into your own page with innerHTML and then use the code above.
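Server-side Apps Script has no document object, so the fetched page can only be handled as a string there. The no-regex spirit of this answer can still be adapted with string methods. A rough sketch, assuming the class token school-name appears only on the school links (as in the snippet above), that no ">" occurs inside the attributes following it, and that the link text has no nested tags; namesFromHtmlString and html are hypothetical, and the approach is fragile against markup changes.

```javascript
// Hypothetical helper: pull the text of each link whose class list contains
// "school-name", using only indexOf/slice/trim on the raw HTML string.
function namesFromHtmlString(html) {
  var names = [];
  var marker = "school-name";
  var pos = html.indexOf(marker);
  while (pos !== -1) {
    var tagEnd = html.indexOf(">", pos);       // end of the opening <a ...> tag
    if (tagEnd === -1) break;
    var close = html.indexOf("</a>", tagEnd);  // start of the closing tag
    if (close === -1) break;
    names.push(html.slice(tagEnd + 1, close).trim());
    pos = html.indexOf(marker, close);         // continue after this link
  }
  return names;
}

// Hypothetical sample trimmed from the Zillow snippet above.
var html =
  '<span class="nearby-schools-name"> <a href="/seattle-wa/schools/whitman-middle-school-93939/" ' +
  'class="ga-tracked-link track-ga-event school-name notranslate" data-ga-category="Homes">Whitman Middle</a></span>' +
  '<span class="nearby-schools-name"> <a href="/seattle-wa/schools/ballard-high-school-92363/" ' +
  'class="ga-tracked-link track-ga-event school-name notranslate" data-ga-category="Homes">Ballard High</a></span>';

console.log(namesFromHtmlString(html).join(" _ "));
// Whitman Middle _ Ballard High
```

Note that "nearby-schools-name" does not contain the substring "school-name", so the span wrappers are not picked up by mistake.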