Reputation: 37
I am trying to parse data from the following website (https://www.fundsquare.net/security/information?idInstr=275136)
I want to display the price of the fund in Google sheets. However, when using the 'importxml' function I get an error that the "imported content is empty". Anybody know what I can do to fix it?
Ways I tried the function:
=IMPORTXML("https://www.fundsquare.net/security/summary?idInstr=275136" ,"//*[@class ='surligneorange']" )
=IMPORTXML("https://www.fundsquare.net/security/information?idInstr=275136" , "//*[@id='content']/table[2]/tbody/tr/td[3]/span[1]")
=IMPORTXML("https://www.fundsquare.net/security/information?idInstr=275136" , "//*[@id='content']//span[1]")
I keep on getting the same error. When looking for this error I get the difference between static and dynamic data. This data changes so I guess its dynamic but i'm not sure how that would impact the formula.
I have been trying some things with script editor but no success. Also trying something with RegExp but couldn't get any further than the examples. My knowledge of scraping is limited so any tips and tricks when trying to parse data is greatly appreciated! Any help would be greatly appreciated!
Edit: Within script editor I tried the following code:
function importdata() {
var found, html, content = '';
var response = UrlFetchApp.fetch("https://www.fundsquare.net/security/information?idInstr=275136");
if (response) {
html = response.getContentText();
if (html) content = html.match(/<span class="surligneorange">(.*)<\/span>/)[0];
}
Logger.log(content);
}
This gives me the following log output:
[20-06-05 07:44:58:529 PDT] <span class="surligneorange">31.15 EUR</span> <span style="color:#DD0000;text-align:left;padding:4px 0;"> -0.67 % <img src="/images/share/variationNegative.gif" style="vertical-align:middle;"/></span></td></tr></table><div id="onglet"><a href="/security/documents?idInstr=275136">Documents</a><a href="/security/eusd?idInstr=275136">Taxes</a><a href="/security/histo-divid?idInstr=275136">Dividends</a><a href="/security/histo-prices?idInstr=275136">Hist. Prices</a><a href="/security/price?idInstr=275136">Price</a><a href="/security/order-ref-data?idInstr=275136">Order Ref. Data</a><a class="selected" href="/security/information?idInstr=275136">Security Information</a><a href="/security/summary?idInstr=275136">Overview</a><br class="clear_r"/></div><div id="blocresume"><table class="portlet100pct" border="0" cellspacing="0" cellpadding="0"><tr><td valign="top" class="portletleft50pct"><table width="100%" border="0" cellspacing="0" cellpadding="0"><tr><td valign="top" class="portletBordGris"><div style="position: relative; left: 1px;" class="bloctitle"><img src="/Fundsquare/images/share/x.gif" border="0" height="1" width="1" /></div><DIV class="bloctitle" style="position: relative; top: -21px; right: 1px;"><span style="top: 3px;" >General information</span>
The value 31.15 is what I want to scrape. How can I get this value in my spreadsheet?
Edit 06/06 10:14: further questions
Could you please help me understand what you changed. What exactly is the difference between what I tried to match and what you matched.
mine:
if (html) content = html.match(/<span class="surligneorange">(.*)<\/span>/)[0];
yours:
if (html) content = html.match(/<span class="surligneorange">([\d.]*).*?<\/span>/)[1];
and:
if (html) content = html.match(/<span class="surligneorange">([\d.]*).*<\/span>/)[1];
What is the difference between my [0]
and your [1]
. is it that you only request the first value?
What is the difference between my .*
and your ([\d.]*).*
or [\d.]*).*?
???
My knowledge from javascript is not so good so I am unsure what it does. Thanks for the help!
Upvotes: 1
Views: 2994
Reputation: 201613
How about this answer?
Unfortunately, when I saw the HTML and your formulas, I thought that the value might not be directly retrieved by IMPORTXML
. So I think that your approach to use Google Apps Script can be used for retrieving the value you expect. But I think that your script has the modification points a little for achieving it. In this answer, I would like to propose the modification points of your Google Apps Script.
In your case, I think that the method of match
and the regex are required to be modified.
When your script is modified, please modify as follows.
From:if (html) content = html.match(/<span class="surligneorange">(.*)<\/span>/)[0];
To:
if (html) content = html.match(/<span class="surligneorange">([\d.]*).*?<\/span>/)[1];
31.15
.<span class="surligneorange">([\d.]*).*<\/span>
can be also used.As the modified script for putting to a cell in Google Spreadsheet, from your situation, I thought that you might want to use this as the custom function. If it's so, how about the following modified script?
When your script is used as the custom function, please rename the function name from importdata
to others. Because importdata
has already been used as the built-in function.
function sample() {
var found, html, content = '';
var response = UrlFetchApp.fetch("https://www.fundsquare.net/security/information?idInstr=275136");
if (response) {
html = response.getContentText();
if (html) content = html.match(/<span class="surligneorange">([\d.]*).*<\/span>/)[1];
}
return content;
}
=sample()
is put to a cell, the value of 31.15
is put to the cell. If you want to put the value as the number, please modify return content
to return Number(content)
.For your additional question, I added the answers as follows.
In your script, html.match(/<span class="surligneorange">(.*)<\/span>/)[0];
returns the full match. When you want to retrieve the group, in this case, it's html.match(/<span class="surligneorange">(.*)<\/span>/)[1];
. But, in this regex, the retrieved value is 31.15 EUR</span>.....General information</span>
. I thought that the reason of your issue is this.
In order to retrieve the value of 31.15 EUR
, .*
is modified to .*?
. By this, the matched value is from <span class="surligneorange">31.15 EUR</span>.....General information</span>
to <span class="surligneorange">31.15 EUR</span>
.
But in your goal, you want to retrieve only 31.15
. So I used ([\d.]*).*
. By this, 31.15
of <span class="surligneorange">31.15 EUR</span>.....General information</span>
is retrieved. When ([\d.]*).*?
is used, 31.15
of <span class="surligneorange">31.15 EUR</span>
is retrieved.
So, in the case of <span class="surligneorange">([\d.]*).*?<\/span>
, <span class="surligneorange">31.15 EUR</span>
is matched. And ([\d.]*)
is 31.15
.
In the case of <span class="surligneorange">([\d.]*).*<\/span>
, <span class="surligneorange">31.15 EUR</span>.....General information</span>
is matched. And ([\d.]*)
is 31.15
.
Upvotes: 2