Reputation: 3041
I have a Cloud Function that is subscribed to a Pub/Sub topic and uses Puppeteer to scrape the URL contained in each message. I set the region to europe-west3 (my Firebase project is based in this region as well) because I want the scraping to be done from a server in this region; scraping the URL from the US yields different results. However, judging from the scraping output, the function still runs on a US server.
The Cloud Functions location documentation seems to imply that by setting the region I can determine the location where a given function runs.
I have also gone into the Firebase Console for my project and verified that the functions have the correct location.
Am I missing something here? Is it possible to specify the region where my web scraping logic should be executed?
exports.updateDocWithScrapingData = functions.runWith({ memory: '1GB', timeoutSeconds: 120 }).region('europe-west3').pubsub.topic('myTopic').onPublish(async (message) => {
  const urlMap = JSON.parse(Buffer.from(message.data, 'base64').toString());
  // Scraping is done using puppeteer in separate file
  const scrapedData = await scraper.fetchData(urlMap['url']);
  const docId = urlMap['id'];
  const refDoc = db.collection('myCollection').doc(docId);
  const doc = await refDoc.get();
  if (doc.exists) {
    // Update doc using scrapedData
  } else {
    // Create doc using scrapedData
  }
  return {
    response: 'Success',
  };
});
EDIT:
This is my scraping function in scraper.js:
const fetchData = async (url) => {
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  });
  const page = await browser.newPage();
  await page.goto(url);
  await page.waitForTimeout(2000);
  const games = await page.evaluate(() => {
    const gamesArr = Array.from(document.querySelector('.grid-wrapper').querySelectorAll('.grid-event-wrapper'));
    return gamesArr.map(game => game.innerText);
  });
  await browser.close();
  console.log(games);
  return games;
};
This is an example url I am trying to scrape: https://sports.bwin.de/de/sports/basketball-7/wetten/nordamerika-9/nba-6004
Expected output for europe-west3 (the function should be running in Frankfurt, Germany):
['Chicago Bulls
@
Indiana Pacers
Morgen / 01:05
KON…95
-5,5
1.87
▲ 224,5
1.91
▼ 224,5
1.91
2.85
1.44', 'Brooklyn Nets
@
Sacramento Kings
Morgen / 04:05
K…87
+3,5
1.95
▲ 242,5
1.87
▼ 242,5
1.95
1.57
2.45']
This is the actual output from the cloud function:
['Chicago Bulls
@
Indiana Pacers
Morgen / 01:05
KON…05
-5,5
-115
▲ 224,5
-110
▼ 224,5
-110
+185
-225', 'Brooklyn Nets
@
Sacramento Kings
Morgen / 04:05
K…15
+3,5
-105
▲ 242,5
-115
▼ 242,5
-105
-175
+145']
Notice the difference in the betting odds: the US version uses moneyline (American) odds, whereas the German version shows decimal odds, i.e. the payout as a multiple of the stake.
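To show that the two outputs are the same prices in different formats, here is a minimal sketch of the standard moneyline-to-decimal conversion. The helper name `moneylineToDecimal` is hypothetical and not part of the function or scraper code above; rounding to two decimals is an assumption made to match how the site displays odds.

```javascript
// Hypothetical helper: convert American (moneyline) odds to decimal odds.
// Positive moneyline: profit on a 100 stake. Negative: stake needed to win 100.
function moneylineToDecimal(moneyline) {
  if (!Number.isFinite(moneyline) || moneyline === 0) {
    throw new RangeError('moneyline must be a non-zero finite number');
  }
  const decimal = moneyline > 0
    ? 1 + moneyline / 100
    : 1 + 100 / Math.abs(moneyline);
  // Round to two decimals (assumed display precision).
  return Math.round(decimal * 100) / 100;
}

// Values taken from the actual (US) output above:
console.log(moneylineToDecimal(-115)); // 1.87
console.log(moneylineToDecimal(-110)); // 1.91
console.log(moneylineToDecimal(185));  // 2.85
console.log(moneylineToDecimal(-225)); // 1.44
```

The converted values match the expected German output for the Bulls/Pacers game, which suggests the page content is the same and only the locale-dependent odds format differs.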
Upvotes: 0
Views: 450
Reputation: 1058
Yes, the thread describes exactly why you are getting US results even though the function runs in europe-west3.
All the IP addresses Google provides to GCP users are registered with ARIN
under the Google HQ in Mountain View, California (in other words, SWIPed to
Mountain View, CA). So, all geolocation lookups of the IP addresses will
resolve to the United States even though the actual server is located
somewhere else. In addition to that, in GCP, it's not uncommon to remap a
block of IPs from one location to another, especially given the elasticity of
IP addresses in GCP and the way they are recycled or reused.
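Since the site appears to geolocate by IP address, one possible workaround (not mentioned in the quoted thread) is to send the browser's traffic through a proxy whose IP is registered in Germany. The sketch below only builds the Puppeteer launch options. `buildLaunchOptions` and the `SCRAPER_PROXY_URL` environment variable are hypothetical names, and it assumes you have access to such a proxy. `--proxy-server` is a standard Chromium command-line switch.

```javascript
// Hypothetical helper: build Puppeteer launch options, optionally routing
// all browser traffic through a proxy (e.g. one with a German IP address).
function buildLaunchOptions(proxyUrl) {
  const args = ['--no-sandbox', '--disable-setuid-sandbox'];
  if (proxyUrl) {
    // Chromium switch that sends the browser's requests through the proxy.
    args.push(`--proxy-server=${proxyUrl}`);
  }
  return { headless: true, args };
}

// Usage inside fetchData (assumes the puppeteer package is installed and
// SCRAPER_PROXY_URL is an environment variable you define yourself):
// const browser = await puppeteer.launch(
//   buildLaunchOptions(process.env.SCRAPER_PROXY_URL)
// );
```

If the proxy URL is not set, the options are the same as in the question's scraper.js, so the change is safe to deploy before a proxy is configured.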
Upvotes: 1