Vetruvius

Reputation: 23

Use Google's libphonenumber with BaseX

I am using BaseX 9.2 to scrape an online phone directory. Nothing illegal: it belongs to a non-profit that my boss is a member of, so I have access to it. I want to add all those numbers to my personal phonebook so that I know who is calling me (mainly people trying to reach my boss). The data is in pretty bad shape, especially the numbers (about a thousand, from all over the world). Some are in E.164 format, some are not, and some are downright invalid.

I initially used OpenRefine 3.0 to clean up the data. It also plays very nicely with Google's libphonenumber to whip the numbers into shape. It was as simple as downloading the JAR from Maven, putting it in OpenRefine's lib directory, and then invoking Jython like this on each phone number (numberStr):

from com.google.i18n.phonenumbers import PhoneNumberUtil
from com.google.i18n.phonenumbers.PhoneNumberUtil import PhoneNumberFormat
pu = PhoneNumberUtil.getInstance()
numberStr = str(int(value))
number = pu.parse('+' + numberStr, 'ZZ')
try: country = pu.getRegionCodeForNumber(number)
except: country = 'US'
number = pu.parse(numberStr, (country if pu.isValidNumberForRegion(number, country) else 'US'))
return pu.format(number, PhoneNumberFormat.E164)

I discovered XPath and BaseX recently and find them very succinct and powerful with HTML. While I could get OpenRefine to directly spit out a VCF, I can't find a way to plug libphonenumber into BaseX. Since both are written in Java, I thought it would be straightforward.

I tried their documentation (http://docs.basex.org/wiki/Java_Bindings), but BaseX does not discover the libphonenumber JAR out of the box. I tried various combinations of paths, file names, and locations. The only way I see is to write a wrapper, package it as an XQuery module (XAR), and import it. That would need significant time and Java coding skills, and I definitely don't have the latter.

Is there a simple way to hook up libphonenumber with BaseX? Or, in general, is there a way to link external Java libraries with XPath? I could go back to OpenRefine, but it has a very clumsy workflow IMHO. There is no way to ask the website admin to clean up his act, either. Or, if OpenRefine and BaseX are not the right tools for the job, is there any other way to clean up data, especially phone numbers? I need to do this every few months (for changes and updates on the site), and it gets really tedious if I can't automate it fully. I would appreciate at least a basic working code sample in an answer. (I work directly off the standalone BaseX JAR on a Windows 10 x64 machine.)

Upvotes: 1

Views: 163

Answers (1)

Andy Bunce

Reputation: 306

Place libphonenumber-8.10.16.jar in the folder ..basex/lib/custom to get it on the classpath (see http://docs.basex.org/wiki/Startup#Full_Distributions), then run bin/basexgui.bat and evaluate this query:

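(: Bind each Java class to a namespace prefix via the java: scheme. :)
declare namespace Pnu = "java:com.google.i18n.phonenumbers.PhoneNumberUtil";
declare namespace Pn = "java:com.google.i18n.phonenumbers.Phonenumber$PhoneNumber";
(: Instance methods take the object itself as their first argument. :)
let $pnu := Pnu:getInstance()
let $pn := Pnu:parse($pnu, "044 668 18 00", "CH")
return Pn:getCountryCode($pn)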

This returns 41, the country calling code for Switzerland.

There is no standard way to call Java from XPath. However, many Java-based XPath implementations provide their own custom mechanisms for doing so.
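On the more general point in the question about cleaning up numbers that are only partly in E.164: a cheap first pass is to separate strings that already have the E.164 shape from those that need real parsing. The sketch below is plain Python 3 using only the standard library (it is not Jython and does not use libphonenumber). The function names `looks_like_e164` and `split_by_shape` are made up for illustration, and the check is purely syntactic: a leading "+", a non-zero first digit, and at most 15 digits in total. It does not tell you whether a number is actually assigned or valid for its region, so anything it accepts may still deserve a pass through libphonenumber.

```python
import re

# E.164 shape only: "+", a non-zero first digit, at most 15 digits in total.
E164_SHAPE = re.compile(r"\+[1-9]\d{1,14}")


def looks_like_e164(raw):
    """Return True if raw has the E.164 shape (syntax only, not validity)."""
    return E164_SHAPE.fullmatch(raw.strip()) is not None


def split_by_shape(numbers):
    """Partition numbers into (e164_shaped, needs_cleanup) lists."""
    shaped, rest = [], []
    for n in numbers:
        (shaped if looks_like_e164(n) else rest).append(n)
    return shaped, rest


if __name__ == "__main__":
    shaped, rest = split_by_shape(["+41446681800", "044 668 18 00"])
    print("already E.164-shaped:", shaped)
    print("needs parsing:", rest)
```

Only the second list then has to go through a full parser with a guessed default region, which keeps the number of guesses small.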

Upvotes: 0
