deadsix
deadsix

Reputation: 375

Importing XML Data into Google Sheets from a Web-based Source

I'm trying to build a Google Sheet that pulls information from the IRS Form 990 repository hosted via AWS S3.

Here is the XML file: Example 990 Form in XML

The query I'm doing is designed to pull the business names under the Schedule I section from the XML source. The business name is wrapped in the BusinessNameLine1Txttags.

Utilizing the built-in IMPORTXML function from Google Sheet I've built the following:

=IMPORTXML("https://s3.amazonaws.com/irs-form-990/201702299349300445_public.xml", "//Return/ReturnData/IRS990ScheduleI/RecipientTable/RecipientBusinessName/BusinessNameLine1Txt")

When I execute the function with parameters seen above I receive an error saying that the imported content is empty. Is my XPATH query incorrect or does it have to do with some quirk in the data?

Upvotes: 1

Views: 476

Answers (1)

Tanaike
Tanaike

Reputation: 201358

How about this modification?

=IMPORTXML(A1, "//*[local-name()='BusinessNameLine1Txt']")
  • https://s3.amazonaws.com/irs-form-990/201702299349300445_public.xml is put in "A1".

Result:

enter image description here

Reference:

If I misunderstand your issue, please tell me. I would like to modify it.

Edit:

=IMPORTXML(A1, "//*[local-name()='IRS990ScheduleI']//*[local-name()='BusinessNameLine1Txt']")

Result:

enter image description here

Upvotes: 3

Related Questions