Reputation: 375
I'm trying to build a Google Sheet that pulls information from the IRS Form 990 repository hosted via AWS S3.
Here is the XML file: Example 990 Form in XML
The query I'm doing is designed to pull the business names under the Schedule I section from the XML source. The business name is wrapped in the BusinessNameLine1Txt
tags.
Utilizing the built-in IMPORTXML
function from Google Sheet I've built the following:
=IMPORTXML("https://s3.amazonaws.com/irs-form-990/201702299349300445_public.xml", "//Return/ReturnData/IRS990ScheduleI/RecipientTable/RecipientBusinessName/BusinessNameLine1Txt")
When I execute the function with parameters seen above I receive an error saying that the imported content is empty. Is my XPATH query incorrect or does it have to do with some quirk in the data?
Upvotes: 1
Views: 476
Reputation: 201358
How about this modification?
=IMPORTXML(A1, "//*[local-name()='BusinessNameLine1Txt']")
https://s3.amazonaws.com/irs-form-990/201702299349300445_public.xml
is put in "A1".If I misunderstand your issue, please tell me. I would like to modify it.
=IMPORTXML(A1, "//*[local-name()='IRS990ScheduleI']//*[local-name()='BusinessNameLine1Txt']")
Upvotes: 3