Sanjana Balusu

Reputation: 1

Scraping 13F filings from SEC using R

I'm trying to scrape the data in the SEC FORM 13-F Information Table from the following link:

https://sec.report/Document/0001567619-21-010281/

I tried the following script:

library(timetk)
library(tidyverse)
library(rvest)
url <- "https://sec.report/Document/0001567619-21-010281/"
url <- read_html(url)
raw_data <- url %>%
  html_nodes("#table td") %>%
  html_text()

However, I can't extract the table data: under Values, raw_data shows up as empty. Any help would be appreciated.

Upvotes: 0

Views: 595

Answers (2)

IgBell

Reputation: 441

Use the HTML-rendered 13F information table from SEC EDGAR instead; it is much easier to parse. Here is an example in Python:

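from io import StringIO

import pandas as pd
import requests

# Request the HTML-rendered information table from EDGAR
url = "https://www.sec.gov/Archives/edgar/data/1541617/000154161721000009/xslForm13F_X01/altcap13f3q21infotable.xml"
response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
response.raise_for_status()  # stop early on an HTTP error

# Parse every <table> in the response; wrapping the markup in StringIO
# avoids the deprecation warning newer pandas versions raise for literal HTML
tables = pd.read_html(StringIO(response.text))
df = tables[3]  # the holdings table is at index 3 on this page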
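
The parsed table will probably still carry the form's multi-row header as ordinary rows. Below is a minimal sketch of collapsing those rows into one header. It uses plain lists so it runs without network access. The sample rows, the three-row header layout, and the names `collapse_header` and `N_HEADER_ROWS` are assumptions made for illustration, not something taken from the filing.

```python
# Hypothetical rows shaped like the top of a parsed 13F information table.
# The three-row header layout is an assumption; inspect the real table first.
rows = [
    ["COLUMN 1", "COLUMN 2", "COLUMN 3", "COLUMN 8", "COLUMN 8", "COLUMN 8"],
    ["", "", "", "VOTING AUTHORITY", "VOTING AUTHORITY", "VOTING AUTHORITY"],
    ["NAME OF ISSUER", "TITLE OF CLASS", "CUSIP", "SOLE", "SHARED", "NONE"],
    ["EXAMPLE CORP", "COM", "000000000", "100", "0", "0"],
]

N_HEADER_ROWS = 3  # assumed number of header rows


def collapse_header(rows, n_header=N_HEADER_ROWS):
    """Join the non-empty cells of the first n_header rows, column by column."""
    header_rows, body = rows[:n_header], rows[n_header:]
    header = []
    for cells in zip(*header_rows):
        parts = []
        for cell in cells:
            cell = cell.strip()
            # Skip blanks and labels already seen in this column
            if cell and cell not in parts:
                parts.append(cell)
        header.append(" | ".join(parts))
    return header, body


header, body = collapse_header(rows)

# One dict per holding, keyed by the collapsed header labels
records = [dict(zip(header, row)) for row in body]
```

With a pandas DataFrame the same idea should carry over: build the joined labels from the first three rows, assign them to `df.columns`, and drop those rows.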

Upvotes: -1

QHarr

Reputation: 84465

The data is present in the response. You can use a CSS attribute = value selector to target the nested table. You will need to decide what to do with the first three rows, which most likely (though not necessarily) need to be merged into a single header.

library(rvest)
library(magrittr)

page <- read_html("https://sec.report/Document/0001567619-21-010281/")

table <- page %>%
  html_node('[summary="Form 13F-NT Header Information"]') %>%
  html_table(fill = TRUE)

Upvotes: 0
