I am trying to use a Python package, newspaper (https://newspaper.readthedocs.io/en/latest/), inside a Shiny app to extract the main text from a webpage. By main text I mean the body of the article without any ads, links, etc. (very similar to "Reader View" in Safari on the iPhone).

To my knowledge there is no similar package in R; if you know of one, please let me know.

The goal of this app is to let the user enter a web address, click Submit, and get the clean text as output.
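Before wiring this into Shiny, it may help to confirm that newspaper works from the R console through reticulate. A minimal sketch, assuming newspaper3k is already installed in the Python environment reticulate is using; `extract_main_text` is a hypothetical helper name, and only the documented `Article`, `download()`, `parse()` and `.text` members of newspaper are used:

```r
library(reticulate)

# Assumes newspaper3k is installed in the Python env reticulate is using.
newspaper <- import("newspaper")

# Hypothetical helper: download a page and return the article body text.
extract_main_text <- function(url) {
  article <- newspaper$Article(url)
  article$download()  # fetch the raw HTML
  article$parse()     # extract title, authors, body text, etc.
  article$text        # returned to R as a character string
}

# Usage (replace with a real article URL):
# txt <- extract_main_text("https://...")
# cat(txt)
```

If this works in the console, any remaining problem is in the Shiny or environment setup rather than in newspaper itself.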
I am using RStudio Cloud. The error message and the code are below.
This is the error:
Using virtual environment 'python3_env' ...
Warning in system2(python, c("-c", shQuote(command)), stdout = TRUE, stderr = TRUE) :
running command ''/cloud/project/python3_env/bin/python' -c 'import sys; import pip; sys.stdout.write(pip.__version__)' 2>&1' had status 1
Warning in if (idx != -1) version <- substring(version, 1, idx - 1) :
the condition has length > 1 and only the first element will be used
Warning: Error in : invalid version specification ‘’, ‘ ’
52: stop
51: .make_numeric_version
50: numeric_version
49: pip_version
48: reticulate::virtualenv_install
47: server [/cloud/project/python in shiny.R#42]
Error : invalid version specification ‘’, ‘ ’
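The log shows reticulate failing while it checks the pip version: `import pip` exited with status 1 inside 'python3_env', so the captured output was an error message rather than a version number, which suggests that environment has no working pip. One common workaround is to rebuild the environment from the system Python and install the packages once, from the console, instead of inside `server()`. A hedged sketch, run once in a fresh R session; the interpreter path `/usr/bin/python3` is an assumption about the RStudio Cloud image (check it with `Sys.which("python3")`):

```r
library(reticulate)

env <- "python3_env"

# Assumption: the existing env is missing pip. Remove it so it can be
# rebuilt cleanly (this deletes the python3_env folder).
if (virtualenv_exists(env)) {
  virtualenv_remove(env, confirm = FALSE)
}

# Rebuild from the system interpreter (path is an assumption) and install
# the packages a single time, not on every Shiny session.
virtualenv_create(envname = env, python = "/usr/bin/python3")
virtualenv_install(env, packages = c("newspaper3k", "nltk"))

# Select the env before any Python code runs in this R session.
use_virtualenv(env, required = TRUE)
```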
And this is the code:
# Python webpage scraper followed by r summary:
library(shiny)
library(reticulate)

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      textInput("web", "Enter URL:"),
      actionButton("act", "Submit")
    ),
    mainPanel(br(),
      tags$head(tags$style(HTML("pre { white-space: pre-wrap; word-break: keep-all; }"))),
      verbatimTextOutput("nText"),
      br()
    )
  )
)

server <- function(input, output){
  #1) Add python env and packages:
  reticulate::virtualenv_install('python3_env', packages = c('newspaper3k', 'nltk'))
  py_run_string("from newspaper import Article")
  py_run_string("import nltk")
  py_run_string("nltk.download('punkt')")

  #2) Pull the webpage url:
  webad <- eventReactive(input$act, {
    req(input$web)
    input$web
  })

  observe({
    py$webadd = webad
    py_run_string("article = Article('webadd')")
    py_run_string("article.download()")
    py_run_string("article.parse()")
    py_run_string("article.nlp()")
    py_run_string("ztext =article.text")
    py_run_string("r.ntexto = ztext")
    output$nText <- renderPrint({
      r.ntexto
    })
  })
}

shinyApp(ui = ui, server = server)
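Apart from the pip error, a few things in the server function above would likely stop it from working even once the environment installs cleanly. `virtualenv_install()` runs again for every new session. `py$webadd = webad` passes the reactive itself rather than its value `webad()`. `Article('webadd')` passes the literal string "webadd" instead of the URL. On the R side, `r.ntexto` is just an undefined R variable name (the Python result can be read back with `py$ztext`). Defining `output$nText` inside `observe()` is also unusual, since outputs are normally assigned once at the top level of `server()`. Below is a hedged sketch of a restructured app. It assumes python3_env already has newspaper3k installed; `extract_text` and `article_text` are hypothetical names. It skips `article.nlp()` and the NLTK `punkt` download because `article.text` is available after `parse()`; those are only needed for `article.keywords` and `article.summary`.

```r
library(shiny)
library(reticulate)

# Assumes python3_env already exists with newspaper3k installed;
# select it once, before any Python code runs.
use_virtualenv("python3_env", required = TRUE)

# Hypothetical helper defined once in Python's __main__ module,
# so it is created once per R process rather than once per click.
py_run_string("
from newspaper import Article

def extract_text(url):
    article = Article(url)  # pass the variable, not the string 'url'
    article.download()
    article.parse()
    return article.text
")

ui <- fluidPage(
  tags$head(tags$style(HTML("pre { white-space: pre-wrap; word-break: keep-all; }"))),
  sidebarLayout(
    sidebarPanel(
      textInput("web", "Enter URL:"),
      actionButton("act", "Submit")
    ),
    mainPanel(
      br(),
      verbatimTextOutput("nText")
    )
  )
)

server <- function(input, output) {
  # Re-runs only when Submit is clicked; req() stops if the box is empty.
  article_text <- eventReactive(input$act, {
    req(input$web)
    py$extract_text(input$web)  # call the Python function with the URL value
  })

  # Assigned once at the top level of server(), not inside observe().
  output$nText <- renderText({
    article_text()
  })
}

# Launch with runApp(app); in an app.R file, shinyApp(ui = ui, server = server)
# can simply be the last line instead.
app <- shinyApp(ui = ui, server = server)
```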