tobias sch

Reputation: 359

How can I remove specific characters from a list and save as data frame?

I would like to get a clean data set that comes from a JSON file:

  • without special characters
  • only with actual words
  • no numbers

     library(rvest); library(XML); library(dplyr);library(tidyr); library(purrr); library(rjson)
    
     url <- "http://suggestqueries.google.com/complete/search?client=chrome&q=Nike"
     nike_autocomplete <- read_html(url)
    

The output should look like this:

    [1] "Nike" "nike air" "nike air max" "nike schuhe" "nike air force" "nike air max 97"
    [7] "nike tn" "nike id" "nike air max 270" "nike vapormax" "nike pullover" "nike schweiz"   
    [13] "nike 97" "nike off white" "nike air max plus" "nike winterschuhe" "nike schuhe damen" "nike huarache"  
    [19] "nike shoes" "nike logo" "nike air max 90"
    

That is, there should be no empty entries at the end.

Upvotes: 0

Views: 45

Answers (1)

Gregor Thomas

Reputation: 146030

The text you're trying to extract is in JSON format, so you'll be much better off using a JSON-reading utility than trying to use regex. I like jsonlite::fromJSON for this.

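    library(rvest)     # read_html() and the %>% pipe
    library(xml2)      # xml_text()
    library(jsonlite)  # fromJSON()
    library(magrittr)  # extract()
    
    url <- "http://suggestqueries.google.com/complete/search?client=chrome&q=Nike"
    read_html(url) %>%
      xml_text() %>%
      fromJSON() %>%
      extract(1:2) %>%   # keep the query and its suggestions, drop the remaining elements
      unlist()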
    #  [1] "Nike"                  "nike shox"             "nike shoes"            "nike air max"         
    #  [5] "nike outlet"           "nike air force 1"      "nike basketball shoes" "nike vapormax"        
    #  [9] "nike air max 97"       "nike id"               "nike store"            "nike stock"           
    # [13] "nike air max 270"      "nike promo code"       "nike windbreaker"      "nike sweatshirts"     
    # [17] "nike huarache"         "nike hoodie"           "nike cortez"           "nike sweatpants"      
    # [21] "nike slides"      
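
Since the question also asks for a data frame, here is a minimal sketch of that last step. To stay runnable without a network call, it parses a short, made-up stand-in for the response (sample_json) rather than the live URL; sample_json, suggestions, nike_df and the column name suggestion are placeholder names chosen for illustration, and the real response contains more elements than this sample.

```r
library(jsonlite)

# Hypothetical, shortened stand-in for the response body (not real data from the endpoint)
sample_json <- '["Nike", ["nike air", "nike shoes", "nike air max 97"], ["", "", ""]]'

parsed <- fromJSON(sample_json)

# Keep the query and its suggestions, then flatten them to a character vector
suggestions <- unlist(parsed[1:2])

# Drop empty strings, in case any are present
suggestions <- suggestions[nzchar(suggestions)]

# One row per suggestion
nike_df <- data.frame(suggestion = suggestions, stringsAsFactors = FALSE)
print(nike_df)
```

With the live request, the same three steps would presumably apply to the list returned by fromJSON() in the pipeline above.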
    

Upvotes: 1
