RanonKahn
RanonKahn

Reputation: 872

How to extract the csv output displayed by solr query output using R or PowerShell?

I am trying to save data as a 'csv' format file or extract as a data frame from a solr query similar to this:

http://localhost:8983/solr/techproducts/select?q=ipod&fl=id,cat,name,popularity,price,score&wt=csv

I get a response as a web page output as shown below: enter image description here

I am able to copy, paste, save and retrieve the data in csv format from the csv output displayed in html file (webpage solr output) manually. However, I couldn't automate the process using R. I am neither able to parse and get the entire content by unlist -ing the output in the correct order nor extract and save the content displayed in the web page as a csv file. I tried solrium and httr packages in vain. I also tried to Export-Csv option in Windows PowerShell to save the data in a csv file format without any luck.

Requesting suggestions to overcome the issue.

Upvotes: 1

Views: 780

Answers (3)

RanonKahn
RanonKahn

Reputation: 872

For the benefit of novices like me I am sharing the solution.

The PowerShell approach:

Step 1: I used the following PowerShell code saved as .ps1 file.

curl 'https://some_site.com/solr/some_folder/select?q=*:*&wt=csv' -L -u username:password --location-trusted -b cookie-jar.txt > OutputFilename.csv

Step 2: I used the following R script to run the PowerShell file and read the csv file generated by PowerShell as a data frame for further processing.

system2("PowerShell", args=c("-file", "C:\\FolderName\\FileName.ps1"))
df <- as.data.frame(read.csv("OutputFilename.csv", header = TRUE, sep = ",", fill = TRUE, fileEncoding="UTF-16LE"))

Using the solrium package by Scott Chamberlain approach:

library(solrium)
solr_connect("https://site.organization.com", errors = "complete") 
solr_search(name = "studies", q = "metadata:value", rows=1000, wt = "csv", callopts = httr::authenticate("usrname", "pwd"))

Upvotes: 0

sckott
sckott

Reputation: 5903

This doesn't address anything specific about powershell, but as a start using solrium, does this work, and if not, how does it fail:

(to be fully reproducible), setup Solr

cd solr-6.6.0
bin/solr start -e cloud -noprompt
bin/solr create -c techproducts -d sample_techproducts_configs
bin/post -c techproducts example/exampledocs/*.xml 

In R:

solr_connect()
(df <- solr_search(
        name = "techproducts", 
        q = "ipod", 
        fl = c("id", "cat", "name", "popularity", "price", "score"), 
        wt = "csv"))
#> # A tibble: 3 x 6
#>                id                   cat                                       name popularity  price     score
#> *           <chr>                 <chr>                                      <chr>      <int>  <dbl>     <dbl>
#> 1           IW-02 electronics,connector             iPod & iPod Mini USB 2.0 Cable          1  11.50 3.2388113
#> 2 F8V7067-APL-KIT electronics,connector  Belkin Mobile Power Cord for iPod w/ Dock          1  19.95 2.3162508
#> 3       MA147LL/A     electronics,music Apple 60 GB iPod with Video Playback Black         10 399.00 0.9044058

write.csv(df, "myfile.csv", row.names = FALSE)

readLines("myfile.csv")
#> [1] "\"id\",\"cat\",\"name\",\"popularity\",\"price\",\"score\""                                                   
#> [2] "\"IW-02\",\"electronics,connector\",\"iPod & iPod Mini USB 2.0 Cable\",1,11.5,3.2388113"                      
#> [3] "\"F8V7067-APL-KIT\",\"electronics,connector\",\"Belkin Mobile Power Cord for iPod w/ Dock\",1,19.95,2.3162508"
#> [4] "\"MA147LL/A\",\"electronics,music\",\"Apple 60 GB iPod with Video Playback Black\",10,399,0.9044058"  

Upvotes: 1

Mark Wragg
Mark Wragg

Reputation: 23395

Per BenH's comment, its already in CSV format so no need to use Export-CSV. Just use the -OutFile parameter on Invoke-WebRequest:

Invoke-RestMethod "http://localhost:8983/solr/techproducts/select?q=ipod&fl=id,cat,name,popularity,price,score&wt=csv" -OutFile YourFile.csv

Upvotes: 2

Related Questions