Jason Reed

Reputation: 63

I want to deploy my text scraping program to Heroku, but the file it uses is stored on my PC

I created a text scraping program in which the user enters a word and it searches through a large text file (250 MB and growing) on my computer, but now I want to deploy it to Heroku.

Is there a workaround I need to implement, or is there a (rather elusive) way to accomplish this? As far as I can tell, there is no way to upload my text file to Heroku as-is.

Upvotes: 0

Views: 120

Answers (1)

CaffeinatedMike

Reputation: 1607

Here's my suggestion.

  1. Host the text file on a site like Pastebin, as long as it doesn't contain any confidential information. This lets you update it freely without re-deploying your app each time you add to it.
  2. Once you've pasted the text into a new "paste" and saved it, you can get its "raw" link, which returns the plain content of the file when requested.
  3. Use the requests library to fetch the file from your app and parse it however you need to.

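    import requests

    resp = requests.get("https://pastebin.com/raw/LjcPg3UL")
    resp.raise_for_status()  # fail loudly on a 404/500 instead of parsing an error page
    # if all entries are on individual lines (resp.text is str; iter_lines() would give bytes)
    mywords = resp.text.splitlines()
    # if comma-separated or otherwise
    # mywords = resp.text.split(",")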

Now you have all your content in a list to work with in your app.
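As a rough sketch of the search step itself (the question doesn't show how the app matches words, so whole-word, case-insensitive matching is an assumption here), the list can be scanned with a compiled regex. search_lines is a hypothetical helper name, and the sample lines stand in for the fetched file:

```python
import re


def search_lines(lines, word):
    """Return (line_number, line) pairs whose text contains `word` as a whole word.

    Matching is case-insensitive. `lines` is any iterable of str, e.g. the
    `mywords` list built from resp.text.splitlines().
    """
    # re.escape keeps user input like "c++" from being treated as regex syntax
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [(i, line) for i, line in enumerate(lines, start=1) if pattern.search(line)]


# Usage with a small inline sample instead of the fetched file:
sample = ["The quick brown fox", "jumps over the lazy dog", "Foxes are quick"]
print(search_lines(sample, "fox"))  # [(1, 'The quick brown fox')]
```

Because of the \b word boundaries, "fox" does not match "Foxes"; drop them if substring matches are what the app needs.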

Edit:
Since you want to do this with larger files, you could host the file on Dropbox instead and follow the instructions for getting its raw link. However, with a file that large you're going to notice significant overhead every time it is downloaded. If the file is going to be that big, I'd suggest the added precaution of using the stream parameter of requests, so the request line becomes

    resp = requests.get("https://www.dropbox.com/s/FILE_ID/filename.extension?raw=1", stream=True)

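With stream=True, requests doesn't download the response body up front; iterating over it with resp.iter_lines() or resp.iter_content() then reads the file in chunks instead of all at once, which cuts down on memory consumption. Note that accessing resp.text would still load the whole file into memory.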
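Putting the streaming idea together, here is a minimal sketch that searches the remote file line by line without holding it all in memory. search_remote and iter_matches are hypothetical names; the sketch assumes the file is UTF-8 text whenever the server doesn't declare an encoding, and it uses a simple case-insensitive substring match:

```python
def iter_matches(lines, word):
    """Yield (line_number, line) for each line containing `word` (case-insensitive substring)."""
    needle = word.lower()
    for i, line in enumerate(lines, start=1):
        if needle in line.lower():
            yield i, line


def search_remote(url, word):
    """Stream the file at `url` line by line and yield matching lines.

    Only the current chunk of the response is held in memory, so a large
    file never has to fit in RAM all at once.
    """
    # Imported here so iter_matches() can be used without requests installed.
    import requests

    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        # Assumption: treat undeclared encodings as UTF-8; without an encoding,
        # iter_lines(decode_unicode=True) would yield bytes instead of str.
        if resp.encoding is None:
            resp.encoding = "utf-8"
        lines = resp.iter_lines(decode_unicode=True)
        yield from iter_matches(lines, word)
```

Call it as search_remote(url, word) with the ?raw=1 link from above. Because it is a generator, matches arrive lazily, so the app can start returning hits before the whole file has been downloaded.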

Upvotes: 1
