Benoit Jansen

Realtime communication between nodejs and phantomjs or any other headless browser

I'm developing an app in Node.js that combines multiple intranet sites into one. So far, I have used requestjs to make the requests I need. I'm a bit stuck on how to do realtime communication between Node.js (with Express) and one particular site that has a captcha login. I'm thinking of using a headless browser that forwards the captcha to my UI, but I don't know how to start. Is there any GOOD, up-to-date tutorial?

Answers (1)

alandarev

"Real-time communication" is a marketing buzzword, which is also why you feel lost.

If I understand you correctly, you have a Node.js server that aggregates several sites, which are scraped at the same time.

Here is the solution on paper (coding it all will take some effort, and that part is up to you):

(Assume site A is the one with the captcha.)

  1. The client connects to the Node.js server.
  2. The Node.js server runs the PhantomJS script from the command line (the Child Process documentation will help here).
  3. The scripts scrape the sites. The Site A scraper receives the captcha together with the cookies and the form holding the unique values tied to that captcha. The script saves the cookie and form state to a temporary text file, saves the captcha image to a temporary file, and exits.
  4. Node.js checks a given folder for newly created temporary captcha images. If there are any, it displays them to the user.
  5. The user enters the captcha and sends the solution back to Node.js.
  6. If the temporary image was named Captcha_Site_A.png, Node.js saves the solution to Captcha_Site_A.txt.
  7. Node.js runs part 2 of the Site A scraper.
  8. Part 2 looks for the solution text file, restores the saved cookie and form state, fills the solution into the form, and continues.
  9. Node.js then receives the site content.
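The Node.js side of steps 4, 6 and 7 could look roughly like the minimal sketch below. It assumes the PhantomJS scripts drop images named like `Captcha_Site_A.png` into one shared temporary folder; the folder location, the function names and the part-2 script name are hypothetical, and the cookie/form state handling lives in the Phantom scripts, which are not shown.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
const { execFile } = require("child_process");

// Hypothetical folder where the PhantomJS "part 1" scripts save captcha
// images (e.g. Captcha_Site_A.png) before exiting.
const CAPTCHA_DIR = path.join(os.tmpdir(), "captchas");

// Step 4: list captcha images that do not have a matching .txt solution yet.
function findPendingCaptchas(dir = CAPTCHA_DIR) {
  if (!fs.existsSync(dir)) return [];
  return fs
    .readdirSync(dir)
    .filter((name) => /^Captcha_.*\.png$/.test(name))
    .filter((name) => !fs.existsSync(path.join(dir, name.replace(/\.png$/, ".txt"))))
    .sort();
}

// Step 6: Captcha_Site_A.png -> Captcha_Site_A.txt containing the user's answer.
function saveCaptchaSolution(imageName, solution, dir = CAPTCHA_DIR) {
  const txtPath = path.join(dir, imageName.replace(/\.png$/, ".txt"));
  fs.writeFileSync(txtPath, solution.trim(), "utf8");
  return txtPath;
}

// Step 7: run the (hypothetical) part-2 script, which is expected to read the
// .txt file, restore the cookie/form state and submit the form.
function runScraperPart2(script, callback) {
  execFile("phantomjs", [script], (err, stdout) => callback(err, stdout));
}
```

In an Express app you might call `findPendingCaptchas()` in the route that renders the captcha page (serving the images themselves with something like `express.static(CAPTCHA_DIR)`), then call `saveCaptchaSolution()` followed by `runScraperPart2("siteA_part2.js", ...)` in the route that receives the user's answer. Checking the folder on request keeps the sketch simple; a file watcher would be an alternative.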

Yes, it is a long journey, but you get what you ask for :)

P.S. Step 9 (receiving the site content) can be done by having the Phantom script print the results to stdout and having Node.js capture that output (again, see the Child Process documentation). Alternatively, save the results to a temporary file.
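A minimal sketch of the stdout approach, wrapping `child_process.execFile` in a Promise. The function name, the default timeout and the buffer size are assumptions; `maxBuffer` is raised because a full HTML page can exceed the default limit.

```javascript
const { execFile } = require("child_process");

// Runs a scraper script and resolves with everything it printed to stdout.
// For PhantomJS this would be e.g. runScript("phantomjs", ["siteA_part2.js"]),
// where the script name is hypothetical and phantomjs must be on PATH.
function runScript(command, args = [], timeoutMs = 60000) {
  return new Promise((resolve, reject) => {
    execFile(
      command,
      args,
      { timeout: timeoutMs, maxBuffer: 10 * 1024 * 1024 }, // allow large HTML pages
      (err, stdout, stderr) => {
        if (err) {
          err.stderr = stderr; // keep the script's error output for debugging
          return reject(err);
        }
        resolve(stdout);
      }
    );
  });
}
```

On the PhantomJS side, the part-2 script would end with something like `console.log(page.content); phantom.exit();` so the page HTML lands on stdout, and Node.js would consume it with `runScript("phantomjs", ["siteA_part2.js"]).then((html) => ...)`.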
