neuromancer

Reputation: 55469

Parsing a website

I want to make a program that takes a website address as user input. The program then goes to that website, downloads the page, and parses the information inside. It outputs a new HTML file built from the information on that website.

Specifically, the program will take certain links from the website, put those links in the output HTML file, and discard everything else.

Right now I just want it to work for websites that don't require a login, but later on I want it to work for sites where you have to log in, so it will have to be able to deal with cookies.

I'll also want the program to eventually follow certain links and download information from those other pages as well.

What are the best programming languages or tools to do this?

Upvotes: 1

Views: 376

Answers (2)

Stephen

Reputation: 49156

Python.

It's fairly easy to write a simple crawler using Python's standard library, but you'll also be able to find existing Python crawler libraries on the web.
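For example, here is a minimal sketch of the standard-library approach: subclass `html.parser.HTMLParser` to collect link targets, then write them into a new HTML page. The sample snippet and the output filename are placeholders, and the "keep every `href`" rule stands in for whatever filtering you actually want.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    """Return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links

def links_to_html(links):
    """Build a new HTML page listing only the given links."""
    items = "\n".join(f'<li><a href="{link}">{link}</a></li>' for link in links)
    return f"<html><body><ul>\n{items}\n</ul></body></html>"

# Demo on an inline snippet; for a real site you would fetch the text first:
#   from urllib.request import urlopen
#   html = urlopen("http://example.com").read().decode("utf-8", "replace")
sample = '<p>Hi <a href="http://example.com/a">A</a> and <a href="/b">B</a></p>'
print(extract_links(sample))  # ['http://example.com/a', '/b']
```

For the later login requirement, the standard library also covers cookies: `urllib.request` can be wired up with `http.cookiejar.CookieJar` so cookies set by the login page are sent on subsequent requests.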

Upvotes: 1

Chad Birch

Reputation: 74518

Beautiful Soup (Python) comes highly recommended, though I have no experience with it personally.
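Beautiful Soup is a third-party package (`pip install beautifulsoup4`), but it makes the link-extraction step a near one-liner and copes well with messy real-world HTML. A minimal sketch, with a placeholder snippet standing in for a downloaded page:

```python
from bs4 import BeautifulSoup

def extract_links(html):
    """Return the href of every <a> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]

sample = '<a href="/keep">keep</a> <a name="anchor">skip</a>'
print(extract_links(sample))  # ['/keep']
```

The `href=True` filter skips anchor tags that have no target, so you only get real links back.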

Upvotes: 3
