Reputation: 416
How can I start the following Git application after compiling it?
My steps are:
1. Clone the Git repository "git://github.com/michaelmelanson/spider.git"
2. cd spider
3. erl
Erlang R14B04 (erts-5.8.5) [source] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]
Eshell V5.8.5 (abort with ^G)
1> make:all().
up_to_date
2>
Finally, how can I list the modules that belong to the application?
Thanks in advance.
Thanks for joining in, Michael. The standard task request task_master:insert_task("http://www.id.uzh.ch"). works fine. But if I try to limit the recursive requests, I receive an error message:
* 1: record task undefined
Unfortunately, the suggestion below doesn't work:
rd(task, {url = "", depth = ""}).
Task = #task{url="http://www.id.uzh.ch", depth=2}.
task_master:insert_task(Task).
The next error message is:
=ERROR REPORT==== 21-Jun-2013::09:47:42 ===
** Generic server <0.52.0> terminating
** Last message in was {'$gen_cast',
{task,
{task,{task,"http://www.id.uzh.ch",2},[],-1}}}
** When Server state == {state}
** Reason for termination ==
** {{badmatch,{error,parse_url}},
[{fetcher,process_task,1},
{fetcher,handle_cast,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
Any ideas?
Upvotes: 0
Views: 970
Reputation: 2040
You don't have to start an Erlang shell to compile your application's sources. You can just run
erlc src/*.erl -o ebin/
in your application's folder.
I would also suggest trying rebar:
https://github.com/rebar/rebar
It's a utility that makes it easy to compile and test Erlang applications.
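For example, once rebar has been built and is available on your PATH (or copied into the project directory as ./rebar), a typical session with the standard OTP layout (sources in src/, beams in ebin/) looks roughly like this:
rebar compile    # compiles src/*.erl into ebin/
rebar eunit      # runs any EUnit tests, if present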
Upvotes: 1
Reputation: 1335
I'm the original author of that code. Sorry I didn't document it at all... It was just a little side project of mine from about 5 years ago. So this is a bit of a distant memory to me, but here's what I know.
johlo is absolutely correct about how to start the application and insert a task. You should be able to start it with application:start(spider)
, then insert a new job with the task_master:insert_task/1
function. It takes either a URL string or a task
record. Let me know if that doesn't work for you.
Once the app is running, doing something like task_master:insert_task("http://someurl.com/page.html")
will insert a new task to fetch and process a web page. You can see what 'process' means exactly by looking here:
https://github.com/michaelmelanson/spider/blob/master/src/fetcher.erl#L113
Basically it will fetch the page, parse the HTML, extract any links and send the results back to the task_master
. The task_master
will then insert new tasks to process each link, recursively spidering all connected pages. Currently it doesn't do anything with the results, but this would be a good place to put that code:
https://github.com/michaelmelanson/spider/blob/master/src/fetcher.erl#L132
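As a purely hypothetical sketch (the module, function and variable names below are illustrative, not taken from fetcher.erl), the code you add at that point could be as simple as logging each result:
-module(result_logger).   %% hypothetical helper, not part of the spider repository
-export([store_result/2]).
store_result(Url, Links) ->
    io:format("fetched ~s: ~p outgoing links~n", [Url, length(Links)]).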
Be warned: by default it does not limit the spidering depth. Left to its own devices, it will recursively spider the entire web. If you plan on using this on any site with an outgoing link, you should limit the spidering depth by creating Task = #task{url="http://someurl.com/", depth=5}
and then calling task_master:insert_task(Task)
(a minimal shell sketch of this follows below).
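Assuming the #task{} record definition can be loaded into the shell with rr/1 (the file path below is a guess; point it at wherever the record is actually defined in the repository), a minimal shell sketch looks like this:
1> rr("src/*.hrl").   %% or rr(Module). for the module that defines #task{}
2> Task = #task{url="http://someurl.com/", depth=5}.
3> task_master:insert_task(Task).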
Hope that helps.
Upvotes: 3
Reputation: 5500
Spider is an Erlang application, so application:start/1
can be used to run it:
cd spider
erl -pa ebin
The -pa ebin flag lets erl find the spider .beam files. Then, in the shell:
1> application:start(inets).
2> application:start(spider).
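To address the question about showing the modules that belong to the application: once it has been loaded or started, application:get_key/2 returns {ok, Modules} as declared in its .app file, and application:which_applications/0 lists every running application:
3> application:get_key(spider, modules).
4> application:which_applications().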
You can read more about applications in the Erlang/OTP documentation.
Upvotes: 4