Spaceman

Reputation: 1205

Simple, one-liner solution to chain HTTP->SOCKS5 proxy

We run many parallel scrapers through local Tor proxies, so we have a list of about 200 SOCKS5 proxies:

socks5://localhost:port
socks5://localhost:port2
socks5://localhost:port3
...

Some software does not support SOCKS and works only with HTTP proxies, so we need something that acts as an HTTP proxy and forwards the requests to a SOCKS proxy.

A traditional answer is to use Polipo or Vidalia, but both need to be configured, and running 200 instances means maintaining 200 config files, which is not so simple.

Another solution, such as MITM proxy (Python), works, but it is too slow and uses too much RAM: multiply each process by 200, and even at 30 MB apiece that comes to 6 GB of RAM.

Proxychains is OK, but it still needs a config file for each instance.

The delegate program worked fine at first, but then it stopped working for no apparent reason: it refuses connections and returns something like "an intrusion attempt detected, going to stop", and restarting does not help. It was running on a local interface, and the web service is fine and was not hacked, so that behavior was really strange.

So we're looking for something like delegate but more reliable and without those errors. Something small and fast, preferably written in C/C++.

Or any software solution in any scripting language, as long as it is fast and memory-efficient.

I'm not a C programmer, so if you give me 'examples' of proxy code in C, it will not work: it would take me a day just to understand the code, compile it and run it. Unfortunately =)

Thanks!

Upvotes: 1

Views: 1493

Answers (1)

jch

Reputation: 5651

Polipo does not need a configuration file — it can read its configuration from the command line. So it's an easy matter to run 200 polipi from a shell script:

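# socks_hosts is assumed to be a bash array holding one "host:port" string per SOCKS5 proxy
for ((i = 0; i < 200; i++)); do
    polipo daemonise=true diskCacheRoot='' proxyPort=$((i + 8100)) socksParentProxy="${socks_hosts[i]}" pidFile="/var/run/polipo$i.pid"
done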

Note that the above disables the on-disk cache — sharing a single disk cache between many instances of Polipo is not supported — you should ask on the polipo-users mailing list if you need this functionality.
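Before starting 200 daemons, it may help to print the command lines and check them. The sketch below does only that, a dry run that launches nothing. The function name polipo_cmd and the two sample endpoints are made up for illustration; the base port 8100 comes from the loop above, and the option names use the spelling from the Polipo manual (daemonise).

```shell
#!/bin/sh
# Dry-run sketch: print one polipo command line per SOCKS5 endpoint.
# polipo_cmd and the sample endpoints are hypothetical, for illustration only.

# polipo_cmd INDEX ENDPOINT
#   INDEX    = instance number (HTTP proxy listens on 8100 + INDEX)
#   ENDPOINT = SOCKS5 parent proxy as host:port
polipo_cmd() {
    printf 'polipo daemonise=true diskCacheRoot="" proxyPort=%d socksParentProxy=%s pidFile=/var/run/polipo%d.pid\n' \
        "$(( $1 + 8100 ))" "$2" "$1"
}

# Placeholder endpoints; replace with the real list of local Tor SOCKS ports.
i=0
for socks in localhost:9050 localhost:9051; do
    polipo_cmd "$i" "$socks"
    i=$((i + 1))
done
```

Once the printed lines look right, the same output can be piped to sh to actually start the instances.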

Polipo can be configured to run in just a few megabytes of memory (check the chunkHighMark variable), so running 200 instances should not be an issue.
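As a rough worked example of that memory budget, the sketch below multiplies a per-instance chunkHighMark by the number of instances. The 4 MiB value is an arbitrary assumption, not a recommendation, and the helper name chunk_budget_mib is made up. The result bounds only the chunk memory that chunkHighMark controls, so the real footprint of each process will be somewhat higher.

```shell
#!/bin/sh
# Sketch: estimate the combined chunk memory of many polipo instances.
# chunk_budget_mib is a hypothetical helper; 4 MiB is an assumed setting.

# chunk_budget_mib INSTANCES CHUNK_HIGH_MARK_BYTES -> total in MiB
chunk_budget_mib() {
    echo $(( $1 * $2 / 1048576 ))
}

chunk_high_mark=$((4 * 1024 * 1024))   # chunkHighMark is given in bytes

# The option as it would be passed on each polipo command line.
echo "chunkHighMark=$chunk_high_mark"
echo "chunk memory bound for 200 instances: $(chunk_budget_mib 200 "$chunk_high_mark") MiB"
```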

Upvotes: 1
