Reputation: 3280
I am building a scraper that needs to crawl a page loaded with JavaScript, and the scripts appear to set cookies and generate query-string parameters for the subsequent requests.
I am able to set the cookies by requesting the JS files directly, but the query-string parameters seem to be produced by obfuscated JavaScript calls.
I am not able to decipher them, and I tried googling for tools to compile JS to C#, but in vain. If someone has solved a similar issue before, please shed some light on how I can execute a JavaScript file the way a browser does and generate the HTML directly from my C# code.
Any help would be deeply appreciated.
Upvotes: 1
Views: 1125
Reputation: 498992
Why not use a web debugging proxy like Fiddler to find out what headers and cookies are set up, and use that data directly in your C#?
That way you will not need to execute the JS at all just to figure out the headers and cookies.
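As a rough sketch of that replay approach: once Fiddler shows you which cookies and headers the page's scripts set, you can send them yourself with `HttpWebRequest`. The URL, cookie name, and values below are placeholders — substitute whatever your Fiddler capture actually shows.

```csharp
using System;
using System.IO;
using System.Net;

class Scraper
{
    static void Main()
    {
        // Hypothetical target URL -- replace with the request seen in Fiddler.
        var request = (HttpWebRequest)WebRequest.Create("http://example.com/page");

        // Replay the cookies the page's JavaScript would have set,
        // copied out of the Fiddler session (names/values are placeholders).
        var cookies = new CookieContainer();
        cookies.Add(new Cookie("session_token", "abc123", "/", "example.com"));
        request.CookieContainer = cookies;

        // Replay any headers the capture shows the site expects.
        request.UserAgent = "Mozilla/5.0 (compatible; MyScraper/1.0)";

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
```

This only works if the query-string parameters are stable or follow a pattern you can reproduce; if they are computed fresh by the script on every visit, you will need one of the JS-execution options below.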
Update:
You can also use a web automation framework such as WatiN to crawl the site - since it drives a real browser, the JavaScript executes for you, so you shouldn't need to do much more.
Update2:
Since WatiN is no good for your requirements, perhaps compiling the script directly with a JavaScript-to-.NET compiler is possible - see JScript.NET - though I doubt any DOM manipulation will work, since there is no browser DOM behind it.
Upvotes: 4
Reputation: 9225
It may be more complicated than you think. Take a look at these two topics:
Any Javascript Engine for .NET/C#?
Embedding JavaScript engine into .NET
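One of the embeddable engines those topics discuss is Jint; as a hedged sketch, if you can extract the site's parameter-generating function, you can run it in-process and call it from C#. The `makeToken` function below is a stand-in for the site's real script, not anything from the original question.

```csharp
using System;
using Jint;   // Jint NuGet package

class EngineDemo
{
    static void Main()
    {
        // Load a stand-in for the site's token-generating script,
        // then invoke it directly to get the query-string value.
        var engine = new Engine();
        engine.Execute("function makeToken(seed) { return 'q=' + seed * 7; }");
        string token = engine.Invoke("makeToken", 6).ToString();
        Console.WriteLine(token);   // q=42
    }
}
```

Like JScript.NET, a bare engine gives you the JavaScript language but no DOM, so scripts that read page state would need that state stubbed in first.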
Upvotes: 1