Justin k

Reputation: 1104

How to get a web page's source code in C++?

I am using Microsoft Visual Studio 2010 and writing a C++ console application.

I am trying to open a web page, get its source (by source I mean what Firefox shows when you right-click and choose “View Page Source”), and save it on my computer as a text file so that I can read it later. Can you please give me an example of how to fetch a web page in C++ and save its HTML source to my computer? I would greatly appreciate any help.

Also, how do you install libcurl?

When I use #include <curl/curl.h>, it says: Error: cannot open source file “curl/curl.h”.

Upvotes: 1

Views: 9145

Answers (4)

udit043

Reputation: 1620

This is a small program I wrote to fetch the source of the Facebook home page and write it to a text file. You can adapt it to your needs (for example, replace the host name "www.facebook.com" with "www.google.com"). Also remember to link the WinINet library to your project (wininet.lib in Visual Studio, libwininet.a with MinGW). Hope it helps :)

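#include <windows.h>
#include <wininet.h>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
  // Open the output file in append mode; binary mode writes the bytes unchanged.
  ofstream fs_obj("temp.txt", ios::out | ios::app | ios::binary);

  HINTERNET hInternet = InternetOpenA("InetURL/1.0", INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0 );
  if( hInternet == NULL )
  {
    printf( "InternetOpenA failed, error %lu\n", (unsigned long)GetLastError() ) ;
    return 1 ;
  }

  // Enter the host name here (without "http://"). NULL user name and password: no login.
  HINTERNET hConnection = InternetConnectA( hInternet, "www.facebook.com", 80, NULL, NULL, INTERNET_SERVICE_HTTP, 0, 0 );
  if( hConnection == NULL )
  {
    printf( "InternetConnectA failed, error %lu\n", (unsigned long)GetLastError() ) ;
    InternetCloseHandle( hInternet ) ;
    return 1 ;
  }

  // "/" is the path of the page to request (the site root).
  HINTERNET hData = HttpOpenRequestA( hConnection, "GET", "/", NULL, NULL, NULL, INTERNET_FLAG_KEEP_CONNECTION, 0 );
  if( hData == NULL || !HttpSendRequestA( hData, NULL, 0, NULL, 0 ) )
  {
    printf( "Request failed, error %lu\n", (unsigned long)GetLastError() ) ;
    if( hData != NULL ) InternetCloseHandle( hData ) ;
    InternetCloseHandle( hConnection ) ;
    InternetCloseHandle( hInternet ) ;
    return 1 ;
  }

  char buf[ 2048 ] ;
  string total;
  DWORD bytesRead = 0 ;
  DWORD totalBytesRead = 0 ;

  // InternetReadFile signals the end of the data by succeeding with bytesRead == 0.
  while( InternetReadFile( hData, buf, sizeof( buf ), &bytesRead ) && bytesRead != 0 )
  {
    total.append( buf, bytesRead ) ; // append exactly the bytes read; no null terminator needed
    printf( "%lu bytes read\n", (unsigned long)bytesRead ) ;

    totalBytesRead += bytesRead ;
  }

  // Save the page source to temp.txt, followed by an end marker.
  fs_obj<<total<<"\n--------------------end---------------------\n";
  fs_obj.close();
  printf( "\n\n END -- %lu TOTAL bytes read\n", (unsigned long)totalBytesRead ) ;

  cout<<endl<<total<<endl; // also print the page source to the console
  InternetCloseHandle( hData ) ;
  InternetCloseHandle( hConnection ) ;
  InternetCloseHandle( hInternet ) ;
  system("pause");
  return 0;
}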

Rename temp.txt to temp.html and open it in a browser to see the saved page.
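The program above hard-codes the host name and the path as two separate strings, because InternetConnectA takes the host and HttpOpenRequestA takes the path. If you would rather start from a full URL, a small helper can split it first. This is only a sketch: `UrlParts` and `split_url` are hypothetical names, and it assumes a plain http:// URL with no port number, user info, or query handling.

```cpp
#include <string>

// Hypothetical helper type: the two pieces the WinINet calls take separately.
struct UrlParts {
    std::string host;  // passed to InternetConnectA, e.g. "www.google.com"
    std::string path;  // passed to HttpOpenRequestA, e.g. "/index.html"
};

// Splits "http://host/path" into host and path.
// Assumption: plain http:// only; ports and user info are not handled.
UrlParts split_url(const std::string& url) {
    const std::string scheme = "http://";
    std::size_t start = 0;
    if (url.compare(0, scheme.size(), scheme) == 0) {
        start = scheme.size();  // skip the scheme if it is present
    }

    UrlParts parts;
    const std::size_t slash = url.find('/', start);
    if (slash == std::string::npos) {
        parts.host = url.substr(start);
        parts.path = "/";  // no path given: request the site root
    } else {
        parts.host = url.substr(start, slash - start);
        parts.path = url.substr(slash);
    }
    return parts;
}
```

You would then pass `parts.host.c_str()` and `parts.path.c_str()` to the two WinINet calls in place of the string literals.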

Upvotes: 0

Frunsi

Reputation: 7147

Learn the fundamentals of protocol layering and software layering first!

After that, decide which layer you want to develop on. Then choose a low-level or high-level API for your specific task.

BTW: your specific task is not a typical one for C++. You could simply use the curl command-line utility, e.g.: curl YOURURL > file.html. No need to reinvent the wheel.

Upvotes: 0

Bojan Komazec

Reputation: 9526

You need a library that supports HTTP, e.g. WinINet (Windows only) or libcurl (cross-platform). I have used WinINet to communicate with web servers, and getting the content of a page was pretty easy. Here are some links to give you a hint of what to do:

Get web page using WinInet class wrapper
Using WinInet as an alternative to libcurl
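Whichever library does the fetching, the saving step is the same once the page is in a std::string. A minimal sketch, where `save_page` is a hypothetical helper name and not part of WinINet or libcurl:

```cpp
#include <fstream>
#include <string>

// Writes `html` to `filename`, replacing any existing file.
// Returns true only if the file opened and every byte was written.
bool save_page(const std::string& filename, const std::string& html) {
    // Binary mode stops Windows from translating "\n" into "\r\n" on output.
    // c_str() keeps this working on older compilers as well.
    std::ofstream out(filename.c_str(), std::ios::out | std::ios::binary);
    if (!out) {
        return false;  // could not open the file for writing
    }
    out.write(html.data(), static_cast<std::streamsize>(html.size()));
    out.flush();
    return out.good();
}
```

Calling `save_page("page.html", content)` gives you a file you can open directly in a browser.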

Upvotes: 1

mikithskegg

Reputation: 816

A low-level approach: Winsock sockets plus the HTTP protocol, written by hand.

A higher-level approach: libraries such as curl, the WinINet API, and so on.
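To give an idea of what the low-level route involves: after connecting a socket to port 80, you send the request as plain text and then separate the headers from the body in whatever comes back. The sketch below covers only that text handling, not the socket calls. `build_get_request` and `extract_body` are hypothetical names, and the sketch assumes the server does not reply with chunked transfer encoding, which it does not decode.

```cpp
#include <string>

// Builds the text of a minimal HTTP/1.1 GET request.
// "Connection: close" asks the server to close the socket after the response,
// so the client can simply read until the connection ends.
std::string build_get_request(const std::string& host, const std::string& path) {
    return "GET " + path + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n"
           "\r\n";
}

// Returns everything after the blank line that ends the response headers.
// Returns an empty string if that blank line is not found.
// Assumption: the body is not chunked or compressed.
std::string extract_body(const std::string& response) {
    const std::string separator = "\r\n\r\n";
    const std::size_t pos = response.find(separator);
    if (pos == std::string::npos) {
        return std::string();
    }
    return response.substr(pos + separator.size());
}
```

This is the bookkeeping that WinINet and libcurl do for you, along with redirects, HTTPS, and chunked responses, which is why the higher-level libraries are usually the easier choice.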

Upvotes: 0
