Reputation: 11

Black Hat Python TCP Client

I'm working through the Black Hat Python book, and though it was written in 2015, some of the code seems a little dated. For example, the print statements don't use parentheses. However, I cannot get the script below to run and keep getting an error.

    # TCP Client Tool

import socket

target_host = "www.google.com"
target_port = 80

# creates a socket object. AF_INET parameter specifies IPv4 addr/host. SOCK_STREAM is TCP specific, not UDP.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# connect the client
client.connect((target_host, target_port))

# sending some data
client.send("GET / HTTP/1.1\r\nHost: google.com\r\n\r\n\")

# receive some data
response = client.recv(4096)

print(response)

The error I'm getting simply reads, File "", line 15 client.send("GET / HTTP/1.1\r\nHost: google.com\r\n\r\n\") ^

Upvotes: 1

Answers (2)

Torxed

Reputation: 23500

I think @Anonyme2000 answered the question in full, and all the details needed to solve the issue are there. However, since this is a learning exercise from a book, others might end up here, and the explanation in @Anonyme2000's answer is a bit short, so I'll expand on it.

Strings

Python, like many other languages, has what are called escape sequences. In short, putting \ in front of something means that whatever follows has a special meaning. Two examples:

Example 1: Row breaks (new-lines)

print("Something \nThis is a new line")

This causes Python to interpret the n not as the letter "n" but as a special character meaning "here there should be a new line", all thanks to the \ in front of the letter n. \r is a related character, the carriage return: in the old days it was the equivalent of moving the printer's carriage head back to the start of the line, rather than down one line.

Example 2: Quote escapes in strings

print("I want to print this quote: \" in my string")

In this example, because we are using the quote character " to start and end our string, adding it in the middle would break the string (hopefully this is clear to you). To add quotes in the middle of the text anyway, we again need to put the escape character \ before the quote. This tells Python not to parse the quote as the end of the string, but simply to add it to the string. There's an alternative to doing this, and that is:

print('I want to print this quote: " in my string')

That works because the whole string is started and ended by ' instead, which lets Python parse the start and end of the whole string unambiguously, so the quote in this case is just another piece of the string. These escape sequences are described here with more examples.
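A quick way to see what each escape sequence produces is to look at the length and contents of the resulting strings. The snippet below is only a small sketch, and the variable names are made up for the illustration.

```python
# Each escape sequence collapses to a single character in the final string.
newline = "a\nb"          # a, newline, b
quote = "say \"hi\""      # the backslashes are not part of the string
same_quote = 'say "hi"'   # single-quoted alternative, no escaping needed
raw = r"a\nb"             # raw string: backslash and n stay as two characters

print(len(newline))         # 3
print(quote == same_quote)  # True
print(len(raw))             # 4
print(repr("\r\n"))         # '\r\n' (carriage return followed by line feed)
```

The raw-string form is handy when a backslash should be kept literally, for example in Windows paths.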

Bytes vs Strings

To better understand the difference, we'll first have a look at how Python and the terminal you use interact. I'm assuming you're running your Python scripts from cmd.exe, powershell.exe or, on Linux, something like xterm. Basic terminals, that is.

The terminal will try to parse anything sent to its output buffer and represent it to you. You can test this by doing:

print('\xc3\xa5\xc3\xa4\xc3\xb6') # Most Linux systems
print('\xe5\xe4\xf6') # Most Windows systems

In theory, one of the prints above should have printed a bunch of bytes that the terminal somehow knew how to render as åäö. Even your browser just did that for you (fun side note: that's how the emoji problem is solved too, everyone has agreed that certain byte combinations should become 🙀). I say most Windows and Linux systems because the result depends entirely on the region/language you selected when you installed your operating system. I'm in EU North (Sweden), so my default codec in Windows is ISO-8859-1, and all my Linux machines use UTF-8. These codecs are important, as they are the machine-human interface for representing text. Note that this describes Python 2, where a plain string literal is a sequence of raw bytes; in Python 3 the same literals are text made of code points, so the second line prints åäö on any terminal that can display it.

Knowing this, anything you send to the output buffer of your terminal by doing either print("...") or sys.stdout.write("...") - will be interpreted by the terminal and rendered in your locale. If that's not possible, errors will occur.

This is where Python 2 and Python 3 start to become two different beasts, and that's why you're here today. Put simply, Python 2 did a lot of automated, magic guesswork on strings, so that you could send a string to a socket and Python would take care of the encoding for you. Python 2 parsed and converted strings in all kinds of ways. In Python 3 a lot of that automated guesswork was removed, because more often than not it confused people, and the data being passed through functions and sockets was essentially Schrödinger's data: sometimes strings, sometimes bytes. So now it's up to you, the developer, to convert and encode the data. Always.
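The difference is easy to reproduce without touching the network. The sketch below uses socket.socketpair() to get two already-connected sockets on the local machine (an assumption made purely for the demonstration; the book's example connects to a remote host instead) and shows Python 3 refusing a str while accepting bytes.

```python
import socket

# Two connected sockets on the local machine; no remote host is involved.
left, right = socket.socketpair()

try:
    left.send("GET / HTTP/1.1\r\n\r\n")   # str: rejected by Python 3
    outcome = "sent"
except TypeError as exc:
    outcome = "TypeError"
    print("Python 3 refused the str:", exc)

left.send(b"GET / HTTP/1.1\r\n\r\n")       # bytes: accepted
received = right.recv(4096)
print(received)

left.close()
right.close()
```

Under Python 2 the first send would have gone through, which is why the book's original code ran as printed.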

So what is bytes vs strings?

bytes is, in layman's terms, a string that hasn't been encoded in any way and thus can contain any kind of data. It doesn't have to be just text (a-z, A-Z, 0-9, !"#¤% and so on); it can also contain special bytes like \x00, which is a null byte/character. Python 3 will never try to auto-parse this data. And when doing:

print(b'\xe5\xe4\xf6')

Like above, except that you define the string as a bytes string in Python 3, Python will send a representation of the bytes, not the actual bytes, to the terminal buffer. Thus the terminal will never interpret them as the actual bytes they are.
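To make the "representation, not the actual bytes" point concrete, the short sketch below inspects the same bytes object. The variable names are illustrative only.

```python
raw = b'\xe5\xe4\xf6'

print(type(raw))    # <class 'bytes'>
print(raw)          # b'\xe5\xe4\xf6' (the representation, not three characters)
print(len(raw))     # 3, one per byte
print(list(raw))    # [229, 228, 246], each byte is just a number from 0 to 255

# What print() sends to the terminal is this plain ASCII text:
shown = repr(raw)
print(len(shown))   # 15 characters: b, two quotes and three 4-character escapes
```

So the terminal receives fifteen harmless ASCII characters and never sees the three raw bytes themselves.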

Example 1: Encoding your data

Which brings us to this first example. So how do you convert the bytes in print(b'\xe5\xe4\xf6') to the characters they represent in your terminal? By converting them to a string with a particular encoding. In the above example, the three bytes \xe5\xe4\xf6 happen to be åäö encoded as ISO-8859-1. I know this because I'm currently on Windows, and if you run the command chcp in your terminal, you'll see which code page/encoder you're using.

Therefore, I can do:

print(b'\xe5\xe4\xf6'.decode('ISO-8859-1'))

And that will convert the bytes object into a string object (with an encoding).
The problem here is that if you send this string to my Linux machine, it won't have a clue what's going on. Because, if you try:

print(b'\x86\x84\x94'.decode('UTF-8'))

You will end up with an error message like this:

>>> print(b'\x86\x84\x94'.decode('UTF-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 0: invalid start byte

This is because, in UTF-8 land, the byte \x86 can't start a character on its own, so the decoder has no way of knowing what to do with it. And because my Linux machine's default encoder is UTF-8, your Windows data is garbage to my machine.

Which brings us to...
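When the codec of incoming bytes is uncertain, the failure can be caught or softened instead of crashing the script. The following is a minimal sketch; the guess that the bytes came from a cp437 console is my assumption for the illustration, not something stated in the example above.

```python
data = b'\x86\x84\x94'   # the bytes from the failing example above

try:
    data.decode('UTF-8')
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print("strict decoding failed:", exc.reason)   # invalid start byte

# errors='replace' substitutes U+FFFD for each undecodable byte instead of raising.
lenient = data.decode('UTF-8', errors='replace')
print(len(lenient))                 # 3
print(lenient == '\ufffd' * 3)      # True

# Decoding with the codec the sender actually used recovers the text.
# Assumption: the bytes were produced by a cp437 console.
text = data.decode('cp437')
print(text == '\xe5\xe4\xf6')       # True, i.e. the string åäö
```

The replacement approach loses information, so it is mostly useful for logging or display; knowing the real codec is the only way to get the original text back.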

Sockets

In Python 3, and in most physical realms of a computer, encodings and strings are not welcome, as they aren't really a thing. Instead, machines communicate in bits, in short, 1s and 0s. Eight of those become a byte, and that's where Python's bytes comes into play. When sending something from machine to machine (or application to application), we have to convert any text representation into a byte sequence so that the machines can talk to each other. Without encodings, without parsing things. Just take the data.

We do this in three ways and they are:

print('åäö'.encode('UTF-8'))
print(bytes('åäö', 'UTF-8'))
print(b'åäö')

The last option will fail, but I've left it in on purpose to show the difference between the ways of telling Python, "hey, this weird thing, convert it to a bytes object".

All of these options return a bytes representation of åäö using an encoder *(except the last one: a bytes literal only accepts ASCII characters, which is limited at best).

In the UTF-8 case, you will be returned something like:

b'\xc3\xa5\xc3\xa4\xc3\xb6'

And this is something you can send out on a socket, because it's just a series of bytes that terminals, machines and applications won't touch or deal with in any other way than as a series of ones and zeroes *('11000011 10100101 11000011 10100100 11000011 10110110' to be specific).

Together with some network logic, that's what's going to be sent out on your socket. And that's how machines communicate.
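The encoding step and the bit pattern quoted above can be reproduced in a few lines. This is just a sketch to check the numbers; the variable names are illustrative.

```python
text = '\xe5\xe4\xf6'                    # the string åäö, written with escapes

encoded = text.encode('UTF-8')
also_encoded = bytes(text, 'UTF-8')      # equivalent spelling
print(encoded == also_encoded)           # True
print(len(text), len(encoded))           # 3 6 (three characters, six bytes)

# Format each byte as eight binary digits.
bits = ' '.join(format(byte, '08b') for byte in encoded)
print(bits)
# 11000011 10100101 11000011 10100100 11000011 10110110

# Decoding with the same codec restores the original text.
print(encoded.decode('UTF-8') == text)   # True
```

Each of the three characters takes two bytes in UTF-8, which is why six groups of bits go out on the wire.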

This is an overview of what's going on. The "human" is the terminal, i.e. the machine-human interface where you type your åäö and the terminal encodes it with a certain encoding. Your application has to do some magic to convert it into something the socket/physical world can work with.

Upvotes: 2

Anonyme2000

Reputation: 78

You are escaping the " by putting a \ before it, which means Python does not know that the string ends there. You can see this in your post: all the code after that line is coloured as if it were a string.
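The effect of the stray backslash can be checked without running the client at all, by asking Python to parse the line. The sketch below uses the built-in compile(); the helper name compiles is made up for the illustration.

```python
# The line from the question, held in a raw string so it can be inspected as text.
broken = r'client.send("GET / HTTP/1.1\r\nHost: google.com\r\n\r\n\")'
# The same line with only the stray backslash before the closing quote removed.
fixed = r'client.send("GET / HTTP/1.1\r\nHost: google.com\r\n\r\n")'

def compiles(source):
    """Return True if Python can parse the given source line."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

print(compiles(broken))  # False: the string literal never ends
print(compiles(fixed))   # True: the line parses (it is not executed here)
```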

client.send also needs a bytes-like object, not a string. You can specify that by putting a b before your string:

client.send(b"GET / HTTP/1.1\r\nHost: google.com\r\n\r\n")

After that the script works fine
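Putting both fixes together, a Python 3 version of the client could look roughly like the sketch below. The function names build_request and fetch, the 10-second timeout and the use of sendall are my own choices for the illustration, not something from the book or the question.

```python
import socket

def build_request(host):
    """Build the same minimal HTTP/1.1 GET request as bytes."""
    request = "GET / HTTP/1.1\r\nHost: {}\r\n\r\n".format(host)
    return request.encode("ascii")

def fetch(target_host, target_port, host_header):
    """Connect over TCP, send the request and return the first chunk of the reply."""
    # AF_INET selects IPv4, SOCK_STREAM selects TCP.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.settimeout(10)                       # hypothetical timeout value
        client.connect((target_host, target_port))
        client.sendall(build_request(host_header))  # bytes, as Python 3 requires
        return client.recv(4096)

# Example call (needs network access, so it is left commented out):
# response = fetch("www.google.com", 80, "google.com")
# print(response.decode("UTF-8", errors="replace"))
```

Decoding the response before printing turns the received bytes back into readable text, which ties in with the bytes versus strings explanation in the other answer.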