ThoseKind

Reputation: 804

Regex Matching in Python 2.7 for HTTP GET Headers

I am trying to write a regular expression that will match as follows:

As specified above, this is for HTTP GET requests, so any of the following should match:

and the following should not:

I am currently using re.compile(r"^.{1,}: .{1,}[/r/n]$"), but I am not sure how to exclude colons from certain parts of the string.

EDIT: I believe I should start with ^ to anchor the beginning of the string. Next I want one or more of any character except a colon, so .{1,}, but I am not sure how to exclude the colon from it. Then I want a colon followed by a space, so just ": ", and then one or more characters that are not colons, again .{1,}, with the same problem of excluding colons. Finally, I want the string to end with [\r\n]$. This still does not work, even if I drop the no-colon requirement. So something like ^.{1,}: .{1,}\r\n$, but I still need to figure out how to exclude colons.

Upvotes: 1

Views: 71

Answers (1)

Veltzer Doron

Reputation: 974

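  1. {1,} is simply +.
  2. Excluding colons is done with a negated character class: [^:] matches any character except a colon, so [^:]* matches zero or more of them (use [^:]+ for one or more).
  3. If you want to exclude both spaces and colons, use [^ :].
  4. Anchoring with $ right after \r\n seems strange to me: it means a single string that ends with a line terminator and has nothing after it. (Also, I hope you know about the difference between Unix and Windows line endings here: \n versus \r\n.)
  5. Also, the line terminator is \r\n. Putting characters in square brackets means any one of them will match, so [\r\n] matches a single \r or a single \n, not the two-character sequence, which is not what you need.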
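A minimal Python sketch, using only the standard re module and made-up sample strings, that checks the five points above. The last checks use \Z instead of $ because Python's $ also matches just before a trailing newline, which would otherwise hide the difference between [\r\n] and \r\n.

```python
import re

# Point 1: {1,} and + are the same "one or more" quantifier.
same_quantifier = re.match(r"a{1,}", "aaa").group() == re.match(r"a+", "aaa").group()

# Point 2: [^:] is a negated character class (any character except a colon).
name = re.match(r"[^:]+", "Host: example.com").group()  # 'Host'

# Point 3: [^ :] stops at the first space as well as at the first colon.
first_word = re.match(r"[^ :]+", "bad name: value").group()  # 'bad'

# Point 4: Python's $ also matches just before a trailing newline,
# while \Z matches only at the very end of the string.
dollar_hit = re.match(r"abc$", "abc\n") is not None  # True
z_hit = re.match(r"abc\Z", "abc\n") is not None  # False

# Point 5: [\r\n] consumes ONE character (\r or \n); \r\n needs both, in order.
class_hit = re.match(r"x[\r\n]\Z", "x\r\n") is not None  # False
seq_hit = re.match(r"x\r\n\Z", "x\r\n") is not None  # True
```

The code runs unchanged on Python 2.7 and Python 3.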

Putting it all together, the following should work:

^([^ :]+): ([^ :]+)$

This gives the header name (e.g. Host) in group 1 and its value (e.g. the URL) in group 2.
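A usage sketch of that pattern with Python's re module. The header strings are invented examples, not the ones from the question. HEADER_CRLF_RE is a hypothetical adaptation and not part of the answer: it assumes each header line arrives with its trailing \r\n attached, as the question describes.

```python
import re

# The pattern from the answer: header name, ": ", header value.
HEADER_RE = re.compile(r"^([^ :]+): ([^ :]+)$")

m = HEADER_RE.match("Host: www.example.com")
print(m.group(1))  # Host
print(m.group(2))  # www.example.com

# Rejected: a colon inside the value (e.g. a port) or a space inside the name.
assert HEADER_RE.match("Host: localhost:8080") is None
assert HEADER_RE.match("Bad Name: value") is None

# Caveat: [^ :] also matches \r and \n, so a trailing CRLF ends up in group 2.
print(repr(HEADER_RE.match("Host: www.example.com\r\n").group(2)))  # 'www.example.com\r\n'

# Hypothetical variant: keep \r and \n out of both character classes and
# require the CRLF explicitly, anchored with \Z (true end of string).
HEADER_CRLF_RE = re.compile(r"^([^ :\r\n]+): ([^ :\r\n]+)\r\n\Z")
m = HEADER_CRLF_RE.match("Host: www.example.com\r\n")
print(m.group(2))  # www.example.com
```

Whether the CRLF belongs in the pattern depends on how the lines are read; if they are already stripped of line endings, the original pattern is enough.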


Upvotes: 1
