Reputation: 618
To do this, I'd normally write a function that pulls one field at a time from the input string, and then loop until the input string is empty.
But there must be a more Pythonic way of doing it that splits everything up at once.
Fields in the input string are separated by a space, and fields that contain spaces are enclosed by quotation marks. Quoted fields do not contain quotation marks.
A real example of this format is a web server's access_log file:
216.244.66.234 - - [01/Nov/2019:19:20:07 +0000] "GET /robots.txt HTTP/1.1" 200 67 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, [email protected])"
EDIT:
access_log was a bad choice as an example, as it contains a bracket-delimited field that contains a space.
But since there is a simple solution to my original question (shlex.split()), I'll revise this question to include processing the bracketed field too (again with no internal delimiter character).
What I'm looking for is an example of parsing a string into fields in a way other than using a function to pull one token out of the string at a time.
Upvotes: 0
Views: 34
Reputation: 61910
IIUC, you could use shlex.split:
from shlex import split
s = '216.244.66.234 - - [01/Nov/2019:19:20:07 +0000] "GET /robots.txt HTTP/1.1" 200 67 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, [email protected])"'
for field in split(s):
    print(field)
Output
216.244.66.234
-
-
[01/Nov/2019:19:20:07
+0000]
GET /robots.txt HTTP/1.1
200
67
-
Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, [email protected])
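Note that shlex.split only knows about quotes, so the bracketed timestamp comes out as two fields in the output above. For the revised question, one possible approach is a single regular expression with one alternative per field type, applied with finditer so the whole line is split in one pass. This is only a sketch: the names FIELD_RE and split_fields are made up for illustration, and it assumes (as the question states) that bracketed and quoted fields never contain their own delimiter. The sample line is shortened to leave out the user-agent field.

```python
import re

# One alternative per field type: [bracketed], "quoted", or a bare run of
# non-space characters. Each alternative has exactly one capture group.
FIELD_RE = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')

def split_fields(line):
    """Return every field in line, with enclosing brackets or quotes removed."""
    # m.lastindex is the number of the one group that took part in the match.
    return [m.group(m.lastindex) for m in FIELD_RE.finditer(line)]

s = '216.244.66.234 - - [01/Nov/2019:19:20:07 +0000] "GET /robots.txt HTTP/1.1" 200 67 "-"'
for field in split_fields(s):
    print(field)
# 216.244.66.234
# -
# -
# 01/Nov/2019:19:20:07 +0000
# GET /robots.txt HTTP/1.1
# 200
# 67
# -
```

The order of the alternatives matters: the bare \S+ pattern comes last so that it only picks up tokens that do not start a bracketed or quoted field.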
Upvotes: 2