Jothi

Reputation: 183

Python script for URL split

I'm new to Python and learning the basics.

My query: I have multiple page requests taken from a log file, like the ones below:

"GET /img/home/search-user-ico.jpg HTTP/1.1"  
"GET /SpellCheck/am.tlx HTTP/1.1"
"GET /img/plan-comp-nav.jpg HTTP/1.1" 
"GET /ie6.css HTTP/1.1"
"GET /img/portlet/portlet-content-bg.jpg HTTP/1.1"
"GET /SpellCheck/am100k2.clx HTTP/1.1" 
"GET /SpellCheck/am.tlx HTTP/1.1" 

My question is: I want only the file part of each request. For example, taking "GET /img/home/search-user-ico.jpg HTTP/1.1" and "GET /ie6.css HTTP/1.1" as pages, I want to split out search-user-ico.jpg HTTP and ie6.css HTTP.

Experts, please help me write a Python script that does this split.

Upvotes: 1

Views: 3016

Answers (4)

Vihtinsky

Reputation: 43

data = [
"GET /img/home/search-user-ico.jpg HTTP/1.1",
"GET /SpellCheck/am.tlx HTTP/1.1",
"GET /img/plan-comp-nav.jpg HTTP/1.1" ,
"GET /ie6.css HTTP/1.1",
"GET /img/portlet/portlet-content-bg.jpg HTTP/1.1",
"GET /SpellCheck/am100k2.clx HTTP/1.1" ,
"GET /SpellCheck/am.tlx HTTP/1.1" 
]

for url in data:
    # Take the path (second space-separated field), then its last '/' component.
    print(url.split(' ')[1].split('/')[-1])

Upvotes: 1

omerkirk

Reputation: 2527

If the format of your links is always similar, another solution would be:

request = "GET /img/home/search-user-ico.jpg HTTP/1.1"
parts = request.split("/")
parts[-2]  # returns 'search-user-ico.jpg HTTP'
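
To see why index -2 picks out that piece, it can help to print the whole list. This is only a quick check on the example string above; the variable name file_and_protocol is made up for illustration:

```python
request = "GET /img/home/search-user-ico.jpg HTTP/1.1"
parts = request.split("/")
print(parts)

# The last element is the "1.1" left over from "HTTP/1.1", so the file
# name (with " HTTP" still attached) is the second-to-last element.
file_and_protocol = parts[-2]
print(file_and_protocol)
```

Note that this relies on the request always ending in "HTTP/x.y"; a line without that suffix would shift the index.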

Upvotes: 0

Achim

Reputation: 15722

data = [
"GET /img/home/search-user-ico.jpg HTTP/1.1",
"GET /SpellCheck/am.tlx HTTP/1.1",
"GET /img/plan-comp-nav.jpg HTTP/1.1" ,
"GET /ie6.css HTTP/1.1",
"GET /img/portlet/portlet-content-bg.jpg HTTP/1.1",
"GET /SpellCheck/am100k2.clx HTTP/1.1" ,
"GET /SpellCheck/am.tlx HTTP/1.1" 
]

for url in data:
    # The path is the second space-separated field; its last '/' component is the file name.
    print(url.split(' ')[1].split('/')[-1])

Upvotes: 0

Stephen Paulger

Reputation: 5343

Assuming that you don't have spaces in the filenames and that you don't want "HTTP" at the end, you can split the line by space.

parts = line.split(" ")

Then use the os module (which needs an import os first) to get the filename from the path.

filename = os.path.basename(parts[1])

For example:

>>> import os
>>> line = "GET /img/home/search-user-ico.jpg HTTP/1.1"
>>> parts = line.split(" ")
>>> parts[1]
'/img/home/search-user-ico.jpg'
>>> os.path.basename(parts[1])
'search-user-ico.jpg'
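
The same idea could be wrapped up for lines read straight from the log. The following is only a sketch, assuming each line looks like the quoted samples in the question; the function name requested_filename is hypothetical, and posixpath.basename is swapped in for os.path.basename so that "/" is always the separator regardless of platform:

```python
import posixpath


def requested_filename(request_line):
    """Return the file name from a line like '"GET /a/b.jpg HTTP/1.1"'."""
    # Remove surrounding whitespace and the double quotes seen in the log excerpt.
    parts = request_line.strip().strip('"').split()
    if len(parts) < 2:
        return None  # not a "METHOD path PROTOCOL" line
    # Drop any query string, then keep only the last path component.
    path = parts[1].split("?", 1)[0]
    return posixpath.basename(path)


lines = [
    '"GET /img/home/search-user-ico.jpg HTTP/1.1"  ',
    '"GET /ie6.css HTTP/1.1"',
]
for line in lines:
    print(requested_filename(line))
```

Returning None for lines that don't have at least a method and a path is a design choice made here so that malformed log lines can be skipped instead of raising an IndexError.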

Upvotes: 3
