Reputation: 875
I have two different kinds of URLs in a list:
The first kind looks like this and starts with the word 'meldung':
meldung/xxxxx.html
The other kind starts with 'artikel':
artikel/xxxxx.html
I want to detect if a URL starts with 'meldung' or 'artikel' and then do different operations based on that. To achieve this I tired to use a loop with if and else conditions:
for line in r:
if re.match(r'^meldung/', line):
print('je')
else:
print('ne')
I also tried this with line.startswith():
for line in r:
if line.startswith('meldung/'):
print('je')
else:
print('ne')
But both methods dont work since the strings I am checking dont have any whitespaces.
How can I do this correctly?
Upvotes: 0
Views: 38
Reputation: 7206
you can do it using regex:
import re
def check(string):
if (re.search('^meldung|artikel*', string)):
print("je")
else:
print("ne")
for line in r:
check(line)
Upvotes: 1
Reputation: 477
You can just use the following, if the links are stored as strings within the list:
for line in r:
if ‘meldung’ in line:
print(‘je’)
else:
print(‘ne’)
Upvotes: 1
Reputation: 1166
What about this:
r = ['http://example.com/meldung/page1.html', 'http://example.com/artikel/page2.html']
for line in r:
url_tokens = line.split('/')
if url_tokens[-2] == 'meldung':
print(url_tokens[-1]) # the xxxxx.html part
elif url_tokens[-2] == 'artikel':
print('ne')
else:
print('something else')
Upvotes: 1