Reputation: 744
I have strings which are incomplete URLs, like the following:
"dl_image/cm1111.jpg"
"dl_image/CM2222.jpg"
The problem is, the second is correct, the first is not. The letters between the numbers and 'dl_image/' must always be uppercase.
I am joining these incomplete URLs onto a base URL using urljoin with the following code:
imagehtml = temp1['dl_image']
if temp1.get('set') != None:
    if imagehtml != None and imagehtml !='':
        soup = Soup(imagehtml, 'html.parser')
        for a in soup.find_all('a', href=True):
            print(urljoin(base_url, a['href']))
imagehtml contains the incomplete urls.
Is there a way to convert only these letters to uppercase, and not the extension or directory?
Upvotes: 1
Views: 131
Reputation: 2159
Try This One:
import re
a = 'dl_image/cm12345/cm12.jpg'
b = len(a)-1-a[::-1].index('/')
c = a[b:]
d = re.findall(r"\d+",c)
if len(d)>0:
    e = a.index(d[0], b)
    f = a[:b+1]+a[b+1:e].upper()+a[e:]
    print(f)
Output:
dl_image/cm12345/CM12.jpg
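The same idea (uppercase only the letters that sit directly before the digits of the file name) can also be sketched as a single re.sub call. This is an alternative to the code above, not a rewrite of it. The name upper_prefix is made up for illustration, and the pattern assumes the string ends in letters, then digits, then one extension, with no query string after it.

```python
import re

# Letters that are immediately followed by digits and a final ".ext"
# at the very end of the string (so only the last path segment matches).
_PREFIX = re.compile(r"[A-Za-z]+(?=\d+\.[^./]+$)")

def upper_prefix(url):
    """Hypothetical helper: uppercase the letters before the digits in the file name."""
    return _PREFIX.sub(lambda m: m.group(0).upper(), url)

print(upper_prefix('dl_image/cm1111.jpg'))        # dl_image/CM1111.jpg
print(upper_prefix('dl_image/cm12345/cm12.jpg'))  # dl_image/cm12345/CM12.jpg
```

Strings that do not fit the assumed shape are returned unchanged, because the pattern simply finds nothing to replace.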
Upvotes: 0
Reputation: 10850
Perhaps you would like something like this:
url = 'dl_image/cm1111.jpg'
path, file = url.rsplit('/', 1)
name, ext = file.rsplit('.', 1)
print(path + '/' + name.upper() + '.' + ext)
I.e. split only at the rightmost '/' and '.' to then uppercase only the portion in between these two positions.
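If the input might lack a '/' or a '.', the two rsplit calls above raise a ValueError when unpacking. A minimal sketch of the same "split at the rightmost separator" idea using the standard posixpath module, which tolerates both cases, could look like this. The function name upper_filename and the base_url value are placeholders, not taken from the question.

```python
import posixpath
from urllib.parse import urljoin

def upper_filename(url):
    """Hypothetical helper: uppercase the file name, keep directory and extension."""
    head, tail = posixpath.split(url)      # split at the rightmost '/'
    name, ext = posixpath.splitext(tail)   # split at the rightmost '.' of the file name
    return posixpath.join(head, name.upper() + ext)

base_url = 'http://example.com/'  # placeholder base URL, assumed for the demo
print(urljoin(base_url, upper_filename('dl_image/cm1111.jpg')))
# http://example.com/dl_image/CM1111.jpg
```

In the loop from the question, this would mean passing upper_filename(a['href']) to urljoin instead of a['href'].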
Upvotes: 2
Reputation: 3211
I would recommend str.rfind(), because in your case the string will typically have a . near the very end, where the file extension starts, and a / before it, where the part you want to convert begins. See the code below:
s="dl_image/cm2222.jpg"
start = s.rfind('/')
end = s.rfind('.')
new_s = s[:start] + s[start:end].upper() + s[end:]
print (new_s)
#dl_image/CM2222.jpg
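One thing to watch with this approach: str.rfind() returns -1 when the character is not found, and the slices above then give a wrong result for a string without a '/' (for example, 'cm2222.jpg' comes out as 'cm2222.jp.jpg'). A guarded variant might look like the sketch below; the function name upper_between is hypothetical.

```python
def upper_between(s):
    """Hypothetical helper: uppercase the text between the last '/' and the last '.'."""
    start = s.rfind('/') + 1   # becomes 0 when there is no '/', since rfind returns -1
    end = s.rfind('.')
    if end < start:            # no '.' inside the file name part
        end = len(s)
    return s[:start] + s[start:end].upper() + s[end:]

print(upper_between('dl_image/cm2222.jpg'))  # dl_image/CM2222.jpg
print(upper_between('cm2222.jpg'))           # CM2222.jpg
print(upper_between('dl.image/cm2222'))      # dl.image/CM2222
```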
Upvotes: 2
Reputation: 71560
Use str.join with a list comprehension that uppercases a character when either of the two characters before it is a '/', and leaves every other character unchanged:
>>> s="dl_image/cm1111.jpg"
>>> ''.join([v.upper() if s[i-1]=='/' or s[i-2]=='/' else v for i,v in enumerate(s)])
'dl_image/CM1111.jpg'
>>>
Update:
imagehtml = temp1['dl_image']
if temp1.get('set') != None:
    if imagehtml != None and imagehtml !='':
        soup = Soup(imagehtml, 'html.parser')
        for a in soup.find_all('a', href=True):
            print(urljoin(base_url, ''.join([v.upper() if a['href'][i-1]=='/' or a['href'][i-2]=='/' else v for i,v in enumerate(a['href'])])))
Upvotes: 2
Reputation: 168
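Yes, that's easy. Here's a way to do it:
Find the index of the last /
Find the index of the last .
Uppercase everything between those two positions.
Here's a way you can translate it into code: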
test_url = "dl_image/cm1111.jpg"
last_slash_index = test_url.rfind('/')
extension_start_index = test_url.rfind('.')
final_url = test_url[:last_slash_index+1] + test_url[last_slash_index+1:extension_start_index].upper() + test_url[extension_start_index:]
Upvotes: 1