Reputation: 1
I have a couple of questions. Just to explain what is going on in this code, I am taking the subscriber count of a YouTube channel and trying to convert it to an int so that it can be multiplied, divided, etc.
Is there a way to express something like ". followed by any three characters" in the .replace method? Some YouTube channels show something like "3.04M" subscribers, and when I extract that string from the HTML I want to be able to turn it into an int. That is what the first "if" statement is for: if the sub count has a decimal point followed by 3 characters (i.e. two digits and the letter), remove the decimal point and replace the letter with the corresponding number of zeros, according to the placement of the decimal point. If there are NOT 3 characters after it, I want to go to the first "else", which lowers the value of the letter by a factor of 10 rather than 100 because of the decimal placement. Lastly, if there is no decimal point, I simply want to convert the letter into the regular number of zeros.
I should probably point out that I am extremely new to Python, with only about 3 days of working with it. My prior experience was about 10 hours of Java that I have all but forgotten.
Thanks for any help that could be offered!
subC = self.driver.find_element_by_xpath('/html/body/ytd-app/div/ytd-page-manager/ytd-browse/div[3]/ytd-c4-tabbed-header-renderer/app-header-layout/div/app-header/div[2]/div[2]/div/div[1]/div/div[1]/yt-formatted-string')
print('subscriber count is: ' + str(subC.text))
if ".XXX" in subC.text:
    subC.text.replace('k' , '0')
    subC.text.replace('M' , '0000')
    subC.text.replace('B' , '0000000')
else:
    if "." in subC.text:
        subC.text.replace('k' , '00')
        subC.text.replace('M' , '00000')
        subC.text.replace('B' , '00000000')
        subC.text.replace('.' , '')
    else:
        subC.text.replace('k' , '000')
        subC.text.replace('M' , '000000')
        subC.text.replace('B' , '000000000')
(realSub, other) = subC.text.split(maxsplit=1)
print(int(realSub))
Upvotes: 0
Views: 105
Reputation: 6132
Using regex and dictionaries you can achieve what you're looking for:
import re

d = {'M': 1000000, 'k': 1000, 'B': 1000000000}
subC = ['3.04M', '5M', '3.4k']
for sub in subC:
    found = re.search('([a-zA-Z])', sub)  # Look for a suffix letter
    if found:
        match = found.group(1)  # Get the M
        subC2 = float(sub.replace(match, ''))  # Remove the M and turn it into a float
        sub_number = int(subC2 * d.get(match))  # Use dictionary to convert it to millions
    else:
        sub_number = int(sub)  # No suffix: the string is already a plain number
    print(sub_number)
Maybe I missed one of your cases; please let me know if that happened or if you didn't understand something. This will only work if your string contains nothing but the sub count. If that's not the case, some modifications might be needed.
3040000
5000000
3400
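If the scraped text carries more than the bare count, the same regex-plus-dictionary idea can be wrapped in a function. This is only a sketch: the name parse_sub_count and the sample input "3.04M subscribers" are assumptions (the question's split(maxsplit=1) suggests a trailing word, but the exact text is not shown). It uses Decimal instead of float so the multiplication stays exact.

```python
import re
from decimal import Decimal

# Multipliers for the suffixes mentioned in the question (keys upper-cased).
MULTIPLIERS = {"K": 10**3, "M": 10**6, "B": 10**9}

def parse_sub_count(text):
    """Hypothetical helper: turn text such as '3.04M subscribers' into an int."""
    # A number with an optional decimal part, directly followed by an
    # optional suffix letter; anything after that is ignored.
    match = re.match(r"\s*(\d+(?:\.\d+)?)([kKmMbB]?)", text)
    if match is None:
        raise ValueError(f"no subscriber count found in {text!r}")
    number, suffix = match.groups()
    # An empty suffix falls back to a multiplier of 1.
    return int(Decimal(number) * MULTIPLIERS.get(suffix.upper(), 1))

for sample in ["3.04M subscribers", "5M", "3.4k", "987 subscribers"]:
    print(sample, "->", parse_sub_count(sample))
```

With a Selenium element as in the question, the call would presumably be parse_sub_count(subC.text).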
Upvotes: 1
Reputation: 191
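You can use regular expressions to do that. If I understood correctly, the numbers can come in three formats (with k, M or B): a decimal point followed by two digits (such as 3.04M), a decimal point followed by one digit, or no decimal point at all.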
To match the ".XXX" format of the first case you can use
import re

text = subC.text  # Work on a plain string rather than the web element
if re.search(r'\.[0-9][0-9].', text):
    # Two digits after the decimal point, e.g. 3.04M
    text = text.replace('.', '')
    text = text.replace('k', '0')
    text = text.replace('M', '0000')
    text = text.replace('B', '0000000')
elif "." in text:
    # One digit after the decimal point
    text = text.replace('k', '00')
    text = text.replace('M', '00000')
    text = text.replace('B', '00000000')
    text = text.replace('.', '')
else:
    # No decimal point
    text = text.replace('k', '000')
    text = text.replace('M', '000000')
    text = text.replace('B', '000000000')
subC = int(text)
Notice that you need to explicitly assign the result of each replace call to a variable: replace returns a new string, and the change does not get saved automatically.
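A small illustration of that point, using a made-up sample string rather than a real page element:

```python
count = "3.4k"  # hypothetical sample value

# str.replace returns a new string; the original is left untouched.
count.replace("k", "00")
unchanged = count  # still "3.4k"

# Rebinding the name keeps the result of each call.
count = count.replace("k", "00").replace(".", "")
as_int = int(count)

print(unchanged, as_int)
```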
As a little extra, the regular expression works as follows:
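Piece by piece: \. matches a literal dot, each [0-9] matches a single digit, and the final . matches any one character (here the k, M or B suffix). A quick check of the pattern against a few made-up sample strings:

```python
import re

# \.     a literal dot
# [0-9]  one digit (used twice)
# .      any single character, e.g. the k/M/B suffix
pattern = re.compile(r"\.[0-9][0-9].")

samples = ["3.04M", "3.4M", "5M"]  # hypothetical inputs
results = {s: bool(pattern.search(s)) for s in samples}
print(results)
```

Only the first sample has two digits between the dot and the suffix, so only it should match.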
Upvotes: 0
Reputation: 11
Try this:
realsub = subC.text
realsub = realsub.casefold()  # casefold() returns a new string, so keep the result
if realsub[-1].isalpha():
    last = realsub[-1]
    num = 1000 if last == 'k' else 1000000 if last == 'm' else 1000000000
    realsub = int(float(realsub[:-1]) * num)
print(realsub)
The casefold method converts the string to lowercase. If the last character is a letter, the number is multiplied by the required integer num.
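One thing to watch with the chained conditional is that any letter other than k or m is treated as a billion. A possible variant, sketched here with a lookup table and a hypothetical function name to_int, rejects unknown suffixes instead. It also uses round() rather than int(), since a float product can land just below the intended whole number.

```python
SUFFIXES = {"k": 1_000, "m": 1_000_000, "b": 1_000_000_000}

def to_int(count):
    """Hypothetical helper for strings such as '3.04M' or '812'."""
    count = count.strip().casefold()  # casefold() returns a new string
    if count and count[-1].isalpha():
        suffix = count[-1]
        if suffix not in SUFFIXES:
            raise ValueError(f"unknown suffix {suffix!r}")
        # round() guards against float results such as 114.99999999999999
        return round(float(count[:-1]) * SUFFIXES[suffix])
    return int(count)

for sample in ["3.04M", "3.4k", "2B", "812"]:
    print(sample, "->", to_int(sample))
```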
Upvotes: 1