Reputation: 55
I started learning python two days ago. Today I built a web scraping script which pulls data from yahoo finance and puts it in a csv file. The problem I have is that some values are string because yahoo finance displays them as such.
For example: Revenue: 806.43M
When I copy them into the csv I cant use them for calculation so I was wondering if it is possible to separate the "806.43" and "M" while still keeping both to see the unit of the number and put them in two different columns.
for the excel writing I use this command:
f.write(revenue + "," + revenue_value + "\n")
where:
print(revenue)
Revenue (ttm)
print(revenue_value)
806.43M
so in the end I should be able to use a command which looks something like this
f.write(revenue + "," + revenue_value + "," + revenue_unit + "\n")
where revenue_value is 806.43 and revenue_unit is M
Hope someone could help with the problem.
Upvotes: 2
Views: 141
Reputation: 2348
You can always try regular expressions.
Here's a pretty good online tool to let you practice using Python-specific standards.
import re
sample = "Revenue (ttm): 806.43M"
# Note: the `(?P<name here>)` section is a named group. That way we can identify what we want to capture.
financials_pattern = r'''
(?P<category>.+?):?\s+? # Capture everything up until the colon
(?P<value>[\d\.]+) # Capture only numeric values and decimal points
(?P<unit>[\w]*)? # Capture a trailing unit type (M, MM, etc.)
'''
# Flags:
# re.I -> Ignore character case (upper vs lower)
# re.X -> Allows for 'verbose' pattern construction, as seen above
res = re.search(financials_pattern, sample, flags = re.I | re.X)
Print our dictionary of values:
res.groupdict()
Output:
{'category': 'Revenue (ttm)',
'value': '806.43',
'unit': 'M'}
We can also use .groups() to list results in a tuple.
res.groups()
Output:
('Revenue (ttm)', '806.43', 'M')
In this case, we'll immediately unpack those results into your variable names.
revenue = None # If this is None after trying to set it, don't print anything.
revenue, revenue_value, revenue_unit = res.groups()
We'll use fancy f-strings to print out both your f.write()
call along with the results we've captured.
if revenue:
print(f'f.write(revenue + "," + revenue_value + "," + revenue_unit + "\\n")\n')
print(f'f.write("{revenue}" + "," + "{revenue_value}" + "," + "{revenue_unit}" + "\\n")')
Output:
f.write(revenue + "," + revenue_value + "," + revenue_unit + "\n")
f.write("Revenue (ttm)" + "," + "806.43" + "," + "M" + "\n")
Upvotes: 0
Reputation: 39930
I believe the easiest way is to parse the number as string and convert it to a float based on the unit in the end of the string.
The following should do the trick:
def parse_number(number_str) -> float:
mapping = {
"K": 1000,
"M": 1000000,
"B": 1000000000
}
unit = number_str[-1]
number_float = float(number_str[:-1])
return number_float * mapping[unit]
And here's an example:
my_number = "806.43M"
print(parse_number(my_number))
>>> 806430000.0
Upvotes: 2