Is there a way to extract floating-point numbers that come in different, unknown formats and are concatenated without any delimiter into a string like this:
"3.01-1.58e+006-1.58e+006"
From the string above, I need to extract these numbers:
3.01 -1.58e+006 -1.58e+006
Note: the length and format of the numbers are variable, and we do not know how many numbers are in the string.
Upvotes: 1
Views: 61
Reputation: 3591
Any time you extract data, you are making assumptions about what form the data is in, and then telling the computer to look for patterns based on those assumptions. Choosing the right assumptions can be as important as getting the code right for the assumptions you have chosen. In this case, one assumption you might make is that each number consists of one digit, a decimal point, some more digits, and then optionally "e", either "+" or "-", and more digits. If you know how long each run of digits will be, you can split on those lengths. The length most likely to be consistent is the number of digits before the decimal point: if the numbers are in scientific notation, there will be only one digit. However, there might also be a minus sign before that digit. So you can walk through the string and, at each position, check whether (the next character is + or - and the character three positions ahead is ".") or (the character two positions ahead is "."); every time either condition holds, a new number starts at the next character.
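s = "3.01-1.58e+006-1.58e+006"  # renamed from str to avoid shadowing the built-in
number_list = []
beginning_of_current_number = 0
for index in range(len(s) - 3):
    # Next character is a sign and current+3 is ".": a signed number starts at index+1.
    if s[index + 1] in "+-" and s[index + 3] == ".":
        number_list.append(float(s[beginning_of_current_number:index + 1]))
        beginning_of_current_number = index + 1
    # current+2 is ".": an unsigned number starts at index+1, unless index+1
    # is the digit right after a sign we have already split on.
    elif s[index + 2] == "." and beginning_of_current_number != index:
        number_list.append(float(s[beginning_of_current_number:index + 1]))
        beginning_of_current_number = index + 1
# The loop never appends the last number, so add it here (slice to the end).
number_list.append(float(s[beginning_of_current_number:]))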
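For comparison, here is a minimal sketch of the same assumption (exactly one digit before the decimal point, an optional sign, an optional exponent) expressed as a single regular expression instead of an index loop. This is a swapped-in technique, not the loop above; `ONE_DIGIT_NUMBER` and `split_one_digit_mantissas` are hypothetical names. Each number is ended lazily, at the first point where a lookahead sees the start of the next one-digit mantissa or the end of the string.

```python
import re

# Assumption (same as the loop above): every mantissa has exactly one digit
# before the decimal point; numbers may be signed or unsigned and may carry
# an exponent such as "e+006". A number ends where the next "[sign]digit."
# begins, or at the end of the string.
ONE_DIGIT_NUMBER = re.compile(r"[+-]?\d\.\d*?(?:e[+-]\d+)?(?=[+-]?\d\.|$)")


def split_one_digit_mantissas(text):
    """Hypothetical helper: return the numbers found in text as floats."""
    return [float(m) for m in ONE_DIGIT_NUMBER.findall(text)]


print(split_one_digit_mantissas("3.01-1.58e+006-1.58e+006"))
```

Because it relies on the one-digit rule, it also separates unsigned numbers: `"3.011.58e+006"` gives `[3.01, 1580000.0]`. The flip side is that a value with more than one digit before the decimal point is misread: `"12.5"` comes back as `[2.5]`, with the leading 1 dropped.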
Upvotes: 0
Reputation: 54233
This regex isn't pretty, but it seems to work for your example:
((?:^|[\+\-])[\d\.]+(?:e[\+\-]\d+)?)
It means: the start of the string or a sign, followed by digits and dots, optionally followed by "e", a sign, and digits.
>>> import re
>>> re.findall(r"((?:^|[\+\-])[\d\.]+(?:e[\+\-]\d+)?)", "3.01-1.58e+006-1.58e+006")
['3.01', '-1.58e+006', '-1.58e+006']
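The same pattern can be written with `re.VERBOSE`, so that each part carries the explanation above as a comment. This is a sketch of the identical regex, not a different one; `NUMBER` and `split_numbers` are hypothetical names, and converting the matches with `float()` is an addition to the answer.

```python
import re

# Same regex as the one-liner, in verbose form: whitespace and "#" comments
# inside the pattern are ignored by re.VERBOSE.
NUMBER = re.compile(
    r"""
    (                 # capture the whole number
      (?:^|[+\-])     # start of the string, or a leading sign
      [\d.]+          # mantissa: digits and dots
      (?:e[+\-]\d+)?  # optional exponent: "e", a sign, one or more digits
    )
    """,
    re.VERBOSE,
)


def split_numbers(text):
    """Hypothetical helper: return the matched numbers as floats."""
    return [float(m) for m in NUMBER.findall(text)]


print(split_numbers("3.01-1.58e+006-1.58e+006"))
```

Like the one-liner, it assumes every number after the first starts with a sign. For `"3.011.58"` the class `[\d.]+` swallows everything, so `NUMBER.findall` returns the single string `'3.011.58'`, and calling `float()` on it would raise `ValueError`.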
Upvotes: 1