Reputation: 316
I need to extract some unformatted numerical data from a text file. In the file, the numbers are separated in some places by a single space, in others by multiple spaces, and in others by tabs; pretty heterogeneous text :( I want Python to ignore all spaces/tabs, identify the whole numerical values, and put them in an array/list. Is it possible to do this using Python?
EDIT: There are many numbers written in scientific/exponential notation, e.g. 1.2345E+06, and Python does not recognize them as numbers. So \d alone does not work :(
I don't want to use a normal string search for this purpose (there are many strings/words in the file which are of no interest/use). The regular expression module documentation does not mention this issue.
Upvotes: 2
Views: 442
Reputation: 12343
If each line holds a single number, like " 2.3e4 " or "2.6" or so, try:
^\s*?([+-]?\d+(\.\d+)?([eE][+-]?\d+)?)\s*$
Notice the \s*? (a non-greedy match of zero or more whitespace characters). The question mark is optional here: \s can never match a digit or a sign, so a greedy \s* would capture exactly the same number.
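As a rough sketch of how that pattern could be applied line by line: it uses the pattern above, written with [eE] so that uppercase exponents such as 1.2345E+06 match. The function name parse_number_lines and the sample lines are made up for illustration, and the sketch assumes each line of interest holds exactly one number.

```python
import re

# Line-anchored pattern: optional whitespace, one number, optional whitespace.
# [eE] accepts both lowercase and uppercase exponent markers.
NUMBER_LINE = re.compile(r'^\s*?([+-]?\d+(\.\d+)?([eE][+-]?\d+)?)\s*$')

def parse_number_lines(lines):
    """Return a float for every line that holds exactly one number (hypothetical helper)."""
    values = []
    for line in lines:
        match = NUMBER_LINE.match(line)
        if match:
            # group(1) is the whole number; float() understands exponent notation.
            values.append(float(match.group(1)))
    return values

sample = [" 2.3e4 ", "2.6", "\t-1.2345E+06\n", "not a number", "1 2"]
print(parse_number_lines(sample))  # [23000.0, 2.6, -1234500.0]
```

Lines with words or with more than one number (like "1 2") are skipped, because the ^ and $ anchors require the number to be the only thing on the line.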
AFAIK Python's regex syntax has no special symbol for capturing whole numbers, other than \d for digits.
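For the case in the question, where several numbers sit on one line between spaces, tabs and unrelated words, an unanchored pattern with re.findall may be closer to what is needed. This is only a sketch: the pattern, the name extract_numbers and the sample text are my own assumptions, not something from the question. Once a token is matched, float() parses exponent notation directly.

```python
import re

# Unanchored number pattern. Non-capturing groups (?:...) are used so that
# findall returns the whole matched token rather than the sub-groups.
NUMBER = re.compile(r'[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?')

def extract_numbers(text):
    """Return every number found in text as a float (hypothetical helper)."""
    return [float(token) for token in NUMBER.findall(text)]

text = "12  3.5\t1.2345E+06 foo -7e-3\n  .5   bar 42"
print(extract_numbers(text))  # [12.0, 3.5, 1234500.0, -0.007, 0.5, 42.0]
```

Be aware that this also picks up digits embedded in words (e.g. the 123 in "abc123"); if that matters for your file, the pattern would need extra boundary checks.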
Upvotes: 2