Reputation: 316

Extract numbers with EXPONENTS from heterogeneous text file

I need to extract some unformatted numerical data from a text file. In the file, the numbers are separated in some places by a single space, in others by multiple spaces, and in others by tabs; pretty heterogeneous text :( I want Python to ignore all spaces/tabs, identify whole numerical values, and put them in an array/list. Is it possible to do this using Python?

EDIT: Many of the numbers are written in scientific/exponential notation, e.g. 1.2345E+06, and a plain \d pattern does not pick them up as whole numbers. So \d on its own does not work :(

I don't want to use a plain string search for this (the file contains many strings/words that are of no interest). The regular expression module documentation does not mention this issue.

Upvotes: 2

Answers (2)

Luis Masuelli

Reputation: 12343

If each line looks like " 2.3e4 " or "2.6" or similar, try:

^\s*?([+-]?\d+(\.\d+)?([eE][+-]?\d+)?)\s*$

Notice the \s*? part (non-greedy, zero or more whitespace characters). The question mark is harmless but not actually required: \s matches only whitespace, so a greedy \s* cannot consume any digits of the number, and the captured value is the same either way.

AFAIK Python has no special symbol for capturing whole numbers, other than \d for digits.
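
A minimal sketch of how this per-line pattern might be applied, assuming each line holds at most one number. The exponent marker is written as [eE] here so that values like 1.2345E+06 from the question are accepted; the names LINE_NUMBER, parse_line and sample_lines are made up for illustration.

```python
import re

# Per-line pattern: optional sign, digits, optional fraction, optional exponent.
# [eE] accepts both 2.3e4 and 1.2345E+06 (illustrative choice, see the text above).
LINE_NUMBER = re.compile(r'^\s*?([+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)\s*$')

def parse_line(line):
    """Return the line's value as a float, or None if the line is not a lone number."""
    match = LINE_NUMBER.match(line)
    return float(match.group(1)) if match else None

# Hypothetical sample input standing in for lines read from the file.
sample_lines = [" 2.3e4 ", "2.6", "1.2345E+06", "some words"]
values = [parse_line(line) for line in sample_lines]
```

float() already understands exponential notation, so once the regex has isolated the token no further parsing is needed. Lines that contain several numbers, or a number mixed with words, return None with this approach.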

Upvotes: 2

Unknown

Reputation: 5772

You could use a regular expression like \s+([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s+ (adapted from here). Take a look at this to see how you can search for a regular expression in a file.
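
A rough sketch of how this could look with re.findall, under one assumption that differs from the pattern as written: the surrounding \s+ is swapped for the lookarounds (?<!\S) and (?!\S). These require whitespace or a string boundary on each side without consuming it, so numbers at the very start or end of the text, and neighbours separated by a single space, are not skipped. The names NUMBER, extract_numbers and sample are hypothetical.

```python
import re

# Core number pattern from the answer, wrapped in lookarounds instead of \s+
# (an assumption of this sketch): the token must be bounded by whitespace
# or by the start/end of the text, and that whitespace is not consumed.
NUMBER = re.compile(r'(?<!\S)[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?(?!\S)')

def extract_numbers(text):
    """Return every standalone numeric token in text as a float."""
    return [float(token) for token in NUMBER.findall(text)]

# Hypothetical sample mixing single spaces, multiple spaces, tabs and words.
sample = "alpha 1.2345E+06\t-3   beta\t.5  run7 2.5e-3\n42"
numbers = extract_numbers(sample)
```

For a real file, the text would come from something like open(path).read(), where path is whatever file is being processed. Tokens such as run7 are ignored because the digit is glued to letters.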

Upvotes: 1
