Alpha
Alpha

Reputation: 399

Problem in reading text file with negative numbers

Text File: I have a text file containing more than 87,000 data points. The format of the text file is as follows:

There is no space between the two numbers highlighted in bold because of the leading - (minus) sign, because of which the resulting csv/pandas dataframe results in something like below.

Output:

| X Coordinate    | Y Coordinate    | Parameter 1     | Parameter 2     | 
| --------------  | --------------  | --------------- | ------------    |
| 2.744596610E-02 | 1.247197202E+00 | 7.121462841E-03 | 2.467938066E-05 |
| 2.732558411E-02 | 1.242196291E+00 | 1.365028508E-02 | 6.262368697E-05 |
| 2.713870635E-02 | 1.227254209E+00 | 1.958976965E-03-3.179617352E-06|  |

Required:

| X Coordinate    | Y Coordinate    | Parameter 1     | Parameter 2     | 
| --------------  | --------------  | --------------- | ------------    |
| 2.744596610E-02 | 1.247197202E+00 | 7.121462841E-03 | 2.467938066E-05 |
| 2.732558411E-02 | 1.242196291E+00 | 1.365028508E-02 | 6.262368697E-05 |
| 2.713870635E-02 | 1.227254209E+00 | 1.958976965E-03 |-3.179617352E-06 |

I am comfortable with python/pandas, so any of the programming techniques would be of great help.

Upvotes: 2

Views: 863

Answers (2)

Mustafa Aydın
Mustafa Aydın

Reputation: 18315

A regex can put spaces in there:

import re

with open("current.txt") as fh, open("new.txt", "w") as gh:
    # skip the first line
    fh.readline()

    # for other lines..
    for line in fh:
       gh.write(re.sub(r"(E[+-]\d+)(\S)(\d|\.)", r"\1 -\3", line))

Then

# you can include the header, I didn't paste
df = pd.read_csv("new.txt", sep=" ", header=None)

gives me

>>> df

          0         1         2         3
0  0.027446  1.247197  0.007121  0.000025
1  0.027326  1.242196 -0.013650  0.000063
2  0.027139 -1.227254  0.001959 -0.000003

Upvotes: 1

Corralien
Corralien

Reputation: 120479

import re

DATAPOINT = re.compile(r'-?\d{1}\.\d{9}E[+-]\d{2}')

data = []
with open("data.txt") as fp:
    next(fp) # Ignore header (1st line)
    for l in fp.readlines():
        data.append(DATAPOINT.findall(l))

df = pd.DataFrame(data, columns=['X Coordinate', 'Y Coordinate', 'Parameter 1', 'Parameter 2'])
>>> df
      X Coordinate     Y Coordinate      Parameter 1       Parameter 2
0  2.744596610E-02  1.247197202E+00  7.121462841E-03   2.467938066E-05
1  2.732558411E-02  1.242196291E+00  1.365028508E-02   6.262368697E-05
2  2.713870635E-02  1.227254209E+00  1.958976965E-03  -3.179617352E-06

Upvotes: 1

Related Questions