Reputation: 399
Text File: I have a text file containing more than 87,000 data points. The format of the text file is as follows:
There is no space between the two numbers highlighted in bold because of the leading - (minus) sign, because of which the resulting csv/pandas dataframe results in something like below.
Output:
| X Coordinate | Y Coordinate | Parameter 1 | Parameter 2 |
| -------------- | -------------- | --------------- | ------------ |
| 2.744596610E-02 | 1.247197202E+00 | 7.121462841E-03 | 2.467938066E-05 |
| 2.732558411E-02 | 1.242196291E+00 | 1.365028508E-02 | 6.262368697E-05 |
| 2.713870635E-02 | 1.227254209E+00 | 1.958976965E-03-3.179617352E-06| |
Required:
| X Coordinate | Y Coordinate | Parameter 1 | Parameter 2 |
| -------------- | -------------- | --------------- | ------------ |
| 2.744596610E-02 | 1.247197202E+00 | 7.121462841E-03 | 2.467938066E-05 |
| 2.732558411E-02 | 1.242196291E+00 | 1.365028508E-02 | 6.262368697E-05 |
| 2.713870635E-02 | 1.227254209E+00 | 1.958976965E-03 |-3.179617352E-06 |
I am comfortable with python/pandas, so any of the programming techniques would be of great help.
Upvotes: 2
Views: 863
Reputation: 18315
A regex can put spaces in there:
import re
with open("current.txt") as fh, open("new.txt", "w") as gh:
# skip the first line
fh.readline()
# for other lines..
for line in fh:
gh.write(re.sub(r"(E[+-]\d+)(\S)(\d|\.)", r"\1 -\3", line))
Then
# you can include the header, I didn't paste
df = pd.read_csv("new.txt", sep=" ", header=None)
gives me
>>> df
0 1 2 3
0 0.027446 1.247197 0.007121 0.000025
1 0.027326 1.242196 -0.013650 0.000063
2 0.027139 -1.227254 0.001959 -0.000003
Upvotes: 1
Reputation: 120479
import re
DATAPOINT = re.compile(r'-?\d{1}\.\d{9}E[+-]\d{2}')
data = []
with open("data.txt") as fp:
next(fp) # Ignore header (1st line)
for l in fp.readlines():
data.append(DATAPOINT.findall(l))
df = pd.DataFrame(data, columns=['X Coordinate', 'Y Coordinate', 'Parameter 1', 'Parameter 2'])
>>> df
X Coordinate Y Coordinate Parameter 1 Parameter 2
0 2.744596610E-02 1.247197202E+00 7.121462841E-03 2.467938066E-05
1 2.732558411E-02 1.242196291E+00 1.365028508E-02 6.262368697E-05
2 2.713870635E-02 1.227254209E+00 1.958976965E-03 -3.179617352E-06
Upvotes: 1