Reputation: 194
Below is my piece of code.
import numpy as np
filename1=open(f)
xf = np.loadtxt(filename1, dtype=float)
Below is my data file.
0.14200E+02 0.18188E+01 0.44604E-03
0.14300E+02 0.18165E+01 0.45498E-03
0.14400E+02-0.17694E+01 0.44615E+03
0.14500E+02-0.17226E+01 0.43743E+03
0.14600E+02-0.16767E+01 0.42882E+03
0.14700E+02-0.16318E+01 0.42033E+03
0.14800E+02-0.15879E+01 0.41196E+03
As you can see, some negative values take up the space that would otherwise separate two columns, so the values run together. This causes numpy to raise:
ValueError: Wrong number of columns at line 3
This is just a small snippet of my code. I want to read this data using numpy or pandas. Any suggestions would be great.
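For reference, the failure can be reproduced with a minimal sketch like the one below. It assumes the first three sample rows are held in an in-memory buffer instead of a file, and `try_load` is a hypothetical helper added only for illustration. The exact error text may differ between NumPy versions.

```python
import io
import numpy as np

# First three rows of the sample data; the third row has two values fused by a minus sign.
sample = io.StringIO(
    "0.14200E+02 0.18188E+01 0.44604E-03\n"
    "0.14300E+02 0.18165E+01 0.45498E-03\n"
    "0.14400E+02-0.17694E+01 0.44615E+03\n"
)

def try_load(buffer):
    """Return the loaded array, or the ValueError raised by np.loadtxt."""
    try:
        return np.loadtxt(buffer, dtype=float)
    except ValueError as exc:
        return exc

result = try_load(sample)
print(type(result).__name__, result)
```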
Edit 1:
@ZarakiKenpachi I tried your suggestion of sep=' |-', but it gives me an extra fourth column of NaN values.
Edit 2:
@Serge Ballesta Nice suggestion, but all of these are some kind of pre-processing. I would like a built-in function in pandas or numpy that does this.
Edit 3:
Important note: there is also a minus sign inside exponents, as in 0.4373E-03.
Thank you
Upvotes: 1
Views: 179
Reputation: 149115
np.loadtxt can read from a generator of (byte) strings, so you can filter the input file while loading it and add a space before each minus sign that follows a digit:
...
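import re

def add_space_before_minus(fd):
    # Insert a space before any '-' that directly follows a digit;
    # the lookbehind keeps the digit and leaves exponents such as E-03 alone.
    rx = re.compile(rb'(?<=\d)-')
    for line in fd:
        yield rx.sub(b' -', line)

# 'rb' opens the file in binary mode, so the generator yields byte strings.
xf = np.loadtxt(add_space_before_minus(open(f, 'rb')), dtype=float)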
This does not require preloading the whole file into memory, so it is expected to be memory efficient.
The regex is needed so that the minus sign in an exponent, as in 0.16545E-012, is left unchanged.
In my tests on 10k lines, this should be at most 10% slower than loading everything into memory, but it requires far less memory.
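The streaming idea can be tried without a file on disk. The following is a minimal sketch, assuming text-mode lines from an `io.StringIO` buffer in place of the real file; the name `add_space_before_minus` and the lookbehind pattern `(?<=\d)-` are choices made for this illustration, not part of NumPy.

```python
import io
import re
import numpy as np

# Match '-' only when the previous character is a digit; the digit is not consumed.
MINUS_AFTER_DIGIT = re.compile(r'(?<=\d)-')

def add_space_before_minus(lines):
    """Yield each line with a space inserted before fused minus signs."""
    for line in lines:
        yield MINUS_AFTER_DIGIT.sub(' -', line)

# Stand-in for the data file: two well-formed rows and two fused rows.
sample = io.StringIO(
    "0.14200E+02 0.18188E+01 0.44604E-03\n"
    "0.14300E+02 0.18165E+01 0.45498E-03\n"
    "0.14400E+02-0.17694E+01 0.44615E+03\n"
    "0.14500E+02-0.17226E+01 0.43743E+03\n"
)

# np.loadtxt consumes the generator one line at a time.
xf = np.loadtxt(add_space_before_minus(sample), dtype=float)
print(xf.shape)
```

With a real file, the buffer would be replaced by an open file object, which is also iterated line by line.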
Upvotes: 2
Reputation: 5152
You can preprocess your data to add a space before the - signs. While there are many ways of doing it, the best approach in my opinion (to avoid adding whitespace at the start of a line) is to use the regex function re.sub:
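import re
import numpy as np

with open(f) as file:
    raw_data = file.readlines()
# re.sub works on one string at a time, so process each line; the lookbehind keeps the digit.
processed_data = [re.sub(r'(?<=\d)-', ' -', line) for line in raw_data]
xf = np.loadtxt(processed_data, dtype=float)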
This replaces every - preceded by a digit with a space followed by -.
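To see what the substitution does to individual rows, here is a small standard-library sketch. It assumes a lookbehind pattern, `(?<=\d)-`, so that the digit before the minus sign is kept; `split_fused` is a hypothetical helper name used only for this illustration.

```python
import re

# Lookbehind: match '-' only when the previous character is a digit.
# The digit itself is not part of the match, so it is not replaced.
MINUS_AFTER_DIGIT = re.compile(r'(?<=\d)-')

def split_fused(line):
    """Insert a space before every '-' that directly follows a digit."""
    return MINUS_AFTER_DIGIT.sub(' -', line)

fused = "0.14400E+02-0.17694E+01 0.44615E+03"
with_exponent = "0.14200E+02 0.18188E+01 0.44604E-03"

print(split_fused(fused))
print(split_fused(with_exponent))
```

The minus sign in E-03 follows the letter E, not a digit, so rows that are already well formed pass through unchanged.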
Upvotes: 2
Reputation: 763
Try the code below:
import re
import numpy as np

with open('app.txt') as f:
    lines = f.read().splitlines()
# Insert a space before every '-' that directly follows a digit.
# The lookbehind leaves exponents such as E-03 untouched.
data_mod = [re.sub(r'(?<=\d)-', ' -', line) for line in lines]
# Write the cleaned rows to a new file and load that file instead.
with open('mod_text.txt', 'w') as f:
    for line in data_mod:
        f.write(line + "\n")
filename1 = 'mod_text.txt'
xf = np.loadtxt(filename1, dtype=float)
You do have to pre-process the data using a regex. After that, you can load the data as required.
I hope this helps.
Upvotes: 0