Reputation: 189
I would like to read in this file (test.txt):
01.06.2015;00:00:00;0.000;0;-9.999;0;8;0.00;18951;(SPECTRUM)ZERO(/SPECTRUM)
01.06.2015;00:01:00;0.000;0;-9.999;0;8;0.00;18954;(SPECTRUM)ZERO(/SPECTRUM)
01.06.2015;00:02:00;0.000;0;-9.999;0;8;0.00;18960;(SPECTRUM)ZERO(/SPECTRUM)
01.06.2015;09:23:00;0.327;61;25.831;39;29;0.18;19006;01.06.2015;09:23:00;0.327;61;25.831;39;29;0.18;19006;(SPECTRUM);;;;;;;;;;;;;;1;1;;;1;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;1;;;;;;;;;;;;(/SPECTRUM)
01.06.2015;09:24:00;0.000;0;-9.999;0;29;0.00;19010;(SPECTRUM)ZERO(/SPECTRUM)
...
I tried it with the NumPy function genfromtxt() (see the code excerpt below).
import numpy as np
col_names = ["date", "time", "rain_intensity", "weather_code_1", "radar_ref", "weather_code_2", "val6", "rain_accum", "val8", "val9"]
types = ["object", "object", "float", "uint8", "float", "uint8", "uint8", "float", "uint8","|S10"]
# Read in the file with np.genfromtxt
mydata = np.genfromtxt("test.txt", delimiter=";", names=col_names, dtype=types)
When I execute the code, I get the following error:
raise ValueError(errmsg)
ValueError: Some errors were detected !
Line #4 (got 79 columns instead of 10)
I think the difficulty comes from the last column (val9), which contains many ;;;;;;;. The delimiter and the separator characters inside the last column are the same (;)!
How can I read in the file without an error? Maybe there is a way to skip the last column, or to replace the ; only in the last column?
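A minimal sketch of the second idea (replacing ; only inside the last column), under a few assumptions: str.split with a maxsplit of 9 keeps everything after the ninth ; together as one field, whose ; characters can then be swapped for ','. The helper name protect_last_column, the ',' replacement, the shortened sample lines and the dtypes (str columns instead of object, int32 for val8 because values such as 18951 do not fit in uint8) are illustrative choices, not part of the original code.

```python
import numpy as np


def protect_last_column(lines, ncols=10, new_sep=","):
    """Hypothetical helper: yield lines whose last column has ';' replaced.

    str.split(";", ncols - 1) splits at most ncols - 1 times, so everything
    after the ninth ';' stays together as the tenth field.
    """
    for line in lines:
        fields = line.rstrip("\n").split(";", ncols - 1)
        if len(fields) == ncols:
            fields[-1] = fields[-1].replace(";", new_sep)
        yield ";".join(fields)


# Shortened sample lines standing in for test.txt.
lines = [
    "01.06.2015;00:00:00;0.000;0;-9.999;0;8;0.00;18951;(SPECTRUM)ZERO(/SPECTRUM)",
    "01.06.2015;09:23:00;0.327;61;25.831;39;29;0.18;19006;01.06.2015;(SPECTRUM);;;1;;(/SPECTRUM)",
    "01.06.2015;09:24:00;0.000;0;-9.999;0;29;0.00;19010;(SPECTRUM)ZERO(/SPECTRUM)",
]
col_names = ["date", "time", "rain_intensity", "weather_code_1", "radar_ref",
             "weather_code_2", "val6", "rain_accum", "val8", "val9"]
# Assumed dtypes: fixed-width str columns, int32 for val8, wide str for val9.
types = ["U10", "U8", "float", "uint8", "float", "uint8", "uint8", "float",
         "int32", "U60"]

# Every line now has exactly ten ';'-separated fields.
data = np.genfromtxt(protect_last_column(lines), delimiter=";",
                     names=col_names, dtype=types)
print(data["val9"])
```

With this approach the extra fields of the fourth line end up inside val9 instead of being lost, so they can still be parsed afterwards.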
Upvotes: 0
Views: 2160
Reputation: 231325
usecols can be used to ignore excess delimiters, e.g.
In [546]: np.genfromtxt([b'1,2,3',b'1,2,3,,,,,,'], dtype=None,
     ...:     delimiter=',', usecols=np.arange(3))
Out[546]:
array([[1, 2, 3],
       [1, 2, 3]])
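Applied to the question's data, the same idea might look like the following sketch. It uses a few shortened sample lines in place of test.txt and drops val9 by reading only columns 0 to 8; as an assumption beyond the original code, it reads val8 as int32 (values such as 18951 do not fit in uint8) and the date/time columns as fixed-width strings.

```python
import numpy as np

# Shortened sample lines; the middle one carries many extra ';'-separated fields.
lines = [
    "01.06.2015;00:00:00;0.000;0;-9.999;0;8;0.00;18951;(SPECTRUM)ZERO(/SPECTRUM)",
    "01.06.2015;09:23:00;0.327;61;25.831;39;29;0.18;19006;01.06.2015;(SPECTRUM);;;1;;(/SPECTRUM)",
    "01.06.2015;09:24:00;0.000;0;-9.999;0;29;0.00;19010;(SPECTRUM)ZERO(/SPECTRUM)",
]

# Names and types for the first nine columns only (val9 is dropped).
col_names = ["date", "time", "rain_intensity", "weather_code_1", "radar_ref",
             "weather_code_2", "val6", "rain_accum", "val8"]
types = ["U10", "U8", "float", "uint8", "float", "uint8", "uint8", "float", "int32"]

# usecols keeps columns 0..8, so the extra fields on the middle line are ignored.
data = np.genfromtxt(lines, delimiter=";", names=col_names, dtype=types,
                     usecols=range(9))
print(data["rain_intensity"])
```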
Upvotes: 0
Reputation: 8412
From the NumPy documentation of genfromtxt:
invalid_raise : bool, optional
    If True, an exception is raised if an inconsistency is detected in the number of columns. If False, a warning is emitted and the offending lines are skipped.
So passing invalid_raise=False skips the bad line instead of raising:
mydata = np.genfromtxt("test.txt", delimiter=";", names=col_names, dtype=types, invalid_raise=False)
Note that there were errors in your code, which I have corrected (delimiter was misspelled, and the types list was referred to as dtypes in the function call).
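A small sketch of what invalid_raise=False does with data shaped like the question's. The shortened sample lines (instead of test.txt), the str column types and int32 for val8 are assumptions. The inconsistent line is dropped and the problem is reported as a warning, which is recorded here so it can be inspected:

```python
import warnings

import numpy as np

lines = [
    "01.06.2015;00:00:00;0.000;0;-9.999;0;8;0.00;18951;(SPECTRUM)ZERO(/SPECTRUM)",
    # Shortened copy of the problem line: more than 10 ';'-separated fields.
    "01.06.2015;09:23:00;0.327;61;25.831;39;29;0.18;19006;01.06.2015;(SPECTRUM);;;1;;(/SPECTRUM)",
    "01.06.2015;09:24:00;0.000;0;-9.999;0;29;0.00;19010;(SPECTRUM)ZERO(/SPECTRUM)",
]
col_names = ["date", "time", "rain_intensity", "weather_code_1", "radar_ref",
             "weather_code_2", "val6", "rain_accum", "val8", "val9"]
types = ["U10", "U8", "float", "uint8", "float", "uint8", "uint8", "float",
         "int32", "U30"]

# Record warnings so the skipped-line report can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    data = np.genfromtxt(lines, delimiter=";", names=col_names, dtype=types,
                         invalid_raise=False)

print(data["time"])                      # the 09:23:00 row is missing
print([str(w.message) for w in caught])  # includes "Some errors were detected !"
```

Unlike the usecols approach, the whole 09:23:00 record is lost here, not just its extra fields.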
Edit: From your comment, I see I slightly misunderstood: you want to skip the last column, not the last row.
Take a look at the following code. I have defined a generator that returns only the first ten fields of each line. This lets genfromtxt() complete without error, and you now get column #3 from all rows.
Note, though, that you will still lose some data: if you look carefully, you will see that the problem line is actually two lines concatenated together, with garbage where the other lines have ZERO. So you will still lose this second line. You could modify the generator to parse each line and handle this differently, but I'll leave that as a fun exercise :)
import numpy as np

def filegen(filename):
    # Yield each line trimmed to its first ten ';'-separated fields
    with open(filename, 'r') as infile:
        for line in infile:
            yield ';'.join(line.split(';')[:10])

col_names = ["date", "time", "rain_intensity", "weather_code_1", "radar_ref", "weather_code_2", "val6", "rain_accum", "val8", "val9"]
dtypes = ["object", "object", "float", "uint8", "float", "uint8", "uint8", "float", "uint8", "|S10"]
# Read in the file with np.genfromtxt, feeding it the trimmed lines
mydata = np.genfromtxt(filegen('test.txt'), delimiter=";", names=col_names, dtype=dtypes)
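Output (val8 was read as uint8, so 18951 wrapped around modulo 256 to 7, and so on; newer NumPy versions may reject such out-of-range values instead, so a wider type such as uint16 is safer):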
[('01.06.2015', '00:00:00', 0.0, 0, -9.999, 0, 8, 0.0, 7, '(SPECTRUM)')
 ('01.06.2015', '00:01:00', 0.0, 0, -9.999, 0, 8, 0.0, 10, '(SPECTRUM)')
 ('01.06.2015', '00:02:00', 0.0, 0, -9.999, 0, 8, 0.0, 16, '(SPECTRUM)')
 ('01.06.2015', '09:23:00', 0.327, 61, 25.831, 39, 29, 0.18, 62, '01.06.2015')
 ('01.06.2015', '09:24:00', 0.0, 0, -9.999, 0, 29, 0.0, 66, '(SPECTRUM)')]
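One possible take on the "fun exercise", sketched under the answer's own assumption that the problem line is really two records concatenated together: cut the scalar fields in front of (SPECTRUM) into groups of nine, emit one record per group, and keep the spectrum as a single field by replacing its ; with ','. The helper split_records, the constant NFIELDS, the empty spectrum given to the earlier record, the shortened sample lines and the dtypes (int32 for val8) are all hypothetical choices for illustration.

```python
import numpy as np

# Assumption: every record has 9 scalar fields before its "(SPECTRUM)" block.
NFIELDS = 9


def split_records(lines):
    """Hypothetical helper: yield one 10-field ';'-separated record per group.

    Scalar fields before "(SPECTRUM)" are cut into groups of NFIELDS. The
    spectrum block (with ';' replaced by ',') is attached to the last group;
    earlier groups get an empty spectrum field.
    """
    for line in lines:
        head, marker, tail = line.rstrip("\n").partition("(SPECTRUM)")
        fields = head.rstrip(";").split(";")
        spectrum = (marker + tail).replace(";", ",")
        groups = [fields[i:i + NFIELDS] for i in range(0, len(fields), NFIELDS)]
        for k, group in enumerate(groups):
            last_field = spectrum if k == len(groups) - 1 else ""
            yield ";".join(group + [last_field])


lines = [
    "01.06.2015;00:00:00;0.000;0;-9.999;0;8;0.00;18951;(SPECTRUM)ZERO(/SPECTRUM)",
    # Shortened copy of the problem line: two 9-field records, then the spectrum.
    "01.06.2015;09:23:00;0.327;61;25.831;39;29;0.18;19006;"
    "01.06.2015;09:23:00;0.327;61;25.831;39;29;0.18;19006;"
    "(SPECTRUM);;;1;1;;;1;;(/SPECTRUM)",
    "01.06.2015;09:24:00;0.000;0;-9.999;0;29;0.00;19010;(SPECTRUM)ZERO(/SPECTRUM)",
]
col_names = ["date", "time", "rain_intensity", "weather_code_1", "radar_ref",
             "weather_code_2", "val6", "rain_accum", "val8", "val9"]
types = ["U10", "U8", "float", "uint8", "float", "uint8", "uint8", "float",
         "int32", "U100"]

data = np.genfromtxt(split_records(lines), delimiter=";", names=col_names,
                     dtype=types)
print(data["time"])
```

Both 09:23:00 records are kept this way, and the spectrum survives as one comma-separated field that can be split later if needed.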
Upvotes: 2