Reputation: 2093
Hello I have a daily data in a file and the daily data are arranged in monthly column-wise. with some file information in the beginning. And the data looks like:
Day Jan. Feb. Mar. Apr. May Jun. Jul. Aug. Sep. Oct. Nov. Dec. Year
01 15.2 12.3 9.96 10.1 15.0 33.7 137 309 182 62.6 27.4 17.2
02 14.9 12.3 9.96 9.96 16.4 38.2 109 342 197 69.9 25.4 16.6
03 14.9 12.3 9.78 10.3 17.3 50.3 118 472 184 68.7 24.5 17.0
04 14.6 12.3 9.69 10.3 18.7 58.1 152 275 190 68.7 24.5 16.6
05 14.4 12.3 9.51 10.1 18.9 44.5 165 188 206 69.9 24.0 16.5
06 14.1 12.3 9.41 10.3 19.8 44.8 142 157 192 62.2 23.8 16.1
07 14.0 12.3 9.32 10.3 20.4 52.6 121 146 182 58.9 24.9 15.6
I used this code to read data
data ='AQ404.7_01.txt'
with open(data) as fo:
data = fo.readlines()[9:41]
df = data[1:32]
df = [d.strip() for d in df]
df = (np.array(df))
column = data[0][:-6]
for string in (df):
df = string.split()
print df
But the problem is, when I checked the data for february, It gives 31 data. I tried to solve but not able to do.
Can anyone help to solve this problem? I have also attached datafile here. https://drive.google.com/file/d/0B2rkXkOkG7ExTlJ3VnExUHFZUzA/view?usp=sharing
Upvotes: 1
Views: 4042
Reputation: 231385
I copy-n-pasted your sample to my ipython
session as a multiline text and ran the following genfromtxt
:
In [281]: np.genfromtxt(txt.splitlines(),dtype=None,names=True,usecols=range(13))
Out[281]:
array([(1, 15.2, 12.3, 9.96, 10.1, 15.0, 33.7, 137, 309, 182, 62.6, 27.4, 17.2),
(2, 14.9, 12.3, 9.96, 9.96, 16.4, 38.2, 109, 342, 197, 69.9, 25.4, 16.6),
(3, 14.9, 12.3, 9.78, 10.3, 17.3, 50.3, 118, 472, 184, 68.7, 24.5, 17.0),
(4, 14.6, 12.3, 9.69, 10.3, 18.7, 58.1, 152, 275, 190, 68.7, 24.5, 16.6),
(5, 14.4, 12.3, 9.51, 10.1, 18.9, 44.5, 165, 188, 206, 69.9, 24.0, 16.5),
(6, 14.1, 12.3, 9.41, 10.3, 19.8, 44.8, 142, 157, 192, 62.2, 23.8, 16.1),
(7, 14.0, 12.3, 9.32, 10.3, 20.4, 52.6, 121, 146, 182, 58.9, 24.9, 15.6)],
dtype=[('Day', '<i4'), ('Jan', '<f8'), ('Feb', '<f8'), ('Mar', '<f8'), ('Apr', '<f8'), ('May', '<f8'), ('Jun', '<f8'), ('Jul', '<i4'), ('Aug', '<i4'), ('Sep', '<i4'), ('Oct', '<f8'), ('Nov', '<f8'), ('Dec', '<f8')])
I had to specify usecols
because you have 14 names in the header, but only 13 fields in the data lines.
Note that it loaded a 1d structured array. Columns are accessed by field name, e.g. data['Jan']
, not by number. data[1]
will the data from the 2nd dataline.
If I skip the header line, I can load it as a 2d array of floats
In [284]: np.genfromtxt(txt.splitlines(),dtype=float,skip_header=1)
Out[284]:
array([[ 1. , 15.2 , 12.3 , 9.96, 10.1 , 15. , 33.7 ,
137. , 309. , 182. , 62.6 , 27.4 , 17.2 ],
[ 2. , 14.9 , 12.3 , 9.96, 9.96, 16.4 , 38.2 ,
109. , 342. , 197. , 69.9 , 25.4 , 16.6 ],
[ 3. , 14.9 , 12.3 , 9.78, 10.3 , 17.3 , 50.3 ,
118. , 472. , 184. , 68.7 , 24.5 , 17. ],
[ 4. , 14.6 , 12.3 , 9.69, 10.3 , 18.7 , 58.1 ,
152. , 275. , 190. , 68.7 , 24.5 , 16.6 ],
[ 5. , 14.4 , 12.3 , 9.51, 10.1 , 18.9 , 44.5 ,
165. , 188. , 206. , 69.9 , 24. , 16.5 ],
[ 6. , 14.1 , 12.3 , 9.41, 10.3 , 19.8 , 44.8 ,
142. , 157. , 192. , 62.2 , 23.8 , 16.1 ],
[ 7. , 14. , 12.3 , 9.32, 10.3 , 20.4 , 52.6 ,
121. , 146. , 182. , 58.9 , 24.9 , 15.6 ]])
Working from scatch, I can turn the lines into a list of lists with:
ll = []
for line in txt.splitlines():
ll.append(line.strip().split())
I can get a list of lists of floats with:
for line in txt.splitlines()[1:]: # skip the header
ll.append([float(i) for i in line.strip().split()])
which can be turned into 2d array with
np.array(ll)
If white space delimiter doesn't work, genfromtxt
also accepts a list of field widths as the 'delimiter'. See its docs or experiment.
Upvotes: 1
Reputation: 4070
You shall use Fixed Width File reader of pandas:
So for the input file you have define the list of fixed widths:
#Define the column widths
ws = [4,9,7,7,7,7,7,7,7,7,7,7,7]
#read the file having the header row in the 9th line and read only 31 lines after that
df = pd.read_fwf('AQ404.7_01.txt',widths=ws,header=9, nrows=31)
print df
Upvotes: 2