bikuser
bikuser

Reputation: 2093

How to read data from .txt file with whitespace delimiter?

Hello I have a daily data in a file and the daily data are arranged in monthly column-wise. with some file information in the beginning. And the data looks like:

Day     Jan.   Feb.   Mar.   Apr.   May    Jun.   Jul.   Aug.   Sep.   Oct.   Nov.   Dec.   Year
 01     15.2   12.3   9.96   10.1   15.0   33.7    137    309    182   62.6   27.4   17.2
 02     14.9   12.3   9.96   9.96   16.4   38.2    109    342    197   69.9   25.4   16.6
 03     14.9   12.3   9.78   10.3   17.3   50.3    118    472    184   68.7   24.5   17.0
 04     14.6   12.3   9.69   10.3   18.7   58.1    152    275    190   68.7   24.5   16.6
 05     14.4   12.3   9.51   10.1   18.9   44.5    165    188    206   69.9   24.0   16.5
 06     14.1   12.3   9.41   10.3   19.8   44.8    142    157    192   62.2   23.8   16.1
 07     14.0   12.3   9.32   10.3   20.4   52.6    121    146    182   58.9   24.9   15.6

I used this code to read data

data ='AQ404.7_01.txt'
with open(data) as fo:
    data = fo.readlines()[9:41]
    df = data[1:32]
    df = [d.strip() for d in df]
    df = (np.array(df))
    column = data[0][:-6]
    for string in (df):
        df = string.split()
        print df

But the problem is, when I checked the data for february, It gives 31 data. I tried to solve but not able to do.

Can anyone help to solve this problem? I have also attached datafile here. https://drive.google.com/file/d/0B2rkXkOkG7ExTlJ3VnExUHFZUzA/view?usp=sharing

Upvotes: 1

Views: 4042

Answers (2)

hpaulj
hpaulj

Reputation: 231385

I copy-n-pasted your sample to my ipython session as a multiline text and ran the following genfromtxt:

In [281]: np.genfromtxt(txt.splitlines(),dtype=None,names=True,usecols=range(13))
Out[281]: 
array([(1, 15.2, 12.3, 9.96, 10.1, 15.0, 33.7, 137, 309, 182, 62.6, 27.4, 17.2),
       (2, 14.9, 12.3, 9.96, 9.96, 16.4, 38.2, 109, 342, 197, 69.9, 25.4, 16.6),
       (3, 14.9, 12.3, 9.78, 10.3, 17.3, 50.3, 118, 472, 184, 68.7, 24.5, 17.0),
       (4, 14.6, 12.3, 9.69, 10.3, 18.7, 58.1, 152, 275, 190, 68.7, 24.5, 16.6),
       (5, 14.4, 12.3, 9.51, 10.1, 18.9, 44.5, 165, 188, 206, 69.9, 24.0, 16.5),
       (6, 14.1, 12.3, 9.41, 10.3, 19.8, 44.8, 142, 157, 192, 62.2, 23.8, 16.1),
       (7, 14.0, 12.3, 9.32, 10.3, 20.4, 52.6, 121, 146, 182, 58.9, 24.9, 15.6)], 
      dtype=[('Day', '<i4'), ('Jan', '<f8'), ('Feb', '<f8'), ('Mar', '<f8'), ('Apr', '<f8'), ('May', '<f8'), ('Jun', '<f8'), ('Jul', '<i4'), ('Aug', '<i4'), ('Sep', '<i4'), ('Oct', '<f8'), ('Nov', '<f8'), ('Dec', '<f8')])

I had to specify usecols because you have 14 names in the header, but only 13 fields in the data lines.

Note that it loaded a 1d structured array. Columns are accessed by field name, e.g. data['Jan'], not by number. data[1] will the data from the 2nd dataline.

If I skip the header line, I can load it as a 2d array of floats

In [284]: np.genfromtxt(txt.splitlines(),dtype=float,skip_header=1)
Out[284]: 
array([[   1.  ,   15.2 ,   12.3 ,    9.96,   10.1 ,   15.  ,   33.7 ,
         137.  ,  309.  ,  182.  ,   62.6 ,   27.4 ,   17.2 ],
       [   2.  ,   14.9 ,   12.3 ,    9.96,    9.96,   16.4 ,   38.2 ,
         109.  ,  342.  ,  197.  ,   69.9 ,   25.4 ,   16.6 ],
       [   3.  ,   14.9 ,   12.3 ,    9.78,   10.3 ,   17.3 ,   50.3 ,
         118.  ,  472.  ,  184.  ,   68.7 ,   24.5 ,   17.  ],
       [   4.  ,   14.6 ,   12.3 ,    9.69,   10.3 ,   18.7 ,   58.1 ,
         152.  ,  275.  ,  190.  ,   68.7 ,   24.5 ,   16.6 ],
       [   5.  ,   14.4 ,   12.3 ,    9.51,   10.1 ,   18.9 ,   44.5 ,
         165.  ,  188.  ,  206.  ,   69.9 ,   24.  ,   16.5 ],
       [   6.  ,   14.1 ,   12.3 ,    9.41,   10.3 ,   19.8 ,   44.8 ,
         142.  ,  157.  ,  192.  ,   62.2 ,   23.8 ,   16.1 ],
       [   7.  ,   14.  ,   12.3 ,    9.32,   10.3 ,   20.4 ,   52.6 ,
         121.  ,  146.  ,  182.  ,   58.9 ,   24.9 ,   15.6 ]])

Working from scatch, I can turn the lines into a list of lists with:

ll = []
for line in txt.splitlines():
    ll.append(line.strip().split())

I can get a list of lists of floats with:

for line in txt.splitlines()[1:]:   # skip the header
    ll.append([float(i) for i in line.strip().split()])

which can be turned into 2d array with

np.array(ll)

If white space delimiter doesn't work, genfromtxt also accepts a list of field widths as the 'delimiter'. See its docs or experiment.

Upvotes: 1

Abbas
Abbas

Reputation: 4070

You shall use Fixed Width File reader of pandas:

So for the input file you have define the list of fixed widths:

#Define the column widths    
ws = [4,9,7,7,7,7,7,7,7,7,7,7,7]

#read the file having the header row in the 9th line and read only 31 lines after that
df = pd.read_fwf('AQ404.7_01.txt',widths=ws,header=9, nrows=31)

print df

enter image description here

Upvotes: 2

Related Questions