I would like to import the following space-delimited, CSV-style data (a .txt file) into Python lists, one per column, ignoring the text at the start. I can't change the format of the file. I'm getting this error:
Traceback (most recent call last):
  File "/Users/Hamish/Desktop/Python/AWBM/Import.py", line 13, in <module>
    rain_column = float(row[7])
IndexError: list index out of range
This is the code I'm trying to get working:
import csv
import numpy as np
file = open('Data_Bris.txt')
reader = csv.reader(file, delimiter=' ')
datelist = []
rainlist = []
evaplist = []
for row in reader:
    # row = [date, day, date2, T.Max, Smx, T.Min, Smn, Rain, Srn, Evap, Sev, Rad, Ssl, VP, Svp, maxT, minT, Span, Ssp]
    date_column = str(row[0])
    rain_column = float(row[7])
    evap_column = float(row[9])
    datelist.append([date_column])
    rainlist.append([rain_column])
    evaplist.append([evap_column])
date = np.array([datelist])
rain = np.array([rainlist])
evap = np.array([evaplist])
timeseries = np.arange(rain.size)
This is the start of the data file I would like to import (the remaining rows continue in the same format):
"17701231" 365 31/12/1770 -99.9 999 -99.9 999 9999.9 999 999.9 999 999.9 999 999.9 999 9999.9 9999.9 9999.9 999
""
" This file is SPACE DELIMITED for easy import into both spreadsheets and programs."
"The first line 17701231 contains dummy data and is provided to allow spreadsheets to sense the columns"
" To read into a spreadsheet select DELIMITED and SPACE."
" "
" "
"========= The following essential information and notes should be kept in the data file =========="
" "
"The Data Drill system and data are copyright to the Queensland Government Department of Science, Information Technology and Innovation (DSITI)."
"SILO data, with the exception of Patched Point data for Queensland, are supplied to the licencee only and may not be given, lent, or sold to any other party"
" "
"Notes:"
" * Data Drill for Lat, Long: -27.5000 153.0000 (DECIMAL DEGREES), 27 30'S 153 00'E Your Ref: Data_Bris"
" * Elevation: 102m "
" * Extracted from Silo on 20171214"
" * Please read the documentation on the Data Drill at http://www.longpaddock.qld.gov.au/silo"
" "
" * As evaporation is read at 9am, it has been shifted to the day before"
" ie The evaporation measured on 20 April is in row for 19 April"
" * The 6 Source columns Smx, Smn, Srn, Sev, Ssl, Svp indicate the source of the data to their left, namely Max temp, Min temp, Rainfall, Evaporation, Radiation and Vapour Pressure respectively "
" "
" 35 = interpolated from daily observations using anomaly interpolation method for CLIMARC data
" 25 = interpolated daily observations, 75 = interpolated long term average"
" 26 = synthetic pan evaporation "
" "
" * Relative Humidity has been calculated using 9am VP, T.Max and T.Min"
" RHmaxT is estimated Relative Humidity at Temperature T.Max"
" RHminT is estimated Relative Humidity at Temperature T.Min"
" Span = a calibrated estimate of class A pan evaporation based on vapour pressure deficit and solar radiation
" * The accuracy of the data depends on many factors including date, location, and variable."
" For consistency data is supplied using one decimal place, however it is not accurate to that precision."
" Further information is available from http://www.longpaddock.qld.gov.au/silo"
"===================================================================================================="
" "
Date Day Date2 T.Max Smx T.Min Smn Rain Srn Evap Sev Radn Ssl VP Svp RHmaxT RHminT Span Ssp
(yyyymmdd) () (ddmmyyyy) (oC) () (oC) () (mm) () (mm) () (MJ/m2) () (hPa) () (%) (%) (mm) ()
18890101 1 1-01-1889 29.5 35 21.5 35 0.3 25 6.2 75 23.0 35 26.0 35 63.1 100.0 5.6 26
18890102 2 2-01-1889 32.0 35 21.5 35 0.1 25 6.2 75 23.0 35 21.0 35 44.2 81.9 6.9 26
18890103 3 3-01-1889 31.5 35 21.5 35 0.0 25 6.2 75 23.0 35 24.0 35 51.9 93.6 6.4 26
18890104 4 4-01-1889 29.5 35 21.0 35 0.0 25 6.2 75 23.0 35 22.0 35 53.4 88.5 6.1 26
18890105 5 5-01-1889 30.0 35 19.0 35 0.0 25 6.2 75 23.0 35 19.0 35 44.8 86.5 6.5 26
18890106 6 6-01-1889 28.5 35 18.5 35 0.0 25 6.2 75 23.0 35 23.0 35 59.1 100.0 5.7 26
18890107 7 7-01-1889 30.0 35 18.5 35 0.1 25 6.2 75 23.0 35 20.0 35 47.1 94.0 6.4 26
18890108 8 8-01-1889 28.0 35 18.5 35 0.0 25 6.2 75 23.0 35 21.0 35 55.6 98.7 5.8 26
18890109 9 9-01-1889 28.5 35 19.0 35 0.0 25 6.2 75 24.0 35 22.0 35 56.5 100.0 6.0 26
18890110 10 10-01-1889 29.0 35 20.0 35 0.0 25 6.2 75 23.0 35 21.0 35 52.4 89.9 6.1 26
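For reference, the IndexError can be reproduced with a minimal sketch that feeds the first three lines of the file (hand-copied into a hypothetical sample_lines list) through the same csv.reader call. The dummy first row splits into all 19 columns, but each quoted note line comes back as a single field, so row[7] does not exist. The column-name and units lines further down would instead split into 19 text fields and fail inside float() with a ValueError.

```python
import csv

# First three lines of Data_Bris.txt, copied from the listing above.
sample_lines = [
    '"17701231" 365 31/12/1770 -99.9 999 -99.9 999 9999.9 999 999.9 999 '
    '999.9 999 999.9 999 9999.9 9999.9 9999.9 999',
    '""',
    '" This file is SPACE DELIMITED for easy import into both spreadsheets '
    'and programs."',
]

rows = list(csv.reader(sample_lines, delimiter=' '))
field_counts = [len(row) for row in rows]
print(field_counts)  # [19, 1, 1]

# The second row is just [''], so there is no column 7 to read.
try:
    float(rows[1][7])
except IndexError as exc:
    print(rows[1], exc)  # [''] list index out of range
```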
Here, you want to ignore all of the header lines, including the ones giving the names and units of the columns. A simple way to achieve that is to ignore any line that does not start with a digit. Using a generator expression (to avoid loading the whole file into memory), you could create your reader with:
...
reader = csv.reader((line for line in file if line[0].isdigit()),
                    delimiter=' ', skipinitialspace=True)
...
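Put together with the rest of the question's code, a minimal runnable sketch of this approach might look like the following. It wraps the loop in a hypothetical read_columns function that accepts any iterable of lines, so it can be tried on a short excerpt held in io.StringIO; with the real data you would pass the object returned by open('Data_Bris.txt') instead. The extra spaces in the second data row are added only to mimic padded columns and are an assumption about the real file.

```python
import csv
import io

import numpy as np


def read_columns(lines):
    """Collect the Date, Rain and Evap columns from SILO-style text lines.

    `lines` may be an open file or any iterable of strings. Only lines whose
    first character is a digit (the data rows) are passed to csv.reader, so
    the quoted notes, the column names and the units line are skipped.
    """
    dates, rain, evap = [], [], []
    # line[:1] avoids an IndexError if an empty string ever turns up.
    data_lines = (line for line in lines if line[:1].isdigit())
    reader = csv.reader(data_lines, delimiter=' ', skipinitialspace=True)
    for row in reader:
        dates.append(row[0])        # Date (yyyymmdd), kept as a string
        rain.append(float(row[7]))  # Rain (mm)
        evap.append(float(row[9]))  # Evap (mm)
    return dates, rain, evap


# Trimmed excerpt of Data_Bris.txt. The extra spaces in the second data row
# are illustrative only, to mimic padded columns.
sample = io.StringIO(
    '"17701231" 365 31/12/1770 -99.9 999 -99.9 999 9999.9 999 999.9 999 '
    '999.9 999 999.9 999 9999.9 9999.9 9999.9 999\n'
    '""\n'
    'Date Day Date2 T.Max Smx T.Min Smn Rain Srn Evap Sev Radn Ssl VP Svp '
    'RHmaxT RHminT Span Ssp\n'
    '(yyyymmdd) () (ddmmyyyy) (oC) () (oC) () (mm) () (mm) () (MJ/m2) () '
    '(hPa) () (%) (%) (mm) ()\n'
    '18890101 1 1-01-1889 29.5 35 21.5 35 0.3 25 6.2 75 23.0 35 26.0 35 '
    '63.1 100.0 5.6 26\n'
    '18890102   2  2-01-1889 32.0 35 21.5 35 0.1 25 6.2 75 23.0 35 21.0 35 '
    '44.2 81.9 6.9 26\n'
)

dates, rain, evap = read_columns(sample)
rain_arr = np.array(rain)
evap_arr = np.array(evap)
timeseries = np.arange(rain_arr.size)
print(dates, rain_arr, evap_arr, timeseries)
```

Appending the floats directly, rather than one-element lists as in the question, gives flat 1-D arrays; the question's np.array([rainlist]) produces an array of shape (1, n, 1) instead.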
Passing skipinitialspace=True makes the reader ignore spaces at the start of each field, so a run of several spaces is treated as a single delimiter.
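A quick way to see the difference is to parse one line with padded columns, with and without the option. The padding in the line below is made up for illustration. Without skipinitialspace, every extra space yields an empty-string field, which shifts the column positions so that row[7] would no longer be the Rain column.

```python
import csv

# One data line with extra padding spaces (illustrative, not copied from the file).
line = '18890101  1   1-01-1889 29.5'

plain = next(csv.reader([line], delimiter=' '))
skipping = next(csv.reader([line], delimiter=' ', skipinitialspace=True))

print(plain)     # ['18890101', '', '1', '', '', '1-01-1889', '29.5']
print(skipping)  # ['18890101', '1', '1-01-1889', '29.5']
```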