Reputation: 1550
I need help using textscan. I am trying to read data that has the following format:
# ---------------------------------- WARNING ----------------------------------------
# The data you have obtained from this automated U.S. Geological Survey database
# have not received Director's approval and as such are provisional and subject to
# revision. The data are released on the condition that neither the USGS nor the
# United States Government may be held liable for any damages resulting from its use.
# Additional info: http://nwis.waterdata.usgs.gov/nwis/help/?provisional
#
# File-format description: http://nwis.waterdata.usgs.gov/nwis/?tab_delimited_format_info
# Automated-retrieval info: http://nwis.waterdata.usgs.gov/nwis/?automated_retrieval_info
#
# Contact: [email protected]
# retrieved: 2013-09-13 13:10:29 EDT (nadww01)
#
# Data for the following 1 site(s) are contained in this file
# USGS 08067074 CWA Canal at Thompson Rd nr Baytown, TX
# -----------------------------------------------------------------------------------
#
# Data provided for site 08067074
# DD parameter Description
# 01 00010 Temperature, water, degrees Celsius
# 02 00095 Specific conductance, water, unfiltered, microsiemens per centimeter at 25 degrees Celsius
#
# Data-value qualification codes included in this output:
# A Approved for publication -- Processing and review completed.
# P Provisional data subject to revision.
#
agency_cd site_no datetime tz_cd 01_00010 01_00010_cd 02_00095 02_00095_cd
5s 15s 20d 6s 14n 10s 14n 10s
USGS 08067074 2013-01-05 00:00 CST 10.3 A 391 A
USGS 08067074 2013-01-05 00:15 CST 10.3 A 391 A
USGS 08067074 2013-01-05 00:30 CST 10.3 A 391 A
USGS 08067074 2013-01-05 00:45 CST 10.3 A 391 A
USGS 08067074 2013-01-05 01:00 CST 10.3 A 391 A
USGS 08067074 2013-01-05 01:15 CST 10.3 A 391 A
USGS 08067074 2013-01-05 01:30 CST 10.3 A 391 A
USGS 08067074 2013-01-05 01:45 CST 10.3 A 391 A
USGS 08067074 2013-01-05 02:00 CST 10.3 A 391 A
USGS 08067074 2013-01-05 02:15 CST 10.3 A 391 A
USGS 08067074 2013-01-05 02:30 CST 10.3 A 391 A
USGS 08067074 2013-01-05 02:45 CST 10.2 A 391 A
USGS 08067074 2013-01-05 03:00 CST 10.2 A 391 A
USGS 08067074 2013-01-05 03:15 CST 10.2 A 391 A
USGS 08067074 2013-01-05 03:30 CST 10.2 A 391 A
USGS 08067074 2013-01-05 03:45 CST 10.2 A 391 A
USGS 08067074 2013-01-05 04:00 CST 10.2 A 391 A
USGS 08067074 2013-01-05 04:15 CST 10.2 A 392 A
USGS 08067074 2013-01-05 04:30 CST 10.2 A 391 A
USGS 08067074 2013-01-05 04:45 CST 10.2 A 391 A
USGS 08067074 2013-01-05 05:00 CST 10.2 A 391 A
USGS 08067074 2013-01-05 05:15 CST 10.2 A 391 A
USGS 08067074 2013-01-05 05:30 CST 10.2 A 391 A
USGS 08067074 2013-01-05 05:45 CST 10.2 A 391 A
USGS 08067074 2013-01-05 06:00 CST 10.2 A 391 A
USGS 08067074 2013-01-05 06:15 CST 10.1 A 391 A
USGS 08067074 2013-01-05 06:30 CST 10.1 A 391 A
USGS 08067074 2013-01-05 06:45 CST 10.1 A 391 A
USGS 08067074 2013-01-05 07:00 CST 10.1 A 391 A
USGS 08067074 2013-01-05 07:15 CST 10.1 A 391 A
USGS 08067074 2013-01-05 07:30 CST 10.1 A 390 A
USGS 08067074 2013-01-05 07:45 CST 10.0 A 391 A
USGS 08067074 2013-01-05 08:00 CST 10.0 A 390 A
USGS 08067074 2013-01-05 08:15 CST 10.0 A 391 A
USGS 08067074 2013-01-05 08:30 CST 10.0 A 391 A
USGS 08067074 2013-01-05 08:45 CST 10.0 A 390 A
USGS 08067074 2013-01-05 09:00 CST 10.0 A 390 A
USGS 08067074 2013-01-05 09:15 CST 10 A 390 A
USGS 08067074 2013-01-05 09:30 CST 10 A 390 A
USGS 08067074 2013-01-05 09:45 CST 10 A 390 A
USGS 08067074 2013-01-05 10:00 CST 10 A 390 A
USGS 08067074 2013-01-05 10:15 CST 10 A 390 A
USGS 08067074 2013-01-05 10:30 CST 10 A 390 A
USGS 08067074 2013-01-05 10:45 CST 10 A 390 A
USGS 08067074 2013-01-05 11:00 CST 10 A 390 A
USGS 08067074 2013-01-05 11:15 CST 10 A 390 A
USGS 08067074 2013-01-05 11:30 CST 10 A 390 A
USGS 08067074 2013-01-05 11:45 CST 10 A 389 A
USGS 08067074 2013-01-05 12:00 CST 10 A 389 A
USGS 08067074 2013-01-05 12:15 CST 10 A 389 A
USGS 08067074 2013-01-05 12:30 CST 10 A 389 A
USGS 08067074 2013-01-05 12:45 CST 10 A 389 A
USGS 08067074 2013-01-05 13:00 CST 10 A 389 A
USGS 08067074 2013-01-05 13:15 CST 10 A 389 A
USGS 08067074 2013-01-05 13:30 CST 10 A 389 A
The only two data entries I am concerned with are the date and the specific conductance (columns 3 and 7, respectively).
I was able to do this on a consistent basis using the following code:
%%
% Collect conductance data
filename = 'conductivityData_Temp_File';
%%
% Determine length of data file
fid = fopen(filename,'r');
fseek(fid, 0, 'eof');
chunksize = ftell(fid);
fseek(fid, 0, 'bof');
ch = fread(fid, chunksize, '*uchar');
N = sum(ch == sprintf('\n')); % number of lines
fclose(fid);
%%
% Read conductivity data
fileconductID = fopen(filename);
waterConductivityData = textscan(fileconductID, '%s %d %s %s %f %s %f %s', N, 'delimiter', '\t', 'EmptyValue', 0, 'headerlines', 27);
fclose(fileconductID);
However, I found out that you can simply use 'commentstyle' to ignore the comments. This is important because I am reading multiple files and occasionally I will encounter a file that does not have exactly 27 comment rows. That will make my program throw an error.
Can someone tell me how I can adjust my textscan code to ignore the comment rows and skip the two header rows?
I apologize if the example code I supplied is complicated but basically my error resides in this one line of code:
waterConductivityData = textscan(fileconductID, '%s %d %s %s %f %s %f %s', N, 'delimiter', '\t', 'EmptyValue', 0, 'headerlines', 27);
(If you want to download an example tab-delimited file to work with, use this link: here)
Thank you!
ANSWER:
Thank you TryHard, that was a good approach, but I wanted to stay closer to what I was doing previously. Apparently my delimiters were off.
waterConductivityData = textscan(fileconductID,'%s %s %s %s %s %s %s %s %s ' , 'Delimiter', '\t', 'CommentStyle', '#');
dates = waterConductivityData{3}(3:end);
conductancesStr = waterConductivityData{7}(3:end);
temperaturesStr = waterConductivityData{5}(3:end);
conductances = str2double(conductancesStr);
temperatures = str2double(temperaturesStr);
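For anyone who wants to try this without downloading a file, here is a minimal self-contained sketch of the same idea. It assumes a tiny made-up sample (the temporary file and its three data rows are hypothetical, written only for illustration) and uses eight %s specifiers, one per column shown in the question:

```matlab
% Write a small tab-delimited sample (hypothetical data) with '#' comment
% lines, the two header rows, and three data rows.
tmpFile = [tempname '.txt'];
fid = fopen(tmpFile, 'w');
fprintf(fid, '# provisional data\n#\n');
fprintf(fid, 'agency_cd\tsite_no\tdatetime\ttz_cd\t01_00010\t01_00010_cd\t02_00095\t02_00095_cd\n');
fprintf(fid, '5s\t15s\t20d\t6s\t14n\t10s\t14n\t10s\n');
fprintf(fid, 'USGS\t08067074\t2013-01-05 00:00\tCST\t10.3\tA\t391\tA\n');
fprintf(fid, 'USGS\t08067074\t2013-01-05 00:15\tCST\t10.3\tA\t391\tA\n');
fprintf(fid, 'USGS\t08067074\t2013-01-05 00:30\tCST\t10.2\tA\t392\tA\n');
fclose(fid);

% Read every column as text; 'CommentStyle' drops the lines starting with '#'.
fid = fopen(tmpFile, 'r');
raw = textscan(fid, '%s %s %s %s %s %s %s %s', ...
    'Delimiter', '\t', 'CommentStyle', '#');
fclose(fid);
delete(tmpFile);

% Rows 1 and 2 of each column are the two header rows, so start at row 3.
dates = raw{3}(3:end);
conductances = str2double(raw{7}(3:end));
```

Reading everything as text means the two header rows cannot trip up a numeric specifier; the conversion to numbers happens afterwards with str2double.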
Upvotes: 0
Views: 13301
Reputation: 5073
One way to circumvent variable header length is to parse the file as follows:
fid=fopen(file);
str=textscan(fid,'%s');
fclose(fid);
str2=strvcat(str{1});
fst=strmatch('CST',str2);
dtstr = str2(fst(1)-2:9:end,:); % date strings
timstr = str2(fst(1)-1:9:end,:); % time strings
condctv = str2(fst(1)+3:9:end,:); % conductivity string
This finds the first occurrence of the string "CST" and assumes every data row has the same nine whitespace-separated tokens, so it only works if "CST" appears in the first data row of each file. If the time zone is not constant across your data files, the idea is a bust. However, you might be able to anchor on another string, provided it is unique and always shows up in the same place. The following uses the last field-width specifier in the second header row ("10s") instead:
str2=strvcat(str{1});
fst=strmatch('10s',str2);
fst=fst(end);
dtstr = str2(fst+3:9:end,:);
timstr = str2(fst+4:9:end,:);
condctv = str2(fst+8:9:end,:);
You can convert the strings in condctv into numeric data with str2num, as follows:
condctv = str2num(condctv);
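The stride-based indexing above can also be sketched on a cell array directly, which avoids padding the tokens into a char matrix. This is only an illustration: the token list below is a hypothetical miniature of what textscan(fid,'%s') would return, and strcmp/str2double are swapped in for strmatch/str2num:

```matlab
% Whitespace-separated tokens, as textscan(fid,'%s') would return them in
% str{1} (hypothetical miniature: a few header tokens, then two data rows).
tokens = {'#'; 'comment'; 'agency_cd'; 'site_no'; '5s'; '10s'; ...
    'USGS'; '08067074'; '2013-01-05'; '00:00'; 'CST'; '10.3'; 'A'; '391'; 'A'; ...
    'USGS'; '08067074'; '2013-01-05'; '00:15'; 'CST'; '10.2'; 'A'; '392'; 'A'};

% Index of the first exact 'CST' token, i.e. the time zone of the first data row.
first = find(strcmp(tokens, 'CST'), 1);
tokensPerRow = 9;   % each data row splits into nine tokens

% Step through the token list one data row at a time.
dateTok = tokens(first-2:tokensPerRow:end);   % date strings
timeTok = tokens(first-1:tokensPerRow:end);   % time strings
condTok = tokens(first+3:tokensPerRow:end);   % conductivity strings

% Convert the conductivity strings to doubles (NaN for anything non-numeric).
condctv = str2double(condTok);
```

The same caveat applies as before: this relies on every data row having exactly nine tokens, so a missing value in any row would shift everything after it.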
Upvotes: 0
Reputation: 421
Change your textscan line to this:
waterConductivityData = textscan(fileconductID, '%s %d %s %s %f %s %f %s', N, 'Delimiter', '\t', 'EmptyValue', 0, 'CommentStyle', '#');
And then get the columns you wanted like this:
dates = waterConductivityData{3}(3:end)
conductances = waterConductivityData{7}(3:end)
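One thing to watch for: 'CommentStyle' only removes the lines starting with '#', so the two header rows are still read as data, and textscan stops reading when a numeric specifier such as %d hits text like "site_no". A possible workaround, sketched below under the assumption that the header is always exactly two rows after the comments, is to consume those rows with fgetl and then let textscan read the rest with numeric specifiers. The sample file is hypothetical, and site_no is read as %s here to keep its leading zero:

```matlab
% Write a small tab-delimited sample (hypothetical data) to a temporary file.
tmpFile = [tempname '.txt'];
fid = fopen(tmpFile, 'w');
fprintf(fid, '# provisional data\n#\n');
fprintf(fid, 'agency_cd\tsite_no\tdatetime\ttz_cd\t01_00010\t01_00010_cd\t02_00095\t02_00095_cd\n');
fprintf(fid, '5s\t15s\t20d\t6s\t14n\t10s\t14n\t10s\n');
fprintf(fid, 'USGS\t08067074\t2013-01-05 00:00\tCST\t10.3\tA\t391\tA\n');
fprintf(fid, 'USGS\t08067074\t2013-01-05 00:15\tCST\t10.3\tA\t391\tA\n');
fprintf(fid, 'USGS\t08067074\t2013-01-05 00:30\tCST\t10.2\tA\t392\tA\n');
fclose(fid);

fid = fopen(tmpFile, 'r');
% Skip comment lines; when the loop ends, txt holds the column-name row.
txt = fgetl(fid);
while ischar(txt) && strncmp(txt, '#', 1)
    txt = fgetl(fid);
end
fgetl(fid);   % discard the second header row ('5s 15s ...')

% textscan continues from the current file position, so only data rows remain.
data = textscan(fid, '%s %s %s %s %f %s %f %s', 'Delimiter', '\t');
fclose(fid);
delete(tmpFile);

dates = data{3};
conductances = data{7};
```

With this variant no (3:end) indexing is needed, because the header rows never reach textscan.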
Upvotes: 1