Reputation: 115
i have some problems with string quotes in a windows batch file. I try to parse a csv file (, seperated) and multiline entries. I have to write out only the strings after the fouth sperator as a text file. The result has to include the new lines and the quotes.
Input:
"1","50","1","Warning! Q1: Value too high [W-0001]"
"2","49","1","Warning! Q1: Value too low [W-0002]"
"3","48","1","Warning! Q1: Value changing too fast.
Check for endless loop[W-0003]"
Output:
"Warning! Q1: Value too high [W-0001]"
"Warning! Q1: Value too low [W-0002]"
"Warning! Q1: Value changing too fast.
Check for endless loop[W-0003]"
This work fine for the strings without new lines.
for /F "tokens=1-4 skip=4 delims=," %%a in (input.csv) do @echo %%d >> output.txt
I tried this but i got a syntax error
for /F "delims=, tokens=1-4" %%a in (input.csv) do call :loopbody %%a %%b %%c %%d
goto :eof
if [%4]==[] goto :test
echo %4
goto :eof
:test
echo %1
Error because of the quote at the beginning of the string
Syntaxfehler.
C:\>if ["Warning Value changing too fast.]==[] goto :test
How to handle variable values with quotes?
Upvotes: 1
Views: 1268
Reputation: 130819
Regular expressions are wonderful things :) You can use a hybrid JScript/batch utility called REPL.BAT that performs a regex search and replace on stdin and writes the result to stdout.
Assuming REPL.BAT is in your current directory, or better yet, somewhere whithin you path, then the following one liner works directly from the command line:
repl "^(?:[^,]*,){3}(\q[^\q]*\q).*$" "$1\n" mx <input.csv >output.txt
Please note, the above solution assumes every record contains at least 4 columns (3 commas). It also assumes the first 3 columns do not contain any commas in the value, the 4th column must be quoted and must not contain escaped quote ""
, and any column after the 4th does not contain a new line in the value. All hell may break loose if these conditions are not met.
Using a much more complicated regex, I have a solution that should extract the 4th column of any valid csv file. It properly handles both quoted and unquoted values. Quoted values may contain commas, escaped quotes, and/or new lines. The 4th column output will always be quoted, even if the source is not. The only limitation is every record must contain at least 4 columns.
repl "^(?:(?:[^,\q\n]*|\q(?:[^\q]|\q\q)*\q),){3}(?:\q((?:[^\q]|\q\q)*)\q|([^,\q\n]*))(?:,(?:[^,\q\n]*|\q(?:[^\q]|\q\q)*\q))*$" "\q$1$2\q" mx <input.csv >output.txt
Here is a nasty test case csv file:
1,"50,
""ignore""
x","1","Warning! Q1: Value too high [W-0001]",extra," hello,""world""
more!","part A
1,2,3","I don't want this",<last column of record 1>
"2","49","1","Warning! Q1: Value too low [W-0002]",<last column of record 2>
"3","48","1","Warning! Q1: Value changing too fast.
Check for endless loop[W-0003]","<last column of record 3>"
1,2,3,"I want this part 1
and ""this"" part 2",<last column of record 4>
a,b,c,I want this unquoted value,<last column of record 5>
And here is the output:
"Warning! Q1: Value too high [W-0001]"
"Warning! Q1: Value too low [W-0002]"
"Warning! Q1: Value changing too fast.
Check for endless loop[W-0003]"
"I want this part 1
and ""this"" part 2"
"I want this unquoted value"
Use of batch variables helps to document the logic of the regex, and makes it easy to quickly capture any arbitrary set of columns. Here is the same solution using variables:
@echo off
setlocal
:: define a regex that matches a non-captured column value
set "ignore=(?:[^,\q\n]*|\q(?:[^\q]|\q\q)*\q)"
:: define a regex that matches a captured column value
:: quoted value is in $1, unquoted value is in $2
set "capture=(?:\q((?:[^\q]|\q\q)*)\q|([^,\q\n]*))"
call repl "^(?:%ignore%,){3}%capture%(?:,%ignore%)*$" "\q$1$2\q" mx <input.csv >output.txt
type output.txt
And here is a demonstration of a final CALL that captures the 1st and 4th columns
call repl "^%capture%,%ignore%,%ignore%,%capture%(?:,%ignore%)*$" "\q$1$2\q,\q$3$4\q" mx <input.csv >output.csv
Upvotes: 1
Reputation: 41224
This works with your data:
@echo off
for /F "delims=, tokens=1-3,*" %%a in (input.csv) do (
set "aa="
set "aa=%%d"
if not defined aa (
>>file.out echo %%a
) else (
>>file.out echo %%d
)
)
Upvotes: 3