Redacted

Reputation: 633

disabling eol, setting delims to linefeed or newline to enable tokenization by line numbers

Foreword

  1. I'm using batch to do this.
  2. Despite the many 'related' questions on this topic, this one is different and not a duplicate, because I want to tokenize lines rather than just 'read line by line'.

What I'm doing

I want an easy way to specify which lines I want to parse, and I thought the best way to do that would be to use the tokens option of the FOR loop. The problem is that tokens by default are split up by the eol setting (defaults to newline) and by the delims setting (defaults to space). This is great for most use cases, but I want to tokenize each line. That would let me do a lot of what I want easily and cleanly.

How I've tried it

Anyway, I've figured out that you can disable the eol character by putting eol^= in the options of FOR. The problem is that I can't find the actual character I need to put into delims= to set the delimiter to the newline, linefeed, or anything else that only denotes a new line. I don't want to simply process line by line; I want to tokenize each line. This is important because questions like this:

New line as a delimeter of FOR loop

and this:

What delimiter to use in FOR loop for reading lines?

don't apply. Additionally, I found this, but the answers again didn't suit my needs:

https://www.dostips.com/forum/viewtopic.php?t=6471

The reason they don't apply is that those are asking about reading the file line by line. I'm asking about tokenizing each line. This is different because reading line by line can be accomplished by disabling delims and eol, or by setting delims to the first character it finds or to empty (doing "delims="). This ISN'T what I want, because I WANT to have delims enabled and I want IT to split each line INSTEAD of eol splitting each line.

Why would you want to do that?

Some backstory: I was going to use the skip option, but the help text for FOR says that skip only skips up to the line specified. It doesn't let you say "skip 3 lines, read one line, then skip some more lines", or skip by line numbers. I could use one FOR loop per line I want to extract, or do something complicated with counters and nested FOR loops, but it'd be much easier if I could simply tokenize each line.

So here is what I want

Effectively what I want is:

FOR /F "tokens=1,3 delims=<linefeed go here> eol^=" %%A IN ('command that prints out multiple lines') DO (echo %%A)

which would echo out the first line and third line of the command output like so:

<command output line 1>
<command output line 3>

(If y'all have a good example of a simple command that prints at least 3 lines, I'd be willing to edit this to show more directly what I mean, but I think y'all get the idea.)

So my question is basically this:

A: Is it possible to tokenize lines like this (i.e. specify the line numbers to read by token numbers)?

B: If so, what is the actual linefeed character I need to put in delims? Everywhere I've searched, people seem to say that it's not supposed to be done that way, but since they're asking a slightly different question, that doesn't apply here. Can I use the ASCII number for it? Is it possible to set delims to linefeed with eol disabled?

I've seen some people use:

set $lf=^
delims^=^%$lf%%$lf%^

on the DOStips forum, and I don't quite understand what's going on there. Are they setting linefeed to a different character? It also looks like they're trying to both use and disable delims at the same time, which makes no sense to me.

Extra: If I'm disabling eol wrong, or something else would interfere with my current approach, please tell me. If you would kindly point me to a manpage or similar, I'd gladly read up myself so I don't take up your time.

Why is this important?

Because it makes it much easier to read files and grab only the lines you want from command output, rather than having to play defense, tokenize lines and spaces you don't even want, and only grab the ones you do. Doing it this way lets me say directly "I want only these lines", and I don't even need a goto or anything weird to break out of the FOR loop once I've grabbed everything I need.

A perfect example: consider the lines of text below. Say I want to grab only the e and i from this 'file'.

  1. a b c
  2. d e f
  3. g h i
  4. j k l

To do this the regular way, I'd need to skip the first line, start tokenizing, grab the second token, grab the 6th token and break out using a goto. I don't want to have to count token by token, and I don't want to have to use a goto to break out of the loop when I'm done. I just want to say 'grab the 2nd and 3rd lines and treat them slightly differently'. No gotos, no counting tokens, no mess.

Update: I tried the DOStips suggestion

BTW, this is just trying to grab all lines of the portopening settings on my local machine (I'm testing a colleague's batch script):

echo Portopenings check
set $lf=^
FOR /F "tokens=* delims^=^%$lf%%$lf%^ eol^=" %%A IN ('netsh firewall show 
portopening') DO (echo %%A)

For some reason it didn't throw any errors, but it didn't output anything either. I expected it to output some lines containing my portopening settings. Running the command in the FOR loop without the delims and eol options works fine, e.g.:

FOR /F "tokens=*" %%A IN ('netsh firewall show portopening') DO (echo %%A)

Update 2

Found this monster in "How can you echo a newline in batch files?":

set NLM=^


set NL=^^^%NLM%%NLM%^%NLM%%NLM%
echo There should be a newline%NL%inserted here.

which actually works as intended (be sure to preserve the spacing; for some reason, messing that up causes the above to not work and instead print There should be a newline^^^^inserted here). The only problem is that I can't seem to get it to work inside a FOR loop. I keep trying:

FOR /F "tokens=* delims=%NL% eol^=" %%A IN ('netsh firewall show 
portopening') DO (echo %%A)

with variations, but nothing seems to work at all. It just says eol^=" was not expected, and if I remove the "" it says the syntax is incorrect. I know I need the quotes, and I'm pretty sure the eol^= syntax is correct, so I don't think it's directly related to those things. I think something weird is happening with delims, causing misleading errors that don't reflect the actual problem.

Update 3: The rabbit hole

Note that you need the NL or NLM definitions from above to try to run these (they don't work, though). I've tried:

  for /F "tokens^=1,2 delims^= eol^=^^!NLM^^!" %%i in ('netsh firewall show 
  portopening') do (echo %%i)

  for /F ^"tokens^=1,2 delims^=!NLM! eol^=^" %%i in ('netsh firewall show 
  portopening') do (echo %%i)

  for /F "tokens=1,2 delims^=!NLM! eol=" %%i in ('netsh firewall show 
  portopening') do (echo %%i)

  for /F "tokens=1,2 delims^=!NLM!" %%i in ('netsh firewall show 
  portopening') do (echo %%i)

  for /F "tokens=1,2 delims= eol=" %%i in ('netsh firewall show 
  portopening') do (echo %%i)

  for /F "delims=!NLM! eol=" %%i in ('netsh firewall show 
  portopening') do (echo %%i)

  for /F "delims^=!NLM! eol=" %%i in ('netsh firewall show 
  portopening') do (echo %%i)

  for /F "delims^=!NLM! eol^=" %%i in ('netsh firewall show 
  portopening') do (echo %%i)

  for /F ^"delims^=!NLM! eol^=^" %%i in ('netsh firewall show 
  portopening') do (echo %%i)

  for /F ^"delims^=^!NLM^! eol^=^" %%i in ('netsh firewall show 
  portopening') do (echo %%i)

and a bunch of other ways. I've tried all of the above using %NLM% as well, and I've tried using !NL! as well as %NL% for all of these. I've tried omitting options, recombining options, reordering options, escaping, not escaping, and all the other FUN combinations. Most result in syntax errors, some print the whole output with tokens=*, and some print stuff that just makes no sense (weird untokenized column-based output with splits that make no sense), but it never seems to print only certain lines by token.

Additionally, the man page says the eol default is ; and that it is for determining which lines are comments rather than ending a line. All I want is to have the delimiter be a newline and have everything else NOT DO ANYTHING WEIRD. I just want to tokenize by each line of output, or have some other easy way to grab only specific lines. The skip option is practically useless unless I only want to grab one line (they REALLY should've expanded that functionality).

I just can't wrap my head around the output: TO ME, eol=<whatever> should JUST WORK. I've even tried setting it to Q and @ and - just to try and NOT HAVE IT SPLIT LINES, but for some reason the command line hates eol^= and says that's horrible syntax. Even stranger, if I use delims and eol but not tokens I can omit the "", but if I use tokens it will NEVER work without quotes. Even worse, I can't find a definitive source on how the heck to escape everything properly to accomplish what I need. All I know is that eol^= is """"SUPPOSED"""" to 'disable' eol. I have NO IDEA how that works, or if it works, but after trying the above I think 90% of the answers on this topic for other questions must just be completely wrong.

Even stranger, I can use !NL! and %NL% in echo statements and it works fine. Trying to use it for delims just doesn't work. Trying to use raw ^ characters or escaped ^ characters doesn't work either. I don't even know if the caret IS the linefeed/newline character; I just want that character to be delims so that EACH TOKEN IS A LINE. Maybe delims and tokens are unrelated, but I THOUGHT they were related. I thought tokens were defined by delims because delims is BY DEFAULT a space. Feel free to educate me; I'm going to grab lunch now before I explode.

Upvotes: 2

Views: 868

Answers (1)

Aacini

Reputation: 67236

Mmmm... A couple of points related to this question.

The important point first: there is no way to make a for /F command first read all the lines of a file, store them in some kind of "buffer", and then tokenize that buffer based on the LF character; the for /F command just does not work this way. It processes the input one line at a time, and delims and tokens only split the text within each line.
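
As a rough illustration of that per-line behavior (a sketch only, assuming a file named test.txt in the current directory that holds the four sample lines from the question: a b c, d e f, g h i and j k l), asking for token 2 gives the second word of every line, not the second line:

```batch
@echo off
rem Sketch only: assumes test.txt holds the four sample lines from the question
rem The loop body runs once per line and the tokens option splits inside that line
for /F "tokens=2" %%A in (test.txt) do echo %%A
```

With that file this should print b, e, h and k, one per line.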

Please carefully read this sentence that you wrote yourself: "Additionally, the man page says the eol default is ; and that it is for determining which lines are comments rather than ending a line". The eol option defines the character that causes a line to be ignored when it appears at the beginning of that line. Period.
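
A small sketch of what eol actually does, under the assumption that writing a scratch file to %TEMP% is acceptable (eol_demo.txt is a made-up name used only for this demonstration):

```batch
@echo off
rem Sketch only: eol_demo.txt is a hypothetical scratch file name
rem Write three lines and let the second one start with a semicolon
> "%TEMP%\eol_demo.txt" (
   echo first
   echo ;second
   echo third
)
rem The default eol character is the semicolon so the second line is skipped entirely
for /F "usebackq delims=" %%A in ("%TEMP%\eol_demo.txt") do echo [%%A]
del "%TEMP%\eol_demo.txt"
```

This should print [first] and [third] only. Nothing is split at the semicolon; the line that starts with it is simply ignored.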

Now an alternative:

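rem Delayed expansion is needed because the lines variable changes inside the loop
setlocal EnableDelayedExpansion
rem Wanted line numbers, in ascending order
set "lines=1 3"
FOR /F "tokens=1* delims=:" %%A IN ('command prints lines ^| findstr /N "^"') DO (
   FOR /F "tokens=1*" %%X in ("!lines!") do (
      IF "%%A" EQU "%%X" (
         echo %%B
         set "lines=%%Y"
      )
   )
)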
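
To see what the outer loop in this approach is working with, here is a minimal sketch (again assuming a test.txt with the four sample lines from the question). findstr with the /N switch prefixes each line with its number and a colon, so delims=: separates the line number from the rest of the line:

```batch
@echo off
rem Sketch only: assumes test.txt holds the four sample lines from the question
rem findstr with the N switch emits 1:a b c and so on, so the colon separates number and text
for /F "tokens=1* delims=:" %%A in ('findstr /N "^" test.txt') do echo line %%A is "%%B"
```

For the sample file the first line printed should be line 1 is "a b c". The inner loop then only has to compare that number with the next wanted one.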

Working code based on your example:

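@echo off
setlocal EnableDelayedExpansion

rem Line numbers to pick, in ascending order
set "lines=2 3"
set "selected="
rem The findstr call prefixes every line with its number and a colon
FOR /F "tokens=1* delims=:" %%A IN ('type test.txt ^| findstr /N "^"') DO (
   rem Compare the current line number with the next wanted one
   FOR /F "tokens=1*" %%X in ("!lines!") do (
      IF "%%A" EQU "%%X" (
         set "selected=!selected! %%B"
         set "lines=%%Y"
      )
   )
)

rem Tokenize the joined lines: e is token 2 and i is token 6
for /F "tokens=2,6" %%A in ("%selected%") do (
   echo Token 2: "%%A"
   echo Token 6: "%%B"
)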

test.txt:

a b c
d e f
g h i
j k l

Output:

Token 2: "e"
Token 6: "i"

Upvotes: 4
