Tika9o9

Reputation: 425

Extract Part of Sentence from Text File

I have an awkward text file (hosts.txt) that I need to extract a certain part of a line from:

18 Jul 2019 09:30 BST
62.172.169.12
United Kingdom 
H82640A745.XGPH82640
3.12.21.0
Remove
18 Jul 2019 09:29 BST
62.172.169.9
United Kingdom 
H82640A744.XGPH82640
3.12.21.0
Remove
18 Jul 2019 09:26 BST
62.172.169.18
United Kingdom 
H82640A740.XGPH82640
3.12.21.0
Remove

I just need the H********* number next to .XGPH82640 - so from the example I just need a list like:-

H82640A745
H82640A744
H82640A740

and so on...

I am trying to extract it using tokens and delims in batch, but I'm not getting anywhere. If I try to skip a fixed number of lines with Skip=X, it doesn't work, because the first H******* number has three lines above it, but every one after that has five.

I have read the SS64 page on tokens and delims, as I would really like to be able to figure this out myself, but I'm not getting it, especially with this text file.

At the minute I am trying to use ":" as the delimiter, but again the token numbers alter, so this is written as if it was just the first five lines:

For /F "Tokens=4 delims=:" %%A In (hosts.txt) Do echo %%A

Any help would be great - thanks!

Upvotes: 0

Views: 1028

Answers (2)

Compo

Reputation: 38654

This answer is based upon my comment and your subsequent suggestion that the lines may contain an unknown, single-period-separated alphanumeric string instead of a known one:

From a batch file:

@Echo Off
If Not Exist "hosts.txt" GoTo :EOF
For /F "Delims=" %%A In (
    '""%__AppDir__%findstr.exe" /X "^[A-Z0-9]*\.[A-Z0-9]*$" "hosts.txt""'
) Do Echo %%~nA
Pause

Directly in cmd:

For /F "Delims=" %A In ('""%__AppDir__%findstr.exe" /X "^[A-Z0-9]*\.[A-Z0-9]*$" "hosts.txt" 2>NUL"') Do @Echo %~nA
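
Both commands rely on the ~n modifier to drop the part after the period. As a minimal sketch of that one step, using a sample value from the question rather than reading hosts.txt, the following batch file treats the string as a file name and prints only its name part, without the last ".extension":

```batch
@echo off
rem %%~nA interprets the value as a file name and removes the final ".extension".
rem The file does not have to exist; the string itself is parsed.
for %%A in ("H82640A745.XGPH82640") do echo %%~nA
```

This prints H82640A745. The modifier only strips the text after the last period, which is why it fits lines that contain exactly one period.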

Upvotes: 1

Mofi

Reputation: 49127

You could use the following command line in your batch file:

for /F "tokens=1,2 delims=." %%I in (hosts.txt) do if "%%J" == "XGPH82640" echo %%I

FOR reads the file hosts.txt line by line, ignoring empty lines.

The option delims=. changes the string delimiter from the default, normal space and horizontal tab, to the dot character (.).

Of interest for this task are the lines that consist of two dot-delimited substrings of which the second one is XGPH82640. For that reason tokens=1,2 is used: the first dot-delimited substring is assigned to loop variable I and the second one to the next loop variable, which is J according to the ASCII table.
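
To see how the two tokens are filled for each kind of line, a small diagnostic sketch can help. It assumes hosts.txt is in the current directory and simply prints both loop variables in brackets for every non-empty line:

```batch
@echo off
rem Print the first and second dot-delimited substring of every non-empty line.
rem The brackets make empty tokens and trailing spaces visible.
for /F "tokens=1,2 delims=." %%I in (hosts.txt) do echo I=[%%I] J=[%%J]
```

With the sample data from the question, a date line or a Remove line leaves J empty, an IPv4 line such as 62.172.169.12 gives I=[62] J=[172], and only the host lines give J=[XGPH82640].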

If the first substring, after removal of all leading dots, started with a semicolon, FOR would also ignore the line, because eol=; is the default end-of-line character. But it can be assumed that no line with XGPH82640 starts with ; and therefore the default end-of-line character can be kept as is.
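
Should that assumption not hold for some other input file, the end-of-line character could be changed to one that never starts a line. The following is only a sketch under the assumption that no line in hosts.txt begins with a vertical bar:

```batch
@echo off
rem eol=| replaces the default eol=; so lines starting with a semicolon are processed too.
rem Assumption: no line in hosts.txt starts with the | character.
for /F "eol=| tokens=1,2 delims=." %%I in (hosts.txt) do if "%%J" == "XGPH82640" echo %%I
```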

The case-sensitive IF condition verifies that the second dot-delimited string is really XGPH82640 and not an empty string, as on the lines with the date/time or the country, or a decimal number, as on the lines with an IPv4 address.

If the IF condition is true, the first dot-delimited string is output to the console.

Upvotes: 2
