Reputation: 11

Naming csplit files with a specific string inside the file

So, I have a file called "test.log" with multiple entries like this:

2022-09-30T11:37:54 START_TEST_CASE Start 'tst_T01-TC02'            Test 'tst_T01-TC02' started (tst_T01-TC02)
2022-09-30T11:38:01 PASS shared/scripts/Project/LoginWindow.py:39: Comparison   'True' and 'True' are equal 
2022-09-30T11:38:16 END_TEST_CASE   End 'tst_T01-TC02'              End of test 'tst_T01-TC02'<br>
2022-09-30T11:37:54 START_TEST_CASE Start 'tst_T01-TC01'            Test 'tst_T01-TC01' started (tst_T01-TC01)
2022-09-30T11:38:01 PASS shared/scripts/Project/LoginWindow.py:39: Comparison   'True' and 'True' are equal 
2022-09-30T11:38:16 END_TEST_CASE   End 'tst_T01-TC01'              End of test 'tst_T01-TC01'<br>
2022-09-30T11:37:54 START_TEST_CASE Start 'tst_T02-TC01'            Test 'tst_T02-TC01' started (tst_T02-TC01)
2022-09-30T11:38:01 FAIL shared/scripts/Project/LoginWindow.py:39: Comparison   'True' and 'True' are equal 
2022-09-30T11:38:16 END_TEST_CASE   End 'tst_T02-TC01'              End of test 'tst_T02-TC01'

What I want is to create X files, each containing a single test case. I achieve this with the following command:

sed '/START_TEST_CASE/,/END_TEST_CASE/!d' "$LOG_FILE_NAME" | \
  csplit -z --suffix-format="%d.log" - '/END_TEST_CASE/1' '{*}'

The files created this way get names such as xx0.log or xx5.log.

What I want is to modify this script so that each created file is named after its test case (the name appears in the text, on the same line as START_TEST_CASE).

For example, the first file created, containing the first test case in the log, must be named tst_T01-TC02.log, the second tst_T01-TC01.log, the third tst_T02-TC01.log, etc.

How can I achieve this?

Upvotes: 1

Answers (3)

potong

Reputation: 58488

This might work for you (GNU sed):

sed '/START/,/END/!d' file | csplit -qz - '/END/1' '{*}' &&
sed -Esn '1F;1s/.*Start (\S+).*/\1/p' xx* |
sed 'N;s/\n/ /;s/^/mv /;s/.$/.log&/' |
sh

If the csplit command succeeds, build and execute a script that renames each csplit output file to the test case name found inside that file.

The solution is in three parts:

1. Parse each csplit file and emit two-line records: the first line of each record is the original csplit file name and the second line is the new file name.
2. Take that output and condense each two-line record into a single line. Prepend the mv command to each line and append .log to the new file name.
3. Pipe the result into a shell, which executes the mv commands.

N.B. The last step (the pipe into sh) can be omitted so the generated mv commands can be checked before they are executed.
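
As a rough illustration of that dry run, the sketch below stops before the final sh and only prints the generated commands. It assumes GNU sed and GNU csplit; the scratch directory, the shortened two-case sample log, and the variable name preview are made up for the example and are not part of the answer.

```shell
# Work in a scratch directory so the xx* pieces do not mix with other files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical two-case sample, shaped like the log in the question.
cat > test.log <<'EOF'
noise before the first test case
2022-09-30T11:37:54 START_TEST_CASE Start 'tst_T01-TC02'  Test 'tst_T01-TC02' started (tst_T01-TC02)
2022-09-30T11:38:01 PASS Comparison 'True' and 'True' are equal
2022-09-30T11:38:16 END_TEST_CASE   End 'tst_T01-TC02'    End of test 'tst_T01-TC02'
2022-09-30T11:37:54 START_TEST_CASE Start 'tst_T01-TC01'  Test 'tst_T01-TC01' started (tst_T01-TC01)
2022-09-30T11:38:16 END_TEST_CASE   End 'tst_T01-TC01'    End of test 'tst_T01-TC01'
EOF

# Same split as in the answer: one xxNN file per test case.
sed '/START/,/END/!d' test.log | csplit -qz - '/END/1' '{*}'

# Build the mv commands but do not run them; just capture and show them.
preview=$(sed -Esn '1F;1s/.*Start (\S+).*/\1/p' xx* |
          sed 'N;s/\n/ /;s/^/mv /;s/.$/.log&/')
printf '%s\n' "$preview"
```

Once the printed commands look right, running printf '%s\n' "$preview" | sh in the same directory performs the renames.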

Upvotes: 0

markp-fuso

Reputation: 35106

I'm not aware of an 'easy' way to do this with csplit but if awk is an option ...

Adding a few more lines to sample input:

$ cat test.log
ignore this line
2022-09-30T11:37:54 START_TEST_CASE Start 'tst_T01-TC02'            Test 'tst_T01-TC02' started (tst_T01-TC02)
2022-09-30T11:38:01 PASS shared/scripts/Project/LoginWindow.py:39: Comparison   'True' and 'True' are equal
2022-09-30T11:38:16 END_TEST_CASE   End 'tst_T01-TC02'              End of test 'tst_T01-TC02'<br>
ignore this line
2022-09-30T11:37:54 START_TEST_CASE Start 'tst_T01-TC01'            Test 'tst_T01-TC01' started (tst_T01-TC01)
2022-09-30T11:38:01 PASS shared/scripts/Project/LoginWindow.py:39: Comparison   'True' and 'True' are equal
2022-09-30T11:38:16 END_TEST_CASE   End 'tst_T01-TC01'              End of test 'tst_T01-TC01'<br>
ignore this line
2022-09-30T11:37:54 START_TEST_CASE Start 'tst_T02-TC01'            Test 'tst_T02-TC01' started (tst_T02-TC01)
2022-09-30T11:38:01 FAIL shared/scripts/Project/LoginWindow.py:39: Comparison   'True' and 'True' are equal
2022-09-30T11:38:16 END_TEST_CASE   End 'tst_T02-TC01'              End of test 'tst_T02-TC01'
ignore this line

One awk idea (replaces all of OP's current code - sed|csplit):

awk -v sq="'" '                                # define variable "sq" as a single quote
/START_TEST_CASE/ { close(outfile)             # close previous output file to keep awk from running out of file descriptors
                    split($0,a,sq)             # split line on single quote
                    outfile=a[2] ".log"        # define new output file name
                    printme=1                  # enable print flag
                  }
printme           { print $0 > outfile }       # if print flag enabled (==1) then print current line to "outfile"
/END_TEST_CASE/   { printme=0 }                # disable print flag
' test.log

This generates:

$ head tst*log
==> tst_T01-TC01.log <==
2022-09-30T11:37:54 START_TEST_CASE Start 'tst_T01-TC01'            Test 'tst_T01-TC01' started (tst_T01-TC01)
2022-09-30T11:38:01 PASS shared/scripts/Project/LoginWindow.py:39: Comparison   'True' and 'True' are equal
2022-09-30T11:38:16 END_TEST_CASE   End 'tst_T01-TC01'              End of test 'tst_T01-TC01'<br>

==> tst_T01-TC02.log <==
2022-09-30T11:37:54 START_TEST_CASE Start 'tst_T01-TC02'            Test 'tst_T01-TC02' started (tst_T01-TC02)
2022-09-30T11:38:01 PASS shared/scripts/Project/LoginWindow.py:39: Comparison   'True' and 'True' are equal
2022-09-30T11:38:16 END_TEST_CASE   End 'tst_T01-TC02'              End of test 'tst_T01-TC02'<br>

==> tst_T02-TC01.log <==
2022-09-30T11:37:54 START_TEST_CASE Start 'tst_T02-TC01'            Test 'tst_T02-TC01' started (tst_T02-TC01)
2022-09-30T11:38:01 FAIL shared/scripts/Project/LoginWindow.py:39: Comparison   'True' and 'True' are equal
2022-09-30T11:38:16 END_TEST_CASE   End 'tst_T02-TC01'              End of test 'tst_T02-TC01'

Upvotes: 0

Ionuț G. Stan

Reputation: 179179

I'd use AWK for this:

awk '
  /START_TEST_CASE/ {
    match($0, /tst_[^'"'"']+/)
    test_name = substr($0, RSTART, RLENGTH)
  }

  /START_TEST_CASE/ , /END_TEST_CASE/ {
    print $0 > (test_name ".log")
  }
' "$LOG_FILE_NAME"

The weird [^'"'"'] part is actually just [^'], but we have to escape it for use within a Bash single-quoted string.
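
To see how the quoting resolves, you can have the shell print the string it would actually hand to awk. A small sketch (the variable name pattern is only for illustration):

```shell
# Inside single quotes nothing is special, so a literal ' takes three pieces:
#   '    closes the current single-quoted string
#   "'"  is a single quote protected by double quotes
#   '    reopens the single-quoted string
pattern='tst_[^'"'"']+'

# Prints: tst_[^']+
printf '%s\n' "$pattern"
```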

Or, if you don't mind a standalone AWK script, you could use this:

split.awk

/START_TEST_CASE/ {
  match($0, /tst_[^']+/)
  test_name = substr($0, RSTART, RLENGTH)
}

/START_TEST_CASE/ , /END_TEST_CASE/ {
  print $0 > (test_name ".log")
}

And then:

awk -f split.awk "$LOG_FILE_NAME"
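
One possible refinement, not part of the answer above: every print > file keeps its output file open, so a log with very many test cases could hit the open-file limit in some awk implementations. The sketch below closes each file when its END_TEST_CASE line has been written. It assumes test names start with tst_ as in the sample; the scratch directory and the shortened sample log are hypothetical.

```shell
# Scratch directory and a shortened, made-up sample log.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
cat > test.log <<'EOF'
ignore this line
2022-09-30T11:37:54 START_TEST_CASE Start 'tst_T01-TC02'  Test 'tst_T01-TC02' started (tst_T01-TC02)
2022-09-30T11:38:01 PASS Comparison 'True' and 'True' are equal
2022-09-30T11:38:16 END_TEST_CASE   End 'tst_T01-TC02'    End of test 'tst_T01-TC02'
ignore this line
2022-09-30T11:37:54 START_TEST_CASE Start 'tst_T02-TC01'  Test 'tst_T02-TC01' started (tst_T02-TC01)
2022-09-30T11:38:16 END_TEST_CASE   End 'tst_T02-TC01'    End of test 'tst_T02-TC01'
EOF
LOG_FILE_NAME=test.log

awk '
  /START_TEST_CASE/ {
    # Extract the test name and derive the output file name from it.
    match($0, /tst_[^'"'"']+/)
    out = substr($0, RSTART, RLENGTH) ".log"
  }

  /START_TEST_CASE/ , /END_TEST_CASE/ {
    print > out
  }

  /END_TEST_CASE/ {
    close(out)   # the case is complete, so release its file
  }
' "$LOG_FILE_NAME"

ls tst_*.log
```

If the same test name can occur more than once in a log, be aware that reopening a closed file with > truncates it, so only the last occurrence would survive; >> would append instead.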

Upvotes: 0
