Will

Reputation: 2410

Split file into multiple when special char met

I have a main file like the following:

/* ------------- AAAAAAAA ------------- */
some
lines 
here
/* ------------- BBBBBBBB ------------- */
more
things
/* ------------- CCCCCCCC ------------- */
there
a 
few
more
lines

My final goal is to create a file that contains only the blocks that contain a specific string. For example, if that string is lines, the output file would look like this:

/* ------------- AAAAAAAA ------------- */
some
lines 
here
/* ------------- CCCCCCCC ------------- */
there
a 
few
more 
lines

To reach my objective, I first try to split my main file into subfiles, one per block.

Then I plan to check each file and, if it contains the searched string, append it back to my new main file.

To be honest, I don't know if it's the best approach; moreover, I have more than 1600 blocks for 30139 lines in my main file, so that's a lot to parse.

However, if I keep doing the job this way, I still have an issue with my code:

#!/bin/ksh
i=0
while IFS=\| read -r "line"; do
        if [ `echo $line | grep '/* ------' | wc -l` -eq 1 ]; then
                i=$((i+1))
        fi
        echo $line > "file-$i"
done < $1

As the blocks are separated by /* --------, if I do an echo $line, the output is the content of my root directory (/etc, /tmp, etc.) and not the $line itself.

I'm aware that this is a two-question post, but since the second problem can be bypassed by writing the script differently, the two are definitely linked.

EDIT :

The solution has to be in Korn shell, as I cannot install anything on this machine.

Upvotes: 4

Views: 769

Answers (4)

Walter A

Reputation: 20002

If you really want to use a while read construction, try to avoid additional files and calls to external commands.

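matched=0
all=
while IFS= read -r line; do
  # A header line starts a new block: print the previous block if it matched
  if [[ ${line} == "/* ----"* ]]; then
      if [ ${matched} -eq 1 ]; then
         # ${all} already ends with a newline, so do not add another one
         printf "%s" "${all}"
      fi
      all=
      matched=0
  fi
  # Append the current line to the buffered block
  all="${all}${line}
"
  # Remember whether the block contains the searched string
  if [[ ${line} == *lines* ]]; then
    matched=1
  fi
# The extra header line makes the loop flush the last block of mainfile
done < <(cat mainfile; echo "/* ---- The End --- */" )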
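
The same buffering idea can be sketched without [[ ]] or process substitution, in case the ksh on the machine is an older one. This is only a sketch under assumptions: filter_blocks and its variables are hypothetical names, the input is read from standard input, and headers are assumed to start with "/* ----" as in the question. Every expansion of $line is quoted. That matters for the second part of the question: in an unquoted echo $line the shell performs pathname expansion on the words of the line, and /* matches every entry of the root directory.

```shell
# filter_blocks STRING < FILE   (hypothetical helper, not from the question)
# Print every block (header line plus body) that contains STRING.
filter_blocks() {
    want=$1      # string to search for
    block=       # lines of the block collected so far
    matched=0    # becomes 1 once the current block contains $want
    nl='
'
    # "|| [ -n "$line" ]" also handles a last line without a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '/* ----'*)
                # A header starts a new block: flush the previous one
                [ "$matched" -eq 1 ] && printf '%s' "$block"
                block=
                matched=0
                ;;
        esac
        block=$block$line$nl
        case $line in
            *"$want"*) matched=1 ;;   # quoted, so $want is taken literally
        esac
    done
    # Flush the last block after the loop, so no sentinel line is needed
    [ "$matched" -eq 1 ] && printf '%s' "$block"
    return 0
}
```

It could be called as filter_blocks lines < mainfile > newfile. The variables are global here; in ksh they can be declared with typeset inside a function defined with the function keyword if that is a concern.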

Upvotes: 1

Rahul Verma

Reputation: 3089

Using awk (a regular expression as RS is an extension found in gawk and mawk, not guaranteed by POSIX):

awk -v RS="/[*]" '/lines/{printf "/*%s", $0}' file

Output:

/* ------------- AAAAAAAA ------------- */
some
lines
here
/* ------------- CCCCCCCC ------------- */
there
a
few
more
lines

Upvotes: 1

Shakiba Moshiri

Reputation: 23784

If you do not mind using Perl, there is a good one-liner that makes this easy.

The only thing you need is to add a line like this:

/* ------------- END ------------- */

at the very end of your file, so that it becomes this:

/* ------------- AAAAAAAA ------------- */
some
lines 
here
/* ------------- BBBBBBBB ------------- */
more
things
/* ------------- CCCCCCCC ------------- */
there
a 
few
more
lines
/* ------------- END ------------- */

Now with the help of this pattern:

\/\*.*?(?=\/\*)

you can match each part separately. For example this part:

/* ------------- AAAAAAAA ------------- */
some
lines 
here

Thus, if you store the results in an array, you end up with an array that contains 3 sections. Finally, you can search for lines in each section; if it is found, that section is printed.

One-liner:

perl -ne 'BEGIN{$/=undef;}push(@arr,$&) while/\/\*.*?(?=\/\*)/smg;END{for (@arr){print if /lines/g }}' file

and the output would be:

/* ------------- AAAAAAAA ------------- */
some
lines 
here
/* ------------- CCCCCCCC ------------- */
there
a 
few
more
lines

and if you search for more instead:

/* ------------- BBBBBBBB ------------- */
more
things
/* ------------- CCCCCCCC ------------- */
there
a 
few
more
lines

Based on @batMan's solution

Command-line solution (grep -P requires GNU grep, and the data must not contain ;):

tr '\n' ';' < file | grep -Po '\/\*.*?(?=\/\*)' | grep lines | tr ';' '\n'

Its output:

/* ------------- AAAAAAAA ------------- */
some
lines 
here

/* ------------- CCCCCCCC ------------- */
there
a 
few
more
lines

Upvotes: 1

James Brown

Reputation: 37404

Another one in awk:

$ awk '
function dump() {         # define a function to avoid duplicate code in END
    if(b~/lines/)         # if buffer has "lines" in it
        print b           # output and ...
    b="" }                # reset buffer
/^\/\*/ { dump() }        # at the start of a new block dump existing buffer
{ b=b (b==""?"":ORS) $0 } # gather buffer
END{ dump() }             # dump the last buffer also
' file
/* ------------- AAAAAAAA ------------- */
some
lines 
here
/* ------------- CCCCCCCC ------------- */
there
a 
few
more
lines
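
Since the searched string is only an example in the question, the same program can be wrapped so that the string is passed in. A minimal sketch, assuming a hypothetical shell function filter_blocks and a fixed-string search through index() instead of a regular expression; note that awk still interprets backslash escapes in a value given with -v:

```shell
# filter_blocks STRING FILE   (hypothetical wrapper name)
# Print the blocks of FILE that contain STRING as a fixed string.
filter_blocks() {
    awk -v pat="$1" '
    function dump() {             # print the buffer if it contains pat
        if (index(b, pat))
            print b
        b = ""                    # reset the buffer
    }
    /^\/\*/ { dump() }            # a header line starts a new block
    { b = b (b == "" ? "" : ORS) $0 }   # gather the current block
    END { dump() }                # do not forget the last block
    ' "$2"
}
```

Usage would be something like filter_blocks lines mainfile > newfile.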

Upvotes: 2
