Running a shell script in multithreaded mode

Question

I have a shell script for splitting XML files, but there are one million XML files in the customer environment and the script runs slowly. Can it run in multithreaded mode?

Thanks!

My shell script:

#!/bin/bash
File=/home/spark/PktLog
count=0
startLine=(`sed -n -e '/?xml version="1.0" encoding/=' $File`)
fileEnd=`sed -n '$=' $File`
endLine=(`echo ${startLine[*]} | awk -v a=$fileEnd '{for(i=2;i<=NF;i++) printf("%d ",$i-1);print a}'`)

let maxIndex=${#startLine[@]}-1

for n in `seq 0 $maxIndex`

do
    sed -n "${startLine[$n]},${endLine[$n]}p" $File >result_${n}.xml
done

echo "${startLine[@]}"

that other guy · Accepted Answer

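Your method is very slow because it reads the input file many times: two sed passes to find the line numbers, then one more full pass for every document it extracts. The total work therefore grows with the number of documents multiplied by the size of the file.

Instead of trying to make it faster with multithreading, you should rewrite the script to read the input file only once.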

Here is an example input file:

$ cat testfile

Here is an awk command that reads the file one time, and writes each document to a separate file:

$ awk 'BEGIN { file="/dev/null"; n=0; }
       /xml version="1.0" encoding/ {
          close(file); 
          file="file" ++n ".xml"; 
       }
       {print > file;}' testfile

Here is the result:

$ cat file1.xml


  


$ cat file2.xml

This is much faster:

$ grep -c 'xml version' PktLog
3000

$ time ./yourscript    
real    0m9.791s
user    0m6.849s
sys     0m2.660s

$ time ./thisscript
real    0m0.248s
user    0m0.130s
sys     0m0.107s
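
To try the single-pass approach without a real PktLog, here is a minimal sketch that builds a small made-up input and splits it. The names sample.log and part*.xml, the document contents, and the temporary directory are assumptions for illustration only; they are not from the original files. It also uses sprintf for the output name and skips any lines before the first XML declaration, which is a small variation on the command above rather than the same code.

```shell
#!/bin/sh
# Hypothetical demo: build a small concatenated log, then split it in one pass.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three made-up XML documents written back to back into one file.
for i in 1 2 3; do
    printf '<?xml version="1.0" encoding="UTF-8"?>\n<doc id="%s">\n  <value>%s</value>\n</doc>\n' "$i" "$i"
done > sample.log

# Single pass: start a new output file at every XML declaration.
awk '/xml version="1.0" encoding/ {
         if (out != "") close(out)            # close the previous file so descriptors do not pile up
         out = sprintf("part%d.xml", ++n)     # part1.xml, part2.xml, ...
     }
     out != "" { print > out }                # ignore anything before the first declaration
    ' sample.log

# Count the files that were produced.
ls part*.xml | wc -l
```

Closing each file before opening the next one matters at the scale in the question: with a million documents, leaving every output file open would run into the per-process limit on open files.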
