Matt

Reputation: 19

How do I split the input into chunks of six entries each using bash?

This is the script I run to output the raw data of data_tripwire.sh:

#!/bin/sh

LOG=/var/log/syslog-ng/svrs/sec2tes1

for count in 6 5 4 3 2 1 0
do
    MONTH=`date -d"$count month ago" +"%Y-%m"`

    CBS=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.41 |sort|uniq | wc -l`
    echo $CBS >> /home/secmgr/attmrms1/data_tripwire1.sh
done

for count in 6 5 4 3 2 1 0
do
    MONTH=`date -d"$count month ago" +"%Y-%m"`

    GFS=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.31 |sort|uniq | wc -l`
    echo $GFS >> /home/secmgr/attmrms1/data_tripwire1.sh
done

for count in 6 5 4 3 2 1 0
do
    MONTH=`date -d"$count month ago" +"%Y-%m"`

    HR1=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.10.1 |sort|uniq | wc -l `
    echo $HR1 >> /home/secmgr/attmrms1/data_tripwire1.sh
done


for count in 6 5 4 3 2 1 0
do
    MONTH=`date -d"$count month ago" +"%Y-%m"`

    HR2=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.21.12 |sort|uniq | wc -l`
    echo $HR2 >> /home/secmgr/attmrms1/data_tripwire1.sh
done

for count in 6 5 4 3 2 1 0
do
    MONTH=`date -d"$count month ago" +"%Y-%m"`

    PAYROLL=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.21.18 |sort|uniq | wc -l`
    echo $PAYROLL >> /home/secmgr/attmrms1/data_tripwire1.sh

done

for count in 6 5 4 3 2 1 0
do
    MONTH=`date -d"$count month ago" +"%Y-%m"`

    INCV=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.71 |sort|uniq | wc -l`
    echo $INCV >> /home/secmgr/attmrms1/data_tripwire1.sh
done

data_tripwire.sh

91
58
54
108
52
18
8
81
103
110
129
137
84
15
14
18
11
17
12
6
1
28
6
14
8
8
0
0
28
24
25
23
21
13
9
4
18
17
18
30
13
3

I want to process the first 6 entries (91, 58, 54, 108, 52, 18) from the output above and then break out of the loop. After that, it should continue with the next 6 entries, break out of the loop again, and so on.

The problem now is that it reads all 42 numbers without breaking out of the loop.

This is the output of the table:

Tripwire

Month    CBS      GFS      HR       HR       Payroll   INCV
         cb2db1   gfs2db1  hr2web1  hrm2db1  hrm2db1a  incv2svr1
2013-07  85       76       12       28       26        4
2013-08  58       103      18       6        24        18
2013-09  54       110      11       14       25        17
2013-10  108      129      17       8        23        18
2013-11  52       137      12       8        21        30
2013-12  18       84       6        0        13        13
2014-01  8        16       1        0        9         3

The problem now is that it reads all 42 numbers, from 85 to 3, in one pass. I want a loop that runs from July to January for one server and then calculates the arithmetic mean and standard deviation, which is already implemented below. Once that is done, it should continue with the next cycle of 6 numbers for the next server and do the same as in the first cycle. I need help with the for loops, using break and continue, or with anything simpler.

This is my standard deviation calculation

count=0         # Number of data points; global.
SC=3            # Scale to be used by bc. three decimal places.
E_DATAFILE=90   # Data file error

## ----------------- Set data file ---------------------

if [ ! -z "$1" ]  # Specify filename as cmd-line arg?
then
  datafile="$1" #  ASCII text file,
else            #+ one (numerical) data point per line!
  datafile=/home/secmgr/attmrms1/data_tripwire1.sh
fi              #  See example data file, below.

if [ ! -e "$datafile" ]
then
  echo "\"$datafile\" does not exist!"
  exit $E_DATAFILE
fi

Calculate the mean

arith_mean ()
{
  local rt=0         # Running total.
  local am=0         # Arithmetic mean.
  local ct=0         # Number of data points.

  while read value   # Read one data point at a time.
  do
    rt=$(echo "scale=$SC; $rt + $value" | bc)
    (( ct++ ))
  done

  am=$(echo "scale=$SC; $rt / $ct" | bc)

  echo $am; return $ct   # This function "returns" TWO values!
  #  Caution: This little trick will not work if $ct > 255!
  #  To handle a larger number of data points,
  #+ simply comment out the "return $ct" above.
} <"$datafile"   # Feed in data file.

sd ()
{
  mean1=$1  # Arithmetic mean (passed to function).
  n=$2      # How many data points.
  sum2=0    # Sum of squared differences ("variance").
  avg2=0    # Average of $sum2.

  sdev=0    # Standard Deviation.

  while read value   # Read one line at a time.
  do
    diff=$(echo "scale=$SC; $mean1 - $value" | bc)
    # Difference between arith. mean and data point.
    dif2=$(echo "scale=$SC; $diff * $diff" | bc) # Squared.
    sum2=$(echo "scale=$SC; $sum2 + $dif2" | bc) # Sum of squares.
  done

    avg2=$(echo "scale=$SC; $sum2 / $n" | bc)  # Avg. of sum of squares.
    sdev=$(echo "scale=$SC; sqrt($avg2)" | bc) # Square root =
    echo $sdev                                 # Standard Deviation.

} <"$datafile"   # Rewinds data file.

Showing the output

mean=$(arith_mean); count=$?   # Two returns from function!
std_dev=$(sd $mean $count)

echo
echo "<tr><th>Servers</th><th>Number of data points in \"$datafile\"</th> <th>Arithmetic mean (average)</th><th>Standard Deviation</th></tr>" >> $HTML
echo "<tr><td>cb2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>gfs2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hr2web1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hrm2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hrm2db1a<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>incv2svr1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML

echo

I want to split the input into chunks of six entries each and compute the arithmetic mean and the standard deviation of entries 1..6, then of entries 7..12, then of 13..18, and so on.

This is the table output I want:

Tripwire

Month    CBS      GFS      HR       HR       Payroll   INCV
         cb2db1   gfs2db1  hr2web1  hrm2db1  hrm2db1a  incv2svr1
2013-07  85       76       12       28       26        4
2013-08  58       103      18       6        24        18
2013-09  54       110      11       14       25        17
2013-10  108      129      17       8        23        18
2013-11  52       137      12       8        21        30
2013-12  18       84       6        0        13        13
2014-01  8        16       1        0        9         3
* Standard deviation (7 mths)
         31.172   35.559   5.248    8.935    5.799     8.580
* Mean (7 mths)
         54.428   94.285   11.142   9.142    20.285    14.714

Upvotes: 0

Views: 571

Answers (3)

alvits

Reputation: 6758

With the changes below, each function reads only 6 items from the data file at a time: the block selected by $i.

arith_mean ()
{
  local rt=0         # Running total.
  local am=0         # Arithmetic mean.
  local ct=0         # Number of data points.

  while read value   # Read one data point at a time.
  do
    rt=$(echo "scale=$SC; $rt + $value" | bc)
    (( ct++ ))
  done

  am=$(echo "scale=$SC; $rt / $ct" | bc)

  echo $am; return $ct   # This function "returns" TWO values!
  #  Caution: This little trick will not work if $ct > 255!
  #  To handle a larger number of data points,
  #+ simply comment out the "return $ct" above.
} < <(awk -v block="$i" 'NR > (6 * (block - 1)) && NR < (6 * block + 1) {print}' "$datafile")   # Feed in one block of the data file.

sd ()
{
  mean1=$1  # Arithmetic mean (passed to function).
  n=$2      # How many data points.
  sum2=0    # Sum of squared differences ("variance").
  avg2=0    # Average of $sum2.

  sdev=0    # Standard Deviation.

  while read value   # Read one line at a time.
  do
    diff=$(echo "scale=$SC; $mean1 - $value" | bc)
    # Difference between arith. mean and data point.
    dif2=$(echo "scale=$SC; $diff * $diff" | bc) # Squared.
    sum2=$(echo "scale=$SC; $sum2 + $dif2" | bc) # Sum of squares.
  done

    avg2=$(echo "scale=$SC; $sum2 / $n" | bc)  # Avg. of sum of squares.
    sdev=$(echo "scale=$SC; sqrt($avg2)" | bc) # Square root =
    echo $sdev                                 # Standard Deviation.

} < <(awk -v block="$i" 'NR > (6 * (block - 1)) && NR < (6 * block + 1) {print}' "$datafile")   # Rereads the same block of the data file.

In the main part of the script, loop over the blocks to read:

for ((i = 1; i <= $(( $(wc -l < "$datafile") / 6 )); i++))
do
    mean=$(arith_mean); count=$?   # Two returns from function!
    std_dev=$(sd $mean $count)
done

Of course, it is better to move the wc -l outside of the loop for faster execution, but you get the idea.

The syntax error occurred because of a space between < and (. In a process substitution there must be no space between them. Sorry for the typo.
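A minimal sketch of that idea, counting the lines once before the loop. The helper name print_block and the 12-value sample file are made up for illustration; the awk filter follows the same block logic as the functions above. Reading the file on standard input (wc -l < file) makes wc print only the number, so no sed cleanup is needed.

```shell
#!/bin/bash
# Hypothetical sample data: the numbers 1..12, one per line
# (not the real log counts from the question).
datafile=$(mktemp)
trap 'rm -f "$datafile"' EXIT
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$datafile"

# Print only the lines that belong to block number $1 (6 lines per block).
print_block () {
  awk -v block="$1" 'NR > 6 * (block - 1) && NR <= 6 * block' "$datafile"
}

# Count the lines once, outside the loop.
blocks=$(( $(wc -l < "$datafile") / 6 ))

i=1
while [ "$i" -le "$blocks" ]
do
  # paste -s joins the 6 lines of the block into a single line.
  echo "block $i: $(print_block "$i" | paste -s -d ' ' -)"
  i=$(( i + 1 ))
done
```

With the sample file this walks through two blocks, 1..6 and 7..12. Any trailing lines that do not fill a whole block are ignored, because the division is integer division.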

cat <(awk -F: '{print $1}' /etc/passwd) works.

cat < (awk -F: '{print $1}' /etc/passwd) fails with: syntax error near unexpected token `('
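To illustrate the difference, here is a small sketch (bash only; the function name sum_stdin and the sample numbers are made up). Note that when the output should become the standard input of a function or loop, the process substitution is itself the target of an ordinary < redirection, so it is written < <(...): a space after the first <, none inside <(.

```shell
#!/bin/bash
# Process substitution is a bash feature; plain sh does not support it.

# Sum the integers arriving on standard input, one per line.
sum_stdin () {
  local total=0 value
  while read -r value
  do
    total=$(( total + value ))
  done
  echo "$total"
}

# Used as an argument: <(...) expands to a file name that cat can open.
cat <(printf '%s\n' 1 2 3)

# Used as standard input: a normal "<" redirection whose target is <(...).
sum_stdin < <(printf '%s\n' 1 2 3)
```

The cat call prints the three numbers, and the sum_stdin call prints their sum.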

Upvotes: 0

glenn jackman

Reputation: 246799

paste - - - - - - < data_tripwire.sh | while read -a values; do
    # values is an array with 6 values
    # ${values[0]} .. ${values[5]}
    arith_mean "${values[@]}"
done

This means you have to rewrite your functions so they don't use read: change

while read value

to

for value in "$@"

@Matt, yes, change both functions to iterate over their arguments instead of reading from stdin. Then pass the data file (now called "data_tripwire1.sh"; that is a terrible file extension for data, use .txt or .dat) into paste to reformat the data so that the first 6 values form the first row. Read each line into the array values (using read -a values) and invoke the functions:

arith_mean () {
    local sum=$(IFS=+; echo "$*")
    echo "scale=$SC; ($sum)/$#" | bc
}
sd () {
    local mean=$1
    shift
    local sum2=0
    for i in "$@"; do
        sum2=$(echo "scale=$SC; $sum2 + ($mean-$i)^2" | bc)
    done
    echo "scale=$SC; sqrt($sum2/$#)"|bc
}

paste - - - - - - < data_tripwire1.sh | while read -a values; do
    mean=$(arith_mean "${values[@]}")
    sd=$(sd $mean "${values[@]}")
    echo "${values[@]} $mean $sd"
done | column -t
91  58  54   108  52   18   63.500  29.038
8   81  103  110  129  137  94.666  42.765
84  15  14   18   11   17   26.500  25.811
12  6   1    28   6    14   11.166  8.648
8   8   0    0    28   24   11.333  10.934
25  23  21   13   9    4    15.833  7.711
18  17  18   30   13   3    16.500  7.973

Note that you don't need to return a fancy value from the functions: you know how many points you pass in.
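If the grouping step is unclear, this small sketch shows what paste does on its own and why the count comes for free. The names group_by_six and describe and the 12 sample numbers are made up for illustration, and it uses a plain read with six variables instead of read -a so it also runs in plain sh.

```shell
#!/bin/sh
# Join every 6 input lines into one tab-separated row.
# Each "-" tells paste to take the next line from standard input.
group_by_six () {
  paste - - - - - -
}

# Report how many arguments were passed: $# is the number of data points.
describe () {
  echo "$# values: $*"
}

# Hypothetical sample input: the numbers 1..12, one per line.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 | group_by_six |
while read -r a b c d e f
do
  describe "$a" "$b" "$c" "$d" "$e" "$f"
done
```

This prints "6 values: 1 2 3 4 5 6" and then "6 values: 7 8 9 10 11 12", so a function that receives a row as arguments can divide by $# instead of relying on a return code.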

Upvotes: 2

Alfe

Reputation: 59426

Based on Glenn's answer, I propose this, which needs very few changes to the original:

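paste - - - - - - < data_tripwire.sh | while read -a values
  do
    # Remove the <"$datafile" redirection from both function definitions
    # first; otherwise they read the file instead of the piped values.
    mean=$(
      for value in "${values[@]}"
      do
        echo "$value"
      done | arith_mean
    ); count=$?
    std_dev=$(
      for value in "${values[@]}"
      do
        echo "$value"
      done | sd "$mean" "$count"
    )
    echo "$mean $std_dev"
  done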

You can type (or copy & paste) this code directly in an interactive shell. It should work out of the box. Of course, this is not feasible if you intend to use it often, so you can put the code into a text file, make it executable, and call it as a shell script. In that case, you should add #!/bin/bash as the first line of the file.

Credit to Glenn Jackman for the use of paste - - - - - -, which is the real solution, I'd say.

Upvotes: 0
