Joseph Ivan Hayhoe

Reputation: 113

While loop on each sed match

I am trying to parse email files stored on my local workstation. Each file contains a list of hardware orders. Some files contain several hardware lists, each in a block starting with Processor: and ending with ExtraIp:. My current script works when an email contains a single block. The problems arise when an email file contains multiple blocks.

Example issue email:

Processor: Intel Xeon E3-1270 V2 3.5GHZ, Quad Core
RAM: 16GB DDR3 SDRAM
HD1: 2 x SATA Hardware RAID 1 (7,200 rpm)
(+1TB 7200 RPM SATA hard drive)
SSD: No SSD Drive
HD2: SATA Backup Drive
(+1 TB SATA (7,200 rpm))
HD3: No Additional Storage Array
ExtraIp: Public IP Addresses

Processor: Intel Xeon E3-1220 V2 3.1GHZ, Quad Core
RAM: 8GB DDR3 SDRAM
HD1: 2 x SATA Hardware RAID 1 (7,200 rpm)
(+1TB 7200 RPM SATA hard drive)
SSD: No SSD Drive
HD2: No Backup Drive
HD3: No Additional Storage Array
ExtraIp: Public IP Addresses

My script:

#!/bin/bash
find ./email -print0 | while read -d $'\0' file
do
#### Sed and while loop here, with modification to the below lines to read data from the while loop instead of directly from each file ####
#### Example sed command: sed -n "/Processor:/,/ExtraIp:/p" $file ####

    order_date=$(echo $file | awk '{print $11}')
    grep "Processor:" "$file" | cut -d : -f2 | cut -d , -f1 | while read cpu_type
    do
            if [ "$cpu_type" != "" ]; then
                    echo $order_date
                    echo $cpu_type
                    ram_size=$(grep "RAM:" "$file" | cut -d : -f2)
                    if [ "$ram_size" != "" ]; then
                            echo $ram_size
                    fi
                    hd1_type=$(grep "HD1:" "$file" | cut -d : -f2)
                    if [ "$hd1_type" != "" ]; then
                            echo $hd1_type
                    fi
                    hd1_size=$(grep -A1 "HD1:" "$file" | tail -n1)
                    if [ "$hd1_size" != "" ]; then
                            echo $hd1_size
                    fi
                    ssd_type=$(grep "SSD:" "$file" | cut -d : -f2)
                    ssd_type1=$(grep "SSD:" "$file" | cut -d : -f2 | awk '{print $1}')
                    if [ "$ssd_type" != "" ]; then
                            echo $ssd_type
                    fi
                    if [[ "$ssd_type1" != "No"  &&  "$ssd_type1" != "" ]]; then
                            ssd_size=$(grep -A1 "SSD:" "$file" | tail -n1)
                            echo $ssd_size
                    else
                            ssd_size="No SSD"
                            echo $ssd_size
                    fi
                    hd2_type=$(grep "HD2:" "$file" | cut -d : -f2)
                    hd2_type1=$(grep "HD2:" "$file" | cut -d : -f2 | awk '{print $1}')
                    if [ "$hd2_type" != "" ]; then
                            echo $hd2_type
                    fi
                    if [[ "$hd2_type1" != "No"  &&  "$hd2_type1" != "" ]]; then
                            hd2_size=$(grep -A1 "HD2:" "$file" | tail -n1)
                            echo $hd2_size
                    else
                            hd2_size="No HD2"
                            echo $hd2_size
                    fi
                    hd3_type=$(grep "HD3:" "$file" | cut -d : -f2)
                    hd3_type1=$(grep "HD3:" "$file" | cut -d : -f2 | awk '{print $1}')
                    if [ "$hd3_type" != "" ]; then
                            echo $hd3_type
                    fi
                    if [[ "$hd3_type1" != "No"  &&  "$hd3_type1" != "" ]]; then
                            hd3_size=$(grep -A1 "HD3:" "$file" | tail -n1)
                            echo $hd3_size
                    else
                            hd3_size="No HD3"
                            echo $hd3_size
                    fi
            echo "$order_date,$cpu_type,$ram_size,$hd1_type,$hd1_size,$hd2_type,$hd2_size,$hd3_type,$hd3_size" >> order_list.csv
            fi
    done
done

Expected output:

If the email only contains one block of text I get the correct output:

2014-04-01,Intel Xeon E3-1270 V2 3.5GHZ, 16GB DDR3 SDRAM, 2 x SATA Hardware RAID 1 (7,200 rpm),(+1TB 7200 RPM SATA hard drive), SATA Backup Drive,(+1 TB SATA (7,200 rpm)), No Additional Storage Array,No HD3

If the email contains multiple blocks of text I get the following output:

2014-04-01,Intel Xeon E3-1270 V2 3.5GHZ, 16GB DDR3 SDRAM
8GB DDR3 SDRAM, 2 x SATA Hardware RAID 1 (7,200 rpm)
2 x SATA Hardware RAID 1 (7,200 rpm),    (+1TB 7200 RPM SATA hard drive), SATA Backup Drive
No Backup Drive,    HD3: No Additional Storage Array, No Additional Storage Array
No Additional Storage Array,    ExtraIp: Public IP Addresses
2014-04-01,Intel Xeon E3-1220 V2 3.1GHZ, 16GB DDR3 SDRAM
8GB DDR3 SDRAM, 2 x SATA Hardware RAID 1 (7,200 rpm)
2 x SATA Hardware RAID 1 (7,200 rpm),    (+1TB 7200 RPM SATA hard drive), SATA Backup Drive
No Backup Drive,    HD3: No Additional Storage Array, No Additional Storage Array
No Additional Storage Array,    ExtraIp: Public IP Addresses

In the second output, the data from both blocks is duplicated for each CSV value (memory and drives). My plan was to add another while loop fed by a sed command (placed where the comment is in my script above) and then modify each of the commands to read its data from that loop instead of from the file.

Example sed command to use:

sed -n "/Processor:/,/ExtraIp:/p" $file

Upvotes: 1

Views: 188

Answers (1)

tripleee

Reputation: 189487

Your parse script uses grep to extract each field from the whole file, so when $file contains the same field twice, grep returns both matches at once.
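To see this in isolation, here is a minimal sketch that applies the same grep and cut pipeline the question uses for ram_size to a file with two RAM: lines. The temporary file and the ram_lines variable are only for illustration and are not part of the original script.

```shell
#!/bin/bash
# Hypothetical sample: only the RAM lines from two order blocks.
sample=$(mktemp)
printf '%s\n' 'RAM: 16GB DDR3 SDRAM' 'RAM: 8GB DDR3 SDRAM' > "$sample"

# Same extraction as in the question: grep sees the whole file, not one block.
ram_size=$(grep "RAM:" "$sample" | cut -d : -f2)

# Count the lines held in the variable; both blocks ended up in one value.
ram_lines=$(printf '%s\n' "$ram_size" | wc -l | tr -d ' ')
echo "lines captured in ram_size: $ram_lines"

rm -f "$sample"
```

Because ram_size now holds two lines, interpolating it into the CSV line spreads one record over several output lines, which matches the broken output shown in the question.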

You would be better off refactoring to do all the parsing in Awk. I am not going to complete it for you, but this should be a good start.

awk 'BEGIN { n = split("Processor:RAM:HD1:SSD:HD2:HD3", f, /:/) }  # n = number of output fields
    /^Processor:/ { delete a }  # forget any previous record
    /^(Processor|RAM|HD[123]|SSD):/ { i=$1; sub(/:/,"",i);
        $1=""; sub(/^ /,""); a[i]=$0 }  # store the value under its field name
    i ~ /^(HD[123]|SSD)$/ && $1 == "No" { a[i] = "No " i; i=""; next }
    i ~ /^(HD[123]|SSD)$/ && !k { k=i; next }  # remember key for two-line entry
    k { a[k] = a[k] "," $0; k=i="" }  # append the continuation line
    /^ExtraIp: / { s=""; for (j=1; j<=n; j++) {  # end of block: print one CSV record
        printf("%s%s", s, a[f[j]]); s="," } printf "\n" }' "$file"
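One of the pieces left to complete is the order date. A possible way to add it, sketched here on a reduced two-field version of the parser, is to pass the shell variable in with awk -v and print one record whenever the closing ExtraIp: line is seen. The date value is a placeholder and the sample file is made up for the demonstration; in the real script, order_date would still come from the file name as in the question.

```shell
#!/bin/bash
# Hypothetical sample with two blocks, trimmed to the lines this sketch reads.
sample=$(mktemp)
printf '%s\n' \
    'Processor: Intel Xeon E3-1270 V2 3.5GHZ, Quad Core' \
    'RAM: 16GB DDR3 SDRAM' \
    'ExtraIp: Public IP Addresses' \
    '' \
    'Processor: Intel Xeon E3-1220 V2 3.1GHZ, Quad Core' \
    'RAM: 8GB DDR3 SDRAM' \
    'ExtraIp: Public IP Addresses' > "$sample"

order_date="2014-04-01"   # placeholder; the question derives this from the file name

# One CSV record per block: collect values, emit them at ExtraIp:, then reset.
records=$(awk -v date="$order_date" '
    /^Processor:/ { cpu = $0; sub(/^Processor: */, "", cpu); sub(/,.*/, "", cpu) }
    /^RAM:/       { ram = $0; sub(/^RAM: */, "", ram) }
    /^ExtraIp:/   { print date "," cpu "," ram; cpu = ram = "" }
' "$sample")

printf '%s\n' "$records"
# 2014-04-01,Intel Xeon E3-1270 V2 3.5GHZ,16GB DDR3 SDRAM
# 2014-04-01,Intel Xeon E3-1220 V2 3.1GHZ,8GB DDR3 SDRAM

rm -f "$sample"
```

The second sub() drops everything after the first comma, which mirrors the cut -d , -f1 step in the question. The same -v approach should carry over to the fuller program above by printing date before the loop over f.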

Upvotes: 1
