keeer

Reputation: 863

Parsing CSV records when a value is multiline

Source file looks like this:

"google.com", "vuln_example1
vuln_example2
vuln_example3"
"facebook.com", "vuln_example2"
"reddit.com", "stupidly_long_vuln_name1"
"stackoverflow.com", ""

I've been trying to get the output to look something like the table below, but the line breaks cause me no end of problems. I'm using a "while read line" loop because I do some processing on the columns (e.g. the vulnerability count and URL in this example). The result is output into a Jenkins job (yuk).

The basic problem is getting the line breaks in the CSV to be output into the third column while retaining the table structure. A rough example of the desired output is below.

||hostname         ||Vulnerability count|| Vulnerability list    || URL                       ||
|google.com        |3                   |vuln_example1            |http://cve.com/vuln_example1|
|                  |                    |vuln_example2            |http://cve.com/vuln_example2|
|                  |                    |vuln_example3            |http://cve.com/vuln_example3|
|facebook.com      |1                   |vuln_example2            |http://cve.com/vuln_example2|
|reddit.com        |1                   |stupidly_long_vuln_name1 |http://cve.com/stupidly_long_vuln_name1|
|stackoverflow.com |0                   |                         ||

Looking at this... I've got a feeling it might be easier to show some code and example output.

Upvotes: 1

Views: 971

Answers (1)

Amessihel

Reputation: 6394

Preprocessing your input with the command below makes the problem easier (I'm assuming the input is well formed):

perl -0777 -pe 's/([^"])\s*\n/$1 /g ; s/[",]//g'  < sample.txt

This line invokes Perl, reading the whole file at once (-0777), to perform two regex substitutions:

  • s/([^"])\s*\n/$1 /g replaces an end of line with a space when the line does not end with a quote " (i.e. when a host entry, with all its vulnerabilities, isn't complete yet).
  • s/[",]//g removes all remaining quotes and commas.

For each host entry like this one:

"google.com", "vuln_example1
vuln_example2
vuln_example3"

You'll get:

google.com vuln_example1 vuln_example2 vuln_example3

Then you can assume that each line holds a host followed by its set of vulnerabilities.

The example below stores the fields of each line in an array and loops through it, formatting and printing each row:

# Replace this with your own function
# that returns the URL for a given vulnerability
function get_vuln_url () {
    # Placeholder: prints a dummy URL for a non-empty argument
    [[ -z "$1" ]] || echo "http://host/$1.htm"
}

# Format one table row (see the printf help)
function print_row () {
    printf "%-20s|%5s|%-30s|%s\n" "$@"
}

# Perl joins each multiline record into a single line:
# the host followed by its vulnerabilities
perl -0777 -pe 's/([^"])\s*\n/$1 /g ; s/[",]//g'  < sample.txt |
    while read -r line ; do
        # Split on whitespace without triggering glob expansion
        read -r -a arr <<< "$line"
        # First row: host, vulnerability count, first vulnerability and its URL
        print_row "${arr[0]}" "$((${#arr[@]} - 1))" "${arr[1]}" "$(get_vuln_url "${arr[1]}")"
        # Remaining rows: one per additional vulnerability, each with its own URL
        for v in "${arr[@]:2}" ; do
            print_row " " " " "$v" "$(get_vuln_url "$v")"
        done
    done

Output:

google.com          |    3|vuln_example1                 |http://host/vuln_example1.htm
                    |     |vuln_example2                 |http://host/vuln_example2.htm
                    |     |vuln_example3                 |http://host/vuln_example3.htm
facebook.com        |    1|vuln_example2                 |http://host/vuln_example2.htm
reddit.com          |    1|stupidly_long_vuln_name1      |http://host/stupidly_long_vuln_name1.htm
stackoverflow.com   |    0|                              |
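
The question asked for a wiki-style table with a ||header|| row rather than fixed-width columns. Here is a minimal sketch of that variant, assuming the input has already been joined into one "host vuln1 vuln2 ..." line per host. The function name print_wiki_table is hypothetical, the URL prefix is simply copied from the question's sample output, and the single space used for empty cells is an assumption about what the target renderer expects:

```shell
#!/usr/bin/env bash
# Sketch: reads "host vuln1 vuln2 ..." lines on stdin and prints a
# wiki-style table. print_wiki_table is a hypothetical name; the URL
# prefix comes from the question's sample, not from a real service.
print_wiki_table() {
    local url_prefix="http://cve.com/"
    local host rest v first
    local -a vulns
    printf '||%s||%s||%s||%s||\n' "hostname" "Vulnerability count" "Vulnerability list" "URL"
    while read -r host rest; do
        [[ -z "$host" ]] && continue          # skip blank lines
        read -r -a vulns <<< "$rest"          # split the vulnerabilities on whitespace
        if (( ${#vulns[@]} == 0 )); then
            # Host without vulnerabilities: assumed placeholder of one space per empty cell
            printf '|%s|%d|%s|%s|\n' "$host" 0 " " " "
            continue
        fi
        first=1
        for v in "${vulns[@]}"; do
            if (( first )); then
                # First row carries the host name and the count
                printf '|%s|%d|%s|%s|\n' "$host" "${#vulns[@]}" "$v" "${url_prefix}${v}"
                first=0
            else
                # Continuation rows leave the first two cells blank
                printf '|%s|%s|%s|%s|\n' " " " " "$v" "${url_prefix}${v}"
            fi
        done
    done
}
```

It can be fed from the same pipeline, e.g. by replacing the while loop above with "| print_wiki_table".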

Update. If you don't have Perl, and your file contains no tab characters, you can use this command as a workaround instead:

tr '\n' '\t' < sample.txt | sed -r -e 's/([^"])\s*\t/\1 /g' -e 's/[",]//g'  -e 's/\t/\n/g'

  • tr '\n' '\t' replaces every end of line with a tab.
  • The sed part works like the Perl line, except that it matches tabs instead of ends of line, then converts the remaining tabs back to ends of line.
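
The sed command above relies on GNU sed extensions such as \s and \t. If neither Perl nor GNU sed is available, the same joining step can be sketched in plain bash. This is a different technique from the regex approach: it counts quote characters and treats a record as complete once the count is even. The function name join_records is hypothetical, and the sketch assumes values never contain escaped quotes:

```shell
#!/usr/bin/env bash
# Sketch: join physical lines into one logical record per host by
# tracking whether a quoted value is still open (odd number of quotes).
# join_records is a hypothetical name; escaped quotes are not handled.
join_records() {
    local q='"'
    local line record="" quotes
    while IFS= read -r line || [[ -n "$line" ]]; do
        # Ignore blank lines that are not part of an open record
        [[ -z "$record" && -z "$line" ]] && continue
        if [[ -n "$record" ]]; then
            record+=" $line"                # continuation of a multiline value
        else
            record=$line
        fi
        quotes=${record//[^$q]/}            # keep only the quote characters
        if (( ${#quotes} % 2 == 0 )); then  # every quote is closed: record complete
            record=${record//[$q,]/}        # drop quotes and commas
            printf '%s\n' "$record"
            record=""
        fi
    done
}
```

Used as "join_records < sample.txt", it is meant as a drop-in for the Perl or tr/sed step in front of the while loop.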

Upvotes: 2
