Arthur
Arthur

Reputation: 31

grep-ing for the value of a csv file

I have tasked with pulling certain values out of a very ugly csv file.

The csv is in the following format:

command1=value1, command2=value2, etc etc.

No problem so far I was grep-ing for the command I required and then piping through cut -f 2 -d '=' to return just the value.

The issue I have is one of the fields is text and can have multiple values which are also separated by comma. To add another curve ball if (and only if) one of the values has a space in it the field will be enclosed in double quotes so the value I'm looking to pull could be:

command=value,..
command=value1,value2,..
command="value 1",..
command="value 1, value 2",..

(where .. is other values in the log file OR the end of the line)

I thought I had cracked it by simply pulling the data between two field names using grep -oP '(?<=command1=).*(?= command2)' and then piping that through rev | cut -c 2- | rev.

But I've now found out the order the fields appear aren't consistent so the file could be:

 command1=value1, command3=value3, command2=value2

How can I get the value of command2 when it may or may not be enclosed in double quotes, it may also have commas within it. I'm struggling to see how it may be possible as how will the grep know what is a value break and what is the next field.

Any help gratefully accepted.

Upvotes: 3

Views: 448

Answers (3)

Ed Morton
Ed Morton

Reputation: 203702

idk if it's what you're looking for or not, but given this input file:

$ cat file
command1=value1.1,command2=value2.1,value2.2,command3="value 3.1",command4="value 4.1, value 4.2"

this GNU awk (for the 4th arg to split()) script might be what you want:

$ cat tst.awk
{
    delete(c2v)
    split($0,f,/,?[^=,]+=/,s)
    for (i=1; i in s; i++) {
        gsub(/^,|=$/,"",s[i])
        print "populating command name to value array:", s[i], "->", f[i+1]
        c2v[s[i]] = f[i+1]
    }
    print c2v["command2"]
    print c2v["command4"]
}

$ awk -f tst.awk file
populating command to value: command1 -> value1.1
populating command to value: command2 -> value2.1,value2.2
populating command to value: command3 -> "value 3.1"
populating command to value: command4 -> "value 4.1, value 4.2"
value2.1,value2.2
"value 4.1, value 4.2"

Modify the print statements to suit, it should be obvious...

Upvotes: 0

tripleee
tripleee

Reputation: 189507

In the worst case (say, if , command2= could occur in the quoted value of another key, for example) the only recourse is probably to write a dedicated parser for this pesky format. (Killing the person who came up with it will unfortunately not solve any problems, and may result in new ones. I understand it could be tempting, but don't.)

For a quick and dirty hack, perhaps this is sufficient, though:

grep -oP '(^|, )command2=\K([^,"]+|"[^"]+")'

This will keep the double quotes if the field value is quoted, but that shoud be easy to fix if it's undesired. Moving to a better tool than grep could bring better precision as well, though; here's a sed variant with additional anchoring:

sed -n 's/^\(.*, \)*command2=\(\((^,"]*\)\|"\([^"]*\)"\)\(, .*\)*$/\4\5/p' 

Upvotes: 1

Tamas Rev
Tamas Rev

Reputation: 7166

I would combine grep and sed. Suppose you have this input in example.csv:

command1=value1, command2=value2,
command1=value1, command2="value2, value3"
command1=value1, command3=valu3

Then this command:

 grep 'command2=' example.csv |
  sed -e 's/.*command2=//g' -e 's/^\([^"][^,]*\),.*$/\1/g' -e 's/^"\([^"]*\)".*$/\1/g'

Will give you this:

value2
value2, value3

Explanation:

  • grep find the right lines
  • the first expression in sed (i.e. the firs -e) removes everything before the desired value
  • the second expression deals with the case without quotation mark
  • the third expression deals with the case with quotation mark

Please note that CSV is an extremely complicated format. This regex makes some assumptions, e.g. command2 appears only as key. If this csv is not good enough then I would use a real programming language that has a mature csv library.

Upvotes: 1

Related Questions