Extracting multiple substrings from a string in shell script

Question

I have a file that contains the output of another command of the form:

aaaaaaaa   (paramA 12.4)   param2: 14,   some text   25.55
bbbbbb    (paramA 5.1)   param2: 121,   some text2    312.1

I want to pick the values aaaaaaaa, 12.4, 14, 25.55 from the first row, and similarly bbbbbb, 5.1, 121, 312.1 from the second row, and so on, and dump them in a different format (maybe CSV).

I want to use a regular expression in some command (sed, awk, grep, etc.) and assign the matched patterns to, say, $1, $2, etc. so that I can dump them in the desired format.

What I am not clear on is which command to learn for this. While searching around, sed, awk, and grep all seem capable of doing it, but I could not find a ready-made answer. I plan to learn each of these commands, but which one should I start with to solve the problem at hand?

martin · Accepted Answer

For an input exactly like that, you can use

awk -F' +|[)]|,' -v OFS=', ' '{print $1, $3, $6, $10}' file

which produces

aaaaaaaa, 12.4, 14, 25.55
bbbbbb, 5.1, 121, 312.1
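To see why the field numbers are 1, 3, 6 and 10, it can help to print every field that awk produces for one row. The sketch below is only an illustration: `show_fields` is a hypothetical helper name, and the closing parenthesis is written as `[)]` so that it is taken literally by any awk. The separator is "one or more spaces, or a closing parenthesis, or a comma", so a `)` or `,` that is followed by spaces leaves an empty field between the two separators.

```shell
# Hypothetical helper: print each field awk sees, one per line, as N=[value].
show_fields() {
  awk -F' +|[)]|,' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

# First sample row from the question.
printf '%s\n' 'aaaaaaaa   (paramA 12.4)   param2: 14,   some text   25.55' | show_fields
```

For this row, fields 4 and 7 come out empty, and the wanted values sit in fields 1, 3, 6 and 10.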

However, that fails if you have more or fewer than two words in the last text field, or if you have more than one word in the others.

Otherwise, you would have to look for the numbers and distinguish them from text, or you need to better characterize your input (fixed width, tab separated, or based on some regex with sed).
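As a rough sketch of the regex-with-sed route, the line can be matched as a whole and the four values captured as groups, which sed exposes as \1 to \4 (the closest thing to the $1, $2 the question asks for). This assumes every line has the shape shown in the question: a first word, a number inside parentheses, a number between "name:" and a comma, and a final number at the end of the line. `extract_csv` is a hypothetical helper name, and `-E` (extended regex) is assumed to be available, as it is in GNU and BSD sed.

```shell
# Hypothetical helper: read lines on stdin, print "name, n1, n2, n3".
# Groups: \1 first word, \2 number inside (...), \3 number after "name:",
# \4 last number on the line. Lines that do not match are dropped (-n ... p).
extract_csv() {
  sed -n -E 's/^([^[:space:]]+)[[:space:]]+\([^[:space:]]+[[:space:]]+([0-9.]+)\)[[:space:]]+[^:]+:[[:space:]]*([0-9.]+),.*[[:space:]]([0-9.]+)[[:space:]]*$/\1, \2, \3, \4/p'
}

# The two sample rows from the question.
printf '%s\n' \
  'aaaaaaaa   (paramA 12.4)   param2: 14,   some text   25.55' \
  'bbbbbb    (paramA 5.1)   param2: 121,   some text2    312.1' | extract_csv
```

Because the free text between the comma and the final number is matched by `.*`, this version does not depend on how many words that text contains.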
