yeniv

Reputation: 1639

Extracting multiple substrings from a string in shell script

I have a file that contains the output of another command of the form:

aaaaaaaa   (paramA 12.4)   param2: 14,   some text   25.55
bbbbbb    (paramA 5.1)   param2: 121,   some text2    312.1

I want to pick the values aaaaaaaa, 12.4, 14, 25.55 from the first row, and similarly bbbbbb, 5.1, 121, 312.1 from the second row, and so on, then dump them in a different format (maybe CSV).

I want to use a regular expression in some command (sed, awk, grep, etc.) and assign the matched patterns to, say, $1, $2, etc., so that I can dump them in the desired format.

What I am not clear about is which command to learn for this. While searching around, I found that sed, awk, and grep all seem capable of doing it, but I could not find a ready-made answer. I plan to learn each of these commands, but which should I start with to solve the problem at hand?

Upvotes: 0

Views: 247

Answers (2)

chepner

Reputation: 531165

You can do this in bash:

# Regex assumes the exact layout shown in the question (note "param2:", not "params:").
regex='^([^ ]+) +\(paramA ([^)]+)\) +param2: ([^,]+), +.* +([^ ]+)$'
while IFS= read -r line; do
    [[ $line =~ $regex ]] || continue
    # Captured groups are:
    # ${BASH_REMATCH[1]} - aaaaaaaa
    # ${BASH_REMATCH[2]} - 12.4
    # ${BASH_REMATCH[3]} - 14
    # ${BASH_REMATCH[4]} - 25.55
done < file.txt

However, it will be relatively slow. Using another tool like awk will probably be more efficient. It all depends, however, on what you actually want to do with the extracted text.
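As a rough sketch of one thing you might do with the captured groups, the loop can write CSV directly with printf. The function name to_csv is made up for illustration, and the regex assumes input laid out exactly like the two sample lines (a literal "paramA" and "param2:", no trailing whitespace):

```shell
#!/usr/bin/env bash
# to_csv is a hypothetical helper name: it reads lines on stdin
# and writes one CSV row per matching line on stdout.
to_csv() {
    # Assumes the sample layout: name (paramA N) param2: N, free text N
    local regex='^([^ ]+) +\(paramA ([^)]+)\) +param2: ([^,]+), +.* +([^ ]+)$'
    local line
    while IFS= read -r line; do
        # Skip lines that do not match the expected layout.
        [[ $line =~ $regex ]] || continue
        printf '%s,%s,%s,%s\n' \
            "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
            "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
    done
}
```

You would call it as to_csv < file.txt > out.csv. Lines that do not fit the pattern are silently dropped, which may or may not be what you want.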

Upvotes: 0

martin

Reputation: 3239

For an input exactly like that, you can use

awk -F' +|[)]|,' -v OFS=', ' '{print $1, $3, $6, $10}' file

which produces

aaaaaaaa, 12.4, 14, 25.55
bbbbbb, 5.1, 121, 312.1

However, that fails if the last text field has more or fewer than two words, or if any of the other fields contains more than one word.

Otherwise, you would have to look for numbers and distinguish them from text, or you need to better characterize your input (fixed width, tab separated, or based on some regex with sed).
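For the regex-with-sed route, a minimal sketch might look like the following. It assumes the literal labels "paramA" and "param2:" from the sample and a sed that supports -E (GNU and BSD sed both do); extract_csv is just an illustrative name. Unlike the field-counting awk command, it does not care how many words the free-text field contains:

```shell
# extract_csv is a hypothetical helper name: reads lines on stdin,
# prints "name,paramA,param2,last" for each line matching the layout.
extract_csv() {
    # -n with the trailing p flag prints only lines where the substitution matched.
    sed -nE 's/^([^ ]+) +\(paramA ([^)]+)\) +param2: ([^,]+),.* ([^ ]+)$/\1,\2,\3,\4/p'
}
```

Usage would be extract_csv < file > out.csv. Non-matching lines are skipped rather than passed through.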

Upvotes: 2
