Reputation: 1639
I have a file that contains the output of another command of the form:
aaaaaaaa (paramA 12.4) param2: 14, some text 25.55
bbbbbb (paramA 5.1) param2: 121, some text2 312.1
I want to pick the values aaaaaaaa, 12.4, 14, 25.55 from the first row, and similarly bbbbbb, 5.1, 121, 312.1 from the second row and so on, and dump them in a different format (maybe CSV).
I want to use a regular expression in some command (sed, awk, grep, etc.) and assign the matched patterns to, say, $1, $2, etc. so that I can dump them in the desired format.
What I am not clear on is which command to learn for this. While searching around, sed, awk and grep all seem capable of doing it, but I could not find a ready-made answer. I plan to learn each of these commands, but which one should I start with to solve the problem at hand?
Upvotes: 0
Views: 247
Reputation: 531165
You can do this in bash:
# Not tested; regex may not be entirely correct.
regex='(.*) +\(paramA (.*)\) +param2: (.*), +.* +(.*)'
while IFS= read -r line; do
    # Skip lines that do not match the expected layout.
    [[ $line =~ $regex ]] || continue
    # Captured groups are:
    # ${BASH_REMATCH[1]} - aaaaaaaa
    # ${BASH_REMATCH[2]} - 12.4
    # ${BASH_REMATCH[3]} - 14
    # ${BASH_REMATCH[4]} - 25.55
done < file.txt
However, a shell loop like this will be relatively slow. Using another tool like awk will probably be more efficient. It all depends, however, on what you actually want to do with the extracted text.
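As a rough sketch of the awk route, assuming every line has exactly the layout shown in the question and that CSV is the desired output: the function name extract_csv is made up for this example, and the field positions ($3, $5) are assumptions based on the two sample lines only.

```sh
# extract_csv is a hypothetical helper name; it reads lines on stdin.
# Fields are split on whitespace (awk's default), so for the sample input
# $3 is "12.4)" and $5 is "14,"; the trailing punctuation is stripped.
extract_csv() {
    awk -v OFS=',' '
    NF >= 6 {
        a = $3; sub(/\)$/, "", a)   # "12.4)" -> "12.4"
        p = $5; sub(/,$/, "", p)    # "14,"   -> "14"
        print $1, a, p, $NF         # $NF is the last field on the line
    }'
}
```

It could be used as extract_csv < file.txt > out.csv. Taking the last value from $NF means the number of words in the free-text part does not matter, but a first column containing spaces would still break it.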
Upvotes: 0
Reputation: 3239
For an input exactly like that, you can use
awk -F' +|[)]|,' -v OFS=', ' '{print $1, $3, $6, $10}' file
which produces
aaaaaaaa, 12.4, 14, 25.55
bbbbbb, 5.1, 121, 312.1
However, that fails if you have more or fewer than two words in the last text field, or more than one word in any of the others.
Otherwise, you would have to look for the numbers and distinguish them from the text, or you need to characterize your input better (fixed width, tab separated, or matched by some regex with sed).
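For the regex-based option, one possible sketch uses sed -E with capture groups, which is close to the "$1, $2" idea in the question. The function name to_csv_sed is made up for this example, and the pattern assumes the literal labels paramA and param2: appear on every line exactly as in the sample.

```sh
# to_csv_sed is a hypothetical helper name; it reads lines on stdin.
# \1..\4 are the four captured values; -n plus the p flag prints only
# the lines that matched, so anything unexpected is dropped silently.
to_csv_sed() {
    sed -E -n 's/^([^ ]+) +\(paramA ([^)]+)\) +param2: ([^,]+),.* ([^ ]+)$/\1,\2,\3,\4/p'
}
```

It could be used as to_csv_sed < file.txt > out.csv. Because the last group is anchored to the end of the line, the number of words between the comma and the final value does not matter.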
Upvotes: 2