Reputation: 321
I need to extract a string contained in a column of my csv.
My file is like this:
col1;col2;col3;cleavage=10-11;
col1;col2;col3;cleavage=1-2;
col1;col2;col3;cleavage=100-101;
col1;col2;col3;none;
So the delimiter of my file is ";", but in column 4 I want to extract the string between "cleavage=" and the following "-". What I did was print the 2 characters after "cleavage=", but the value is not always 2 characters long.
I did it this way:
awk -F "\"*;\"*" '{if (match($4,"cleavage=")) print $1";"$2";"$3";"substr($4,RSTART+9,2); else print $1";"$2";"$3";0"}' file
I figured out that the following should be the correct command, but how should I integrate it in the previous one?
awk "/Pattern1/,/Pattern2/ { print }" inputFile
Thanks for help! :)
EDIT: My actual output is:
col1;col2;col3;10;
col1;col2;col3;1-;
col1;col2;col3;10;
col1;col2;col3;0;
But what I would like is:
col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;
Upvotes: 0
Views: 16560
Reputation: 11216
The exact format is unclear, but this works for your example and will also work if "=" and "-" appear in other fields.
This needs GNU awk (for the third argument to match):
awk '{match($0,/(.*);[^-0-9]*([0-9]*)[^;]*;$/,a);print a[1] ";" (a[2]+0) ";"}' file
col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;
Or with sed:
sed 's/;[^-0-9]*\([0-9]\{1,\}\)[^;]*;$/;\1;/;t;s/[^;]*;$/0;/' file
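If GNU awk is not available, the same idea can be sketched with the two-argument match and RSTART/RLENGTH, which also fits the substr approach from the question. This is only a sketch: it assumes the number follows "cleavage=" directly in column 4, and extract_cleavage is a hypothetical helper name, not something from the question.

```shell
# Hypothetical helper: reads semicolon-separated lines on stdin.
extract_cleavage() {
  awk -F';' -v OFS=';' '{
    # Match "cleavage=" followed by one or more digits inside field 4.
    if (match($4, /cleavage=[0-9]+/))
      $4 = substr($4, RSTART + 9, RLENGTH - 9)   # 9 = length("cleavage=")
    else
      $4 = 0                                     # no cleavage value on this line
    print
  }'
}

printf '%s\n' 'col1;col2;col3;cleavage=100-101;' 'col1;col2;col3;none;' | extract_cleavage
```

Because RLENGTH covers the whole match, the digits are taken at whatever length they have instead of a fixed 2 characters.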
Upvotes: 1
Reputation: 784898
You can use this awk command with multiple delimiters as the field separator:
awk -F '[;=-]' -v OFS=';' '{print $1, $2, $3, ($4 == "cleavage") ? $5 : 0, ""}' file
col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;
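To see why $4 and $5 line up this way, it can help to print every field that awk produces with this separator. A minimal sketch; show_fields is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: prints each field of every input line, one per row.
show_fields() {
  awk -F '[;=-]' '{
    for (i = 1; i <= NF; i++)
      printf "$%d=[%s]\n", i, $i
  }'
}

printf '%s\n' 'col1;col2;col3;cleavage=10-11;' | show_fields
```

For this line, $4 is "cleavage", $5 is "10" and $6 is "11". The trailing ";" leaves an empty last field.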
EDIT: If "-" or "=" can be present in fields before $4, you can use:
awk -F ';' -v OFS=';' '{split($4, a, /[=-]/);
print $1, $2, $3, (a[1] == "cleavage") ? a[2] : 0, ""}' file
col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;
Upvotes: 1
Reputation: 195029
I came up with this one-liner:
awk -F';' -v OFS=";" '{sub(/cleavage=/,"",$(NF-1));
sub(/-.*/,"",$(NF-1));$(NF-1)+=0}7' file
It gives:
col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;
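The one-liner is dense, so here is the same logic spelled out step by step as a sketch. strip_cleavage and the variable f are hypothetical names added for readability.

```shell
# Hypothetical helper: expanded form of the one-liner, reading from stdin.
strip_cleavage() {
  awk -F';' -v OFS=';' '{
    f = NF - 1                 # last real column; the trailing ";" adds an empty final field
    sub(/cleavage=/, "", $f)   # drop the prefix, if present
    sub(/-.*/, "", $f)         # drop everything from the first "-" onwards
    $f += 0                    # force a number: "none" becomes 0, "10" stays 10
    print                      # same effect as the always-true pattern 7
  }'
}

printf '%s\n' 'col1;col2;col3;cleavage=1-2;' 'col1;col2;col3;none;' | strip_cleavage
```

Assigning to the field makes awk rebuild the line with OFS, which is why OFS is set to ";" as well.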
Upvotes: 0