Stefano_g

Reputation: 321

AWK: Extract string between two different patterns

I need to extract a string contained in one column of my CSV file.

My file looks like this:

col1;col2;col3;cleavage=10-11;
col1;col2;col3;cleavage=1-2;
col1;col2;col3;cleavage=100-101;
col1;col2;col3;none;

The delimiter of my file is ";", but in column 4 I want to extract the string between "cleavage=" and the following "-". So far I print the 2 characters after "cleavage=", but the number is not always 2 characters long.

I did it this way:

awk -F "\"*;\"*" '{if (match($4,"cleavage=")) print $1";"$2";"$3";"substr($4,RSTART+9,2); else print $1";"$2";"$3";0"}' file

I found that the following should be the correct command, but how do I integrate it into the previous one?

awk "/Pattern1/,/Pattern2/ { print }" inputFile

Thanks for the help! :)

EDIT: My current output is:

col1;col2;col3;10;
col1;col2;col3;1-;
col1;col2;col3;10;
col1;col2;col3;0;

But what I would like is:

col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;

Upvotes: 0

Views: 16560

Answers (3)

123

Reputation: 11216

The exact format is unclear, but this works for your example and will still work if = or - appear in other fields.

GNU awk (for the third argument of match()):

awk '{match($0,/(.*);[^-0-9]*([0-9]*)[^;]*;$/,a);print a[1]";"+a[2]";"}' file

col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;

Or with sed:

sed 's/;[^-0-9]*\([0-9]\{1,\}\)[^;]*;$/;\1;/;t;s/[^;]*;$/0;/' file
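The third argument of match() used in the gawk command above is a GNU extension. As a sketch for other awks, assuming the value after cleavage= is always a run of digits, the question's own approach can be kept and the hard-coded length 2 replaced by RLENGTH, which POSIX match() sets together with RSTART. The function name extract_cleavage is made up for this example.

```shell
# extract_cleavage is a hypothetical helper name used only in this sketch.
# Assumes ";"-separated input where field 4 may contain "cleavage=<digits>-...".
extract_cleavage() {
    awk -F ';' -v OFS=';' '{
        # match() sets RSTART (start of the match) and RLENGTH (its length).
        # Skipping the 9 characters of "cleavage=" leaves only the digits.
        if (match($4, /cleavage=[0-9]+/))
            $4 = substr($4, RSTART + 9, RLENGTH - 9)
        else
            $4 = 0
        print    # assigning to $4 rebuilt the line using OFS
    }' "$@"
}

# Usage with the sample data from the question
printf '%s\n' \
    'col1;col2;col3;cleavage=10-11;' \
    'col1;col2;col3;cleavage=1-2;' \
    'col1;col2;col3;cleavage=100-101;' \
    'col1;col2;col3;none;' | extract_cleavage
```

This should print the four lines of the desired output. Unlike the -F "\"*;\"*" separator in the question, this sketch does not strip double quotes around fields.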

Upvotes: 1

anubhava

Reputation: 784898

You can use awk with a field separator that matches any of several delimiters:

awk -F '[;=-]' -v OFS=';' '{print $1, $2, $3, ($4 == "cleavage") ? $5 : 0, ""}' file
col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;
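A quick way to check which field numbers the [;=-] separator produces for one sample line (the input is just the first line of the question's data):

```shell
# Print every field produced by the bracket-expression separator [;=-],
# numbered, so the $4/$5 positions used above can be checked.
# The "-" is last inside the brackets, so it is a literal character.
echo 'col1;col2;col3;cleavage=10-11;' |
    awk -F '[;=-]' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
```

This prints seven numbered fields: 4 is cleavage, 5 is 10, 6 is 11, and 7 is empty because the line ends with ";". That is why $5 holds the value whenever $4 is "cleavage".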

EDIT: If - or = can appear in the fields before $4, you can use:

awk -F ';' -v OFS=';' '{split($4, a, /[=-]/);
           print $1, $2, $3, (a[1] == "cleavage") ? a[2] : 0, ""}' file
col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;

Upvotes: 1

Kent

Reputation: 195029

I came up with this one-liner:

 awk -F';' -v OFS=";" '{sub(/cleavage=/,"",$(NF-1));
                        sub(/-.*/,"",$(NF-1));$(NF-1)+=0}7' file

It gives:

col1;col2;col3;10;
col1;col2;col3;1;
col1;col2;col3;100;
col1;col2;col3;0;
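The same one-liner spread over several lines with comments, as a sketch of how it works. The awk logic is unchanged; only the wrapper function name strip_cleavage is hypothetical, added so the snippet can be reused.

```shell
# strip_cleavage is a hypothetical wrapper name; the awk program is the
# one-liner above, reformatted with comments.
strip_cleavage() {
    awk -F ';' -v OFS=';' '{
        # Each line ends with ";", so $NF is empty and the cleavage
        # column is the second-to-last field, $(NF-1).
        sub(/cleavage=/, "", $(NF-1))   # drop the "cleavage=" prefix
        sub(/-.*/, "", $(NF-1))         # drop "-" and everything after it
        $(NF-1) += 0                    # numeric context: "none" becomes 0
    }
    # 7 is a constant true pattern, so awk runs the default action (print $0)
    7' "$@"
}

printf '%s\n' \
    'col1;col2;col3;cleavage=10-11;' \
    'col1;col2;col3;none;' | strip_cleavage
```

The trailing 7 could be any non-zero constant; 1 is the more common idiom.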

Upvotes: 0
