Reputation: 467
I have a few hundred CSV files. These CSV files have different layouts, and I don't want to manually convert all of them to one format.
I want to get two different things from the files - A and B, and I can match both of them with regex. I want to match both of them at once - so only rows with both things will be printed. I know how to do that, and I've seen many SO posts answering how to do it.
But I don't know how to print just A B without the rest of the line. I don't know in which order or in which columns the two things will be, so I don't know how (or if I even can) use awk.
Example:
(match A[0-9], B[0-9])
A0 B0 C0
B1 C1 D1
E2 C2 A2
C3 F3 F3
B4 F4 A4
Result:
A0 B0
A4 B4
Upvotes: 0
Views: 98
Reputation: 133770
1st Solution: using the match function of awk. It prints the values in the order A then B, as in the OP's shown examples.
awk '
match($0,/A[0-9]+/){
  val=substr($0,RSTART,RLENGTH)
  if(val && match($0,/B[0-9]+/)){
    print val,substr($0,RSTART,RLENGTH)
  }
}' Input_file
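As a quick check, the first solution can be run against the sample from the question. This is only a sketch: the function name extract_ab and the here-document input are made up for illustration, and it assumes whitespace-separated data as in the example.

```shell
# Hypothetical wrapper around the 1st solution; reads stdin, or files given as arguments.
extract_ab() {
  awk '
  match($0, /A[0-9]+/) {
    a = substr($0, RSTART, RLENGTH)   # save A before the next match() overwrites RSTART/RLENGTH
    if (match($0, /B[0-9]+/))
      print a, substr($0, RSTART, RLENGTH)
  }' "$@"
}

# Sample input from the question
extract_ab <<'EOF'
A0 B0 C0
B1 C1 D1
E2 C2 A2
C3 F3 F3
B4 F4 A4
EOF
```

On that input it should print A0 B0 and A4 B4, which is the result asked for. Several files can be passed as arguments, e.g. extract_ab *.csv.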
2nd Solution: This solution does not care about the order of A and B; the values are printed in the same order in which they appear in the line.
awk '
{
  for(i=1;i<=NF;i++){
    if($i ~ /A[0-9]+/ || $i ~ /B[0-9]+/){
      val=(val?val OFS $i:$i)      # collect matching fields in line order
    }
  }
  if(val ~ /A[0-9]+/ && val ~ /B[0-9]+/){
    print val
  }
  val=""                           # reset for the next line
}' Input_file
3rd Solution: If you need the output in the order A then B, the following may help.
awk '
{
  for(i=1;i<=NF;i++){
    line=$i
    sub(/[0-9]+/,"",line)          # strip the digits to get the letter
    if($i ~ /A[0-9]+/ || $i ~ /B[0-9]+/){
      array[tolower(line)]=$i      # store under "a" or "b"
    }
  }
  if(array["a"] ~ /A[0-9]+/ && array["b"] ~ /B[0-9]+/){
    print array["a"],array["b"]
  }
  delete array                     # reset for the next line
}' Input_file
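The solutions above split on whitespace, as in the example from the question. If the real files are comma-separated, something along these lines might work. It is a sketch under assumptions: the name extract_ab_csv is made up, the separator is assumed to be a plain comma with no quoted fields, and each value is assumed to fill a whole field.

```shell
# Hypothetical variant of the 3rd solution for comma-separated input.
extract_ab_csv() {
  awk -F',' '
  {
    a = b = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^A[0-9]+$/) a = $i        # whole field must look like A<digits>
      else if ($i ~ /^B[0-9]+$/) b = $i   # whole field must look like B<digits>
    }
    if (a != "" && b != "") print a, b    # only lines that have both
  }' "$@"
}

printf 'B4,F4,A4\nE2,C2,A2\n' | extract_ab_csv    # prints: A4 B4
```

Anchoring the patterns with ^ and $ keeps a field such as XA1 from being picked up by accident.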
NOTE: Adding information from the man awk documentation about the functions and variables used, e.g. match, tolower, RSTART and RLENGTH:
match(s, r [, a]) Returns the position in s where the regular expression r occurs, or 0 if r is not present, and sets the values of RSTART and RLENGTH. Note that the argument order is the same as for the ~ operator: str ~ re. If array a is provided, a is cleared and then elements 1 through n are filled with the portions of s that match the corresponding parenthesized subexpression in r. The 0'th element of a contains the portion of s matched by the entire regular expression r. Subscripts a[n, "start"], and a[n, "length"] provide the starting index in the string and length respectively, of each matching substring.
RSTART The index of the first character matched by match(); 0 if no match. (This implies that character indices start at one.)
RLENGTH The length of the string matched by match(); -1 if no match.
tolower(str) Returns a copy of the string str, with all the upper-case characters in str translated to their corresponding lower-case counterparts. Non-alphabetic characters are left unchanged.
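To make the RSTART and RLENGTH descriptions concrete, here is a small sketch using only the two-argument form of match. The helper name show_match is invented for this illustration.

```shell
# Hypothetical helper: report where the first run of digits sits in a string.
show_match() {
  printf '%s\n' "$1" | awk '{
    if (match($0, /[0-9]+/))
      print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    else
      print RSTART, RLENGTH      # 0 and -1 when nothing matches
  }'
}

show_match "abc123def"   # prints: 4 3 123
show_match "abcdef"      # prints: 0 -1
```

Each call to match overwrites RSTART and RLENGTH, which is why the 1st solution copies the A value into a variable before searching for B.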
Upvotes: 3
Reputation: 50815
But I don't know how to print just A B without rest of the line.
Well, you need to remove everything but A and B from matching lines and force awk to recompute fields ($1=$1 does that).
awk '/A[0-9]/ && /B[0-9]/ { gsub(/[^AB][0-9]/,""); $1=$1; print }' file
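The effect of $1=$1 can be seen in isolation. In this sketch the helper name squeeze is hypothetical; it only shows the field-rebuild step, not the whole command.

```shell
# Hypothetical helper: assigning to a field makes awk rebuild $0,
# joining the fields with OFS (a single space by default).
squeeze() {
  awk '{ $1 = $1; print }' "$@"
}

printf '  B4   A4  \n' | squeeze    # prints: B4 A4
```

Because the fields are only squeezed and not reordered, a line such as B4 F4 A4 comes out as B4 A4 with the command above. The pattern [^AB][0-9] also assumes single-digit values, as in the example from the question.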
Upvotes: 1