Reputation: 23
I would like to find out which columns in a file contain special characters.
For example, I have the data below:
11|abc|ac♠|12
12|aac|be•|2♣
13|cj♦|jkd|32
Desired output:
1|3
2|3|4
3|2
I want the record number along with the column numbers that have special characters.
Upvotes: 2
Views: 2080
Reputation: 113834
You didn't define "special character". I will assume that you mean anything other than a tab or a printable ASCII character (space through ~). Try:
$ awk -F'|' '{r=""; for (i=1;i<=NF;i++)if($i ~ /[^\t -~]/) r=r OFS i; if (r) print NR r} ' OFS='|' File
1|3
2|3|4
3|2
-F'|'
This tells awk to use | as the field separator for input.
r=""
This initializes r to an empty string.
for (i=1;i<=NF;i++)if($i ~ /[^\t -~]/) r=r OFS i
This goes through each field on a line and, if it contains a character outside the normal ASCII range, it adds the field number to r.
In an awk regex, \t is a tab character and " -~" (a space, a hyphen, and a tilde) matches any character from space (ASCII 32) to ~ (ASCII 126). These are what we have defined as "normal" characters. As the first character inside a bracket expression, ^ means "not". So, [^\t -~] matches any character that is not in our list of normal characters.
You are free to add or remove characters from my normal list as you please.
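To see what the bracket expression accepts and rejects, here is a minimal sketch that tests single strings against it. The helper name has_special is hypothetical, and LC_ALL=C is my own addition (not part of the command above) so that awk compares bytes regardless of the current locale.

```shell
# has_special is a hypothetical helper used only for this illustration.
# It prints "special" if the argument contains any character other than
# a tab or a printable ASCII character (space through ~), else "normal".
has_special() {
    # LC_ALL=C is an assumption added here: it makes awk match byte by byte,
    # so the range " -~" covers exactly ASCII 32 through 126.
    printf '%s\n' "$1" |
        LC_ALL=C awk '{ print (($0 ~ /[^\t -~]/) ? "special" : "normal") }'
}

has_special 'abc'     # only printable ASCII
has_special 'ac♠'     # contains a non-ASCII character
has_special 'a b~c'   # space and ~ are both inside the allowed range
```

To widen or narrow the set of "normal" characters, edit the list inside the brackets and rerun the checks.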
if (r) print NR r}
If, after going through all the fields, r is nonempty, then print out the record number and the value of r.
OFS='|'
This tells awk to use | as the field separator for output. Here it is the separator placed before each field number that gets appended to r.
For those who prefer their commands spread over multiple lines:
awk -F'|' '
{
    r=""
    for (i=1;i<=NF;i++)
        if ($i ~ /[^\t -~]/)
            r=r OFS i
    if (r)
        print NR r
} ' OFS='|' File
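For anyone who wants to try this without creating a file first, the following sketch feeds the sample data from the question through the same awk program. The wrapper name find_special_columns is hypothetical. Two details differ from the command above and are my own assumptions: LC_ALL=C forces byte-wise matching, and OFS is set with -v so it is in place before any input is read from standard input.

```shell
# find_special_columns is a hypothetical wrapper around the awk program above.
# It reads from the files given as arguments, or from standard input if none.
find_special_columns() {
    # LC_ALL=C and -v OFS are additions for this sketch, not the original command.
    LC_ALL=C awk -F'|' -v OFS='|' '
    {
        r = ""                          # field numbers found on this line
        for (i = 1; i <= NF; i++)
            if ($i ~ /[^\t -~]/)        # field has a character outside the list
                r = r OFS i
        if (r)                          # report only lines with at least one hit
            print NR r
    }' "$@"
}

# Sample data taken from the question.
find_special_columns <<'EOF'
11|abc|ac♠|12
12|aac|be•|2♣
13|cj♦|jkd|32
EOF
```

On that sample this should print the three lines of the desired output: 1|3, 2|3|4 and 3|2.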
Upvotes: 2