user702432

Reputation: 12178

Search file for characters excluding a set of characters

I have a text file with 1.3 million rows and 258 columns delimited by semicolons (;). How can I find which characters occur in the file, excluding letters of the alphabet (both upper and lower case), the semicolon (;), the single quote (') and the double quote (")? Ideally the result should be a list without duplicates.

Upvotes: 0

Views: 85

Answers (2)

Satish

Reputation: 721

You can use grep -o with a negated bracket expression to print each character outside the excluded set, then pipe the output to sort and then to uniq.
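A minimal sketch of the grep, sort, uniq idea. It uses -o with a negated bracket expression, because -v filters whole lines rather than individual characters. The -o option is assumed to be available (GNU and BSD grep have it, but it is not in POSIX), and the function name other_chars is hypothetical, used only for illustration.

```shell
# Hypothetical helper: list each distinct character on stdin that is not
# a letter, a semicolon, a single quote or a double quote.
other_chars() {
    # -o prints every match on its own line; the bracket expression
    # matches any single character outside the excluded set
    grep -o "[^A-Za-z;'\"]" |
    # sort -u sorts and removes duplicates in one step (same as sort | uniq);
    # LC_ALL=C gives a byte-wise, locale-independent order
    LC_ALL=C sort -u
}

# Sample input standing in for the real file
printf '%s\n' '2343abc34;ABC;;@$%"' | other_chars
```

For the real data, redirect the file into the function instead, for example other_chars <file.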

Upvotes: 0

Diomidis Spinellis

Reputation: 19345

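Use the following pipeline, which deletes the excluded characters, puts each remaining character on its own line, and then removes duplicates: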

# Remove the characters you want to exclude
tr -d 'A-Za-z;"'\' <file |
# One character on each line
sed 's/\(.\)/\1\
/g' | 
# Remove duplicates
sort -u

Example:

echo '2343abc34;ABC;;@$%"' | 
tr -d 'A-Za-z;"'\' |
sed 's/\(.\)/\1\
/g' | 
sort -u

$
%
2
3
4
@
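The sed step also emits one empty line for every input line, because the original line ending follows the last inserted newline. If that empty entry is unwanted, the pipeline could be wrapped roughly as follows. This is only a sketch: the function name distinct_chars is hypothetical, and the extra sed filter and the LC_ALL=C setting are additions to the pipeline above.

```shell
# Hypothetical wrapper around the pipeline above; reads from stdin.
distinct_chars() {
    # Delete the excluded characters: letters, semicolon, double and single quote
    tr -d 'A-Za-z;"'\' |
    # Put each remaining character on its own line
    sed 's/\(.\)/\1\
/g' |
    # Drop the empty lines left over from the original line endings
    sed '/^$/d' |
    # Remove duplicates; LC_ALL=C makes the order byte-wise and predictable
    LC_ALL=C sort -u
}

# Sample input standing in for the real file
printf '%s\n' '2343abc34;ABC;;@$%"' | distinct_chars
```

On the real data this would be called as distinct_chars <file.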

Upvotes: 2
