Search, count and position of the count in a file

Question

I am not a Linux expert, but based on posts in various forums I have been trying to write a script that matches a pattern of characters occurring together in a file. The file has approximately 200 million characters (upper and lower case), with about 50 characters per line. I merged all the lines into a single line using

tr -d '\n' < input.txt > oneLineInput.txt

This deletes the newline characters, so all the characters in the file end up on one line without any spaces inserted between the former lines.
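Merging the lines matters because grep matches line by line, so an occurrence split across a line break is invisible to it. A minimal sketch on a made-up two-line sample (the sample text and the variable name sample are hypothetical), where the second "tr" straddles the line break:

```shell
# Hypothetical two-line sample: "...fort" / "resting..." hides a "tr" across the break.
sample='IamTryingtobuildascriptfort
restingthetyposinmysentence'

# Line by line, grep only finds the "Tr" in "Trying": the count is 1.
printf '%s\n' "$sample" | grep -o -i -e tr | wc -l

# After deleting the newlines, the split "t" + "r" is found as well: the count is 2.
printf '%s\n' "$sample" | tr -d '\n' | grep -o -i -e tr | wc -l
```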

I am trying to count how many times a specific sequence of characters occurs. For example, take this file:

IamTryingtobuildascriptfortrestingthetyposinmysentence

Here I want to find the pattern 'tr' (in either case) in the sentence. The script I have now is

grep -o -i oneLineInput.txt -e tr | sort | uniq -c
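For reference, grep -o prints every match on its own line exactly as it appears in the input, so sort | uniq -c groups the matches by their exact text and reports Tr and tr as separate entries. If only the total is wanted, counting grep's output lines with wc -l gives it without the sort step. A sketch on the sample sentence from the question (the variable name sentence is just for illustration):

```shell
sentence='IamTryingtobuildascriptfortrestingthetyposinmysentence'

# One line per match, in input order and original case: "Tr", then "tr".
printf '%s\n' "$sentence" | grep -o -i -e tr

# Total number of matches, no sort/uniq needed (the count is 2).
printf '%s\n' "$sentence" | grep -o -i -e tr | wc -l
```

On the real data the printf would be replaced by reading oneLineInput.txt directly. This drops the per-case breakdown that uniq -c provides.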

The above script works fine for a small file, but on my actual file, with more than 200 million characters, it takes a very long time (I lost patience and did not measure the total time).

Is there a way I can optimize the code?

I have also been trying to get the position of each match. In the example above, 'tr' starts at the 4th and 27th positions.

Is it possible to print the position of each match as a number in the output?

Thank you

Jotne · Accepted Answer

This awk command shows how many times tr (in any case) occurs in oneLineInput.txt:

awk -F"[Tt][Rr]" '{print NF-1}' oneLineInput.txt
2
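The count works because the regex [Tt][Rr] is used as the field separator: every match splits the line, so n matches give n+1 fields and NF-1 is the number of matches. A sketch that prints each field and its length for the sample sentence makes this visible:

```shell
sentence='IamTryingtobuildascriptfortrestingthetyposinmysentence'

# Print field number, field length and field text; the "tr" separators are removed.
printf '%s\n' "$sentence" | awk -F'[Tt][Rr]' '{
    for (i = 1; i <= NF; i++) print i, length($i), $i
    print "matches:", NF - 1
}'
```

For the sample this shows three fields of length 3, 21 and 26 and reports 2 matches. If the input still has several lines, the answer's command prints one count per line; awk -F'[Tt][Rr]' '{n += NF-1} END {print n}' would sum them, although matches split across a line break are still missed.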

To get the position:

awk -F"[Tt][Rr]" 'BEGIN {print "hit\tposition"} {for (i=1;i<NF;i++) {p+=length($i); a++; print a"\t"p+1+(a-1)*2}}' oneLineInput.txt
hit	position
1	4
2	27

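A different way to get the positions, shown as a sketch rather than the answer's method, is to search a lower-cased copy of each line with awk's index() and continue the search just after each hit. The helper name tr_positions and the variable pat are hypothetical; positions are counted from the start of each input line, which is what you want for the one-line file.

```shell
# Hypothetical helper: print the 1-based start position of every
# case-insensitive, non-overlapping occurrence of "tr" in each input line.
tr_positions() {
    awk -v pat='tr' '
    BEGIN { print "hit\tposition" }
    {
        s = tolower($0)      # lower-case copy, so pat must be lower case
        offset = 0           # characters already skipped on this line
        while ((i = index(s, pat)) > 0) {
            hit++
            print hit "\t" offset + i
            offset += i + length(pat) - 1   # skip past this match
            s = substr(s, i + length(pat))
        }
    }'
}

sentence='IamTryingtobuildascriptfortrestingthetyposinmysentence'
printf '%s\n' "$sentence" | tr_positions
# On the real data: tr_positions < oneLineInput.txt
```

For the sample sentence this reports hits at positions 4 and 27, matching the split-based command.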
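The position of each hit is calculated as p+1+(a-1)*2, where:

p is the running total of the lengths of the fields seen so far,

+1 is added because tr starts on the character right after those fields, and

(a-1)*2 adds back the separators removed by the a-1 earlier hits (a is the hit counter), since each tr is 2 characters long.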
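The same formula written out, with \ell_i standing for the length of field i (notation introduced here) and p_a the sum of the first a field lengths. For the sample sentence the first two fields are Iam (3 characters) and yingtobuildascriptfor (21 characters):

```latex
\begin{aligned}
\mathrm{pos}_a &= p_a + 1 + 2(a-1), & p_a &= \sum_{i=1}^{a} \ell_i \\
\mathrm{pos}_1 &= 3 + 1 + 2 \cdot 0 = 4 \\
\mathrm{pos}_2 &= (3 + 21) + 1 + 2 \cdot 1 = 27
\end{aligned}
```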
