Reputation: 375
I have data in the following form:
<id_mytextadded1829>
<text1> <text2> <text3>.
<id_m_abcdef829>
<text4> <text5> <text6>.
<id_mytextadded1829>
<text7> <text2> <text8>.
<id_mytextadded1829>
<text2> <text1> <text9>.
<id_m_abcdef829>
<text11> <text12> <text2>.
Now I want to count the number of lines in which <text2> is present. I know I can do this with Python's regex, but a regex would only tell me whether a pattern is present in a line. My requirement, on the other hand, is to find a string that is present exactly in the middle of a line. I know sed is good for replacing content in a line. But if, instead of replacing, I only want the number of matching lines, is it possible to get that using sed?
EDIT:
Sorry, I forgot to mention: I want lines where <text2> occurs in the middle of the line. I don't want lines where <text2> occurs at the beginning or at the end of the line.
E.g. in the data shown above, the number of lines which have <text2> in the middle is 2 (rather than 4).
Is there some way to count the lines which have <text2> in the middle, using Linux tools or Python?
Upvotes: 1
Views: 4332
Reputation: 10039
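To find lines where <text2> occurs anywhere (this prints the matching line numbers; pipe the output to wc -l to get a count):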
sed -n "/<text2>/ =" filename
If you only want matches in the middle of the line (as clarified later in the comments):
sed -n "/[^ ] \{1,\}<text2> \{1,\}[^ ]/ =" filename
Upvotes: 0
Reputation: 123448
I want lines where <text2> occurs in the middle of the line.
You could say:
grep -P '.+<text2>.+' filename
to list the lines containing <text2> not at the beginning or the end of a line.
In order to get only the count of matches, you could say:
grep -cP '.+<text2>.+' filename
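Since the question also mentions Python, here is a rough equivalent as a minimal sketch. The function name `count_middle` and the inline `SAMPLE` string are illustrative assumptions. One caveat worth checking against the real file: if the trailing periods shown in the question are literally present, `.+<text2>.+` also matches a line whose last token is `<text2>`, because the period satisfies the final `.+`. The sketch strips a trailing period first, which is an assumption about the data format.

```python
import re

# Sample copied from the question.
SAMPLE = """\
<id_mytextadded1829>
<text1> <text2> <text3>.
<id_m_abcdef829>
<text4> <text5> <text6>.
<id_mytextadded1829>
<text7> <text2> <text8>.
<id_mytextadded1829>
<text2> <text1> <text9>.
<id_m_abcdef829>
<text11> <text12> <text2>.
"""


def count_middle(lines, token="<text2>"):
    """Count lines where `token` has other text both before and after it."""
    pattern = re.compile(".+" + re.escape(token) + ".+")
    count = 0
    for line in lines:
        # Assumption: a trailing period is punctuation, not content.
        core = line.strip().rstrip(".")
        if pattern.search(core):
            count += 1
    return count


print(count_middle(SAMPLE.splitlines()))  # prints 2 for the sample above
```

For a real file, the same function can be fed an open file object, e.g. `count_middle(open("filename"))`.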
Upvotes: 4
Reputation: 11703
I want lines where <text2> occurs in the middle of the line. I don't want lines where <text2> occurs at the beginning or at the end of the line.
Try using grep with -c:
grep -c '>.*<text2>.*<' file
Output:
2
Upvotes: 0
Reputation: 41446
Using awk you can do this:
awk '$2~/text2/ {a++} END {print a}' file
2
It counts every line whose second whitespace-separated field matches text2, which for three-field lines like these means text2 is in the middle of the line.
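The same field-based idea can be sketched in Python. This is a variation rather than a literal translation: the awk test is a regex match, so a field such as <text20> would also count, whereas the sketch below compares fields exactly and accepts any field that is neither first nor last. The function name `count_inner_field` and the `lines` list are assumptions made for illustration.

```python
def count_inner_field(lines, token="<text2>"):
    """Count lines where `token` is a field that is neither first nor last."""
    count = 0
    for line in lines:
        fields = line.split()
        # fields[1:-1] drops the first and last field; it is empty for
        # lines with fewer than three fields, such as the id lines.
        if token in fields[1:-1]:
            count += 1
    return count


# Sample lines copied from the question.
lines = [
    "<id_mytextadded1829>",
    "<text1> <text2> <text3>.",
    "<id_m_abcdef829>",
    "<text4> <text5> <text6>.",
    "<id_mytextadded1829>",
    "<text7> <text2> <text8>.",
    "<id_mytextadded1829>",
    "<text2> <text1> <text9>.",
    "<id_m_abcdef829>",
    "<text11> <text12> <text2>.",
]

print(count_inner_field(lines))  # prints 2 for the sample above
```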
Upvotes: 0
Reputation: 21821
You can use grep for this. For example, this will count the number of lines in the file that match the ^123[a-z]+$ pattern:
egrep -c '^123[a-z]+$' file.txt
P.S. I'm not quite sure about the syntax and I can't test it at the moment. The regex is quoted so that the shell passes it to grep unchanged.
Edit: the question is a bit tricky since we don't know for sure what your data is and what exactly you're trying to count in it, but it all comes down to correctly formulating a regular expression.
If we assume that <text2> is an exact sequence of characters that should be present in the middle of the line and should not be present at the beginning or at the end, then this should be the regex you're looking for: ^<text[^2]>.*text2.*<text[^2]>\.$
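To check that regex against the sample from the question, here is a small Python sketch. The names `PATTERN`, `sample_lines` and `matching` are illustrative assumptions. Note that `[^2]` matches exactly one character, so a line that starts or ends with a multi-digit token such as <text11> will not be counted, even when <text2> sits in the middle.

```python
import re

# Pattern copied verbatim from the answer above.
PATTERN = re.compile(r"^<text[^2]>.*text2.*<text[^2]>\.$")

# Sample lines copied from the question.
sample_lines = [
    "<id_mytextadded1829>",
    "<text1> <text2> <text3>.",
    "<id_m_abcdef829>",
    "<text4> <text5> <text6>.",
    "<id_mytextadded1829>",
    "<text7> <text2> <text8>.",
    "<id_mytextadded1829>",
    "<text2> <text1> <text9>.",
    "<id_m_abcdef829>",
    "<text11> <text12> <text2>.",
]

# Count the lines the anchored pattern matches.
matching = sum(1 for line in sample_lines if PATTERN.search(line))
print(matching)  # prints 2 for the sample above
```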
Upvotes: 1