Reputation: 375

Counting number of lines which contain a pattern

I have data in the following form:

<id_mytextadded1829>
<text1>    <text2>    <text3>.
<id_m_abcdef829>
<text4>    <text5>    <text6>.
<id_mytextadded1829>
<text7>    <text2>    <text8>.
<id_mytextadded1829>
<text2>    <text1>    <text9>.
<id_m_abcdef829>
<text11>    <text12>    <text2>.

Now I want to count the number of lines in which <text2> is present. I know I can do this using Python's regex, but a regex would only tell me whether a pattern is present in a line or not, whereas my requirement is to find a string which is present exactly in the middle of a line. I know sed is good for replacing contents in a line, but if I only want the number of lines instead of replacing anything, is it possible to do so using sed?

EDIT: Sorry, I forgot to mention: I want lines where <text2> occurs in the middle of the line. I don't want lines where <text2> occurs at the beginning or at the end of the line. E.g. in the data shown above, the number of lines which have <text2> in the middle is 2 (rather than 4).

Is there some way to get the count of lines which have <text2> in the middle, using Linux tools or Python?

Upvotes: 1

Answers (5)

NeronLeVelu

Reputation: 10039

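To count the lines where <text2> occurs anywhere (sed's = command prints the line number of each matching line, so pipe it to wc -l to get the count):

sed -n "/<text2>/ =" filename | wc -l

If you only want the lines where it occurs in the middle (as clarified later in the comments):

sed -n "/[^ ] \{1,\}<text2> \{1,\}[^ ]/ =" filename | wc -l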

Upvotes: 0

devnull

Reputation: 123448

I want lines where <text2> occurs in the middle of the line.

You could say:

grep -P '.+<text2>.+' filename

to list the lines containing <text2> not at the beginning or the end of a line.

In order to get only the count of matches, you could say:

grep -cP '.+<text2>.+' filename
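Since the question also mentions Python, here is a minimal sketch of the same counting logic. The helper name count_matching_lines and the second, stricter pattern are illustrative assumptions, and the sample is copied from the question with spaces assumed between fields. One thing it shows: on that sample the trailing period after the last field counts as "something after <text2>", so the loose pattern reports 3 rather than the expected 2.

```python
import re

# Sample data copied from the question (spaces between fields assumed).
SAMPLE = """\
<id_mytextadded1829>
<text1>    <text2>    <text3>.
<id_m_abcdef829>
<text4>    <text5>    <text6>.
<id_mytextadded1829>
<text7>    <text2>    <text8>.
<id_mytextadded1829>
<text2>    <text1>    <text9>.
<id_m_abcdef829>
<text11>    <text12>    <text2>.
"""


def count_matching_lines(text, pattern):
    """Count the lines in which the pattern matches somewhere, like grep -c."""
    regex = re.compile(pattern)
    return sum(1 for line in text.splitlines() if regex.search(line))


# Same pattern as the grep -P command: at least one character on each side.
loose = count_matching_lines(SAMPLE, r".+<text2>.+")

# Assumed stricter variant: require another <...> field on each side.
strict = count_matching_lines(SAMPLE, r">.*<text2>.*<")

print(loose, strict)  # prints: 3 2
```

Whether the stricter pattern is appropriate depends on every field really being wrapped in angle brackets, which is only an assumption based on the sample.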

Upvotes: 4

jkshah

Reputation: 11703

I want lines where <text2> occurs in the middle of the line. I don't want lines where <text2> occurs in the beginning or at the end of the line.

Try using grep with -c

grep -c '>.*<text2>.*<' file

Output:

2

Upvotes: 0

Jotne

Reputation: 41446

Using awk you can do this:

awk '$2~/text2/ {a++} END {print a}' file
2

It will count all lines that have text2 in the second whitespace-separated field, which is the middle of a three-field line.
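The same field-based idea can be sketched in Python. This is an illustration under assumptions, not a translation of the awk one-liner: the function name count_interior_field is made up, it compares fields for exact equality instead of doing a regex match, and it treats every field except the first and last as "the middle", so it also works for lines with more than three fields.

```python
# Sample lines copied from the question (spaces between fields assumed).
SAMPLE_LINES = [
    "<id_mytextadded1829>",
    "<text1>    <text2>    <text3>.",
    "<id_m_abcdef829>",
    "<text4>    <text5>    <text6>.",
    "<id_mytextadded1829>",
    "<text7>    <text2>    <text8>.",
    "<id_mytextadded1829>",
    "<text2>    <text1>    <text9>.",
    "<id_m_abcdef829>",
    "<text11>    <text12>    <text2>.",
]


def count_interior_field(lines, target="<text2>"):
    """Count lines where target is a field other than the first or the last."""
    count = 0
    for line in lines:
        fields = line.split()
        # fields[1:-1] drops the first and last fields; for the single-field
        # id lines it is simply empty.
        if target in fields[1:-1]:
            count += 1
    return count


print(count_interior_field(SAMPLE_LINES))  # prints: 2
```

Because the last field keeps its trailing period ("<text2>."), it never equals the target, which is what excludes the final sample line here.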

Upvotes: 0

Andrew Logvinov

Reputation: 21821

You can use grep for this. For example, this will count the number of lines in the file that match the ^123[a-z]+$ pattern:

egrep -c '^123[a-z]+$' file.txt

P.S. I'm not quite sure about the syntax and I don't have the possibility to test it at the moment. The regex is quoted so that the shell does not interpret its special characters.

Edit: the question is a bit tricky since we don't know for sure what your data is and what exactly you're trying to count in it, but it all comes down to correctly formulating a regular expression.

If we assume that <text2> is an exact sequence of characters that should be present in the middle of the line and should not be present at the beginning or at the end, then this should be the regex you're looking for: ^<text[^2]>.*text2.*<text[^2]>\.$
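A quick way to check a regex like this is to run it over a few lines in Python. The sketch below is only an illustration: the last test line is hypothetical (it is not in the question), and ALTERNATIVE is an assumed variant using negative lookahead, not something proposed in this answer. It shows that <text[^2]> accepts only a single character after "text", so a line starting with <text11> is rejected even when <text2> sits in the middle.

```python
import re

# The regex from the answer above.
ORIGINAL = re.compile(r"^<text[^2]>.*text2.*<text[^2]>\.$")

# Assumed variant: first and last fields may be any <...> except <text2>.
ALTERNATIVE = re.compile(r"^<(?!text2>)[^>]*>.*<text2>.*<(?!text2>)[^>]*>\.$")

LINES = [
    "<text1>    <text2>    <text3>.",
    "<text7>    <text2>    <text8>.",
    "<text2>    <text1>    <text9>.",
    "<text11>    <text12>    <text2>.",
    "<text11>    <text2>    <text3>.",  # hypothetical line, not in the question
]


def count(regex, lines):
    """Count the lines that the anchored regex matches."""
    return sum(1 for line in lines if regex.match(line))


print(count(ORIGINAL, LINES), count(ALTERNATIVE, LINES))  # prints: 2 3
```

On the four lines taken from the question both patterns give 2; they differ only on the hypothetical fifth line.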

Upvotes: 1
