Mojing Liu

Reputation: 203

Is there anything faster than grep in Python?

The command I would normally use in a Bash script is something like:

$ cat huge2GBfile.txt | grep -w "pattern1\|pattern2\|pattern3" > out.txt

It outputs the lines of huge2GBfile.txt that contain pattern1, pattern2, or pattern3. I was wondering if this is achievable in Python. I know that I can use

os.system(cmd) 

But I'd like to know whether there is a native equivalent in Python (I am a complete noob) and whether it would be faster than using cat+grep. Thanks!

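My initial thought: would something like

pattern = "pattern1"  # the string to search for
with open("huge2GBfile.txt") as f, open("out.txt", "w") as out:
    for line in f:
        if pattern in line:
            out.write(line)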

be faster?

Upvotes: 2

Views: 3994

Answers (1)

vinaut

Reputation: 2446

Even with an algorithm better than the one grep uses (as someone already commented, these tools are highly optimised; grep is 30 years old!), the fact remains that cat and grep are utilities written in C and compiled natively for the system.

Python is an interpreted language and can be a couple of orders of magnitude slower than native C, so I would argue that the answer is no: there is nothing in Python that would be faster.
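
For comparison, a pure-Python equivalent of the grep command in the question could look roughly like the sketch below. The names grep_words and grep_file are made up for illustration, and \b only approximates the word matching of grep -w. I would not expect it to beat grep; it mainly shows the work the interpreter has to do for every line.

```python
import re


def grep_words(lines, words):
    """Yield the lines that contain any of `words` as a whole word."""
    # re.escape treats each word literally; \b approximates grep -w.
    matcher = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")
    for line in lines:
        if matcher.search(line):
            yield line


def grep_file(src_path, dst_path, words):
    """Copy the matching lines of src_path to dst_path (hypothetical helper)."""
    # The file is read lazily, line by line, so a 2 GB input is never held in memory.
    with open(src_path, errors="replace") as src, open(dst_path, "w") as dst:
        dst.writelines(grep_words(src, words))
```

Calling grep_file("huge2GBfile.txt", "out.txt", ["pattern1", "pattern2", "pattern3"]) would mirror the original command, with the file names taken from the question.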

If you want to process the output of grep line by line, one option is to build your Python script like a Unix command-line tool that reads from stdin and writes to stdout, so you can use something like:

grep pattern file | python myscript.py
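
A minimal sketch of what myscript.py could look like, assuming the only goal is to transform each line that grep lets through. The process function is a hypothetical placeholder (here it just upper-cases the line), and filter_lines is a name chosen for illustration.

```python
import sys


def process(line):
    """Hypothetical per-line step; replace with the real work."""
    return line.upper()


def filter_lines(lines, write):
    """Apply process() to every input line and pass the result to write()."""
    for line in lines:
        write(process(line))


if __name__ == "__main__":
    # sys.stdin is iterated lazily, so lines are handled as grep emits them.
    filter_lines(sys.stdin, sys.stdout.write)
```

Keeping the loop in a function that takes any iterable of lines makes it easy to try out without a pipe, for example by passing it a list of strings.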

See also: How do you read from stdin in Python?

Upvotes: 4
