Reputation: 203
So, the command I'd normally use in a Bash script would be something like:
$ cat huge2GBfile.txt | grep -w "pattern1\|pattern2\|pattern3" > out.txt
This outputs the lines in huge2GBfile.txt that contain pattern1, pattern2, or pattern3. I was wondering whether this is achievable in Python. I know that I can use
os.system(cmd)
But I'd like to know if there is something similar in Python (I am a complete noob) and if it is faster than using cat+grep. Thanks!
Initial thoughts: would something like
for line in f:
    if pattern in line:
        out.write(line)
be faster?
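For example, a fuller version of that sketch, using the same file and patterns as in the grep example above (the \b word-boundary regex is just my guess at an equivalent of grep -w), might be:

import re

# roughly equivalent to grep -w "pattern1\|pattern2\|pattern3";
# \b approximates the -w word-boundary behaviour
pattern = re.compile(r"\b(pattern1|pattern2|pattern3)\b")

with open("huge2GBfile.txt") as f, open("out.txt", "w") as out:
    for line in f:
        if pattern.search(line):
            out.write(line)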
Upvotes: 2
Views: 3994
Reputation: 2446
Even if you came up with an algorithm better than the one grep uses (as someone already commented, these tools are highly optimised; grep is about 30 years old!), there is still the fact that they are utilities written in C and compiled natively for the system.
Python is an interpreted language and can be a couple of orders of magnitude slower than native C, so I would argue the answer is no: there is nothing in Python that will be faster for this.
If you want to process the output of grep line by line, one option is to build your Python script like a Unix command-line tool, so that it reads from stdin and writes to stdout. Then you could use something like:
grep pattern file | python myscript.py
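A minimal sketch of what such a myscript.py could look like (the per-line handling here is only a placeholder; the point is that sys.stdin can be iterated line by line):

import sys

for line in sys.stdin:        # grep's matching lines arrive here, one per iteration
    # placeholder processing: just pass the line through unchanged
    sys.stdout.write(line)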
How do you read from stdin in Python?
Upvotes: 4