Reputation: 5343
I am writing a script and running it from the console like this:
cat source_text/* | ./mapper.py
I would like to get the name of the file each line comes from while it is being read. The source_text folder contains a bunch of text files, and I need their filenames in my mapper script as well.
Is that possible?
import sys
import re
import os
# re is for regular expressions
pattern = re.compile("[a-zA-Z][a-zA-Z0-9]*",
                     re.MULTILINE | re.DOTALL | re.IGNORECASE)
# Read pairs as lines of input from STDIN
for line in sys.stdin:
    ....
Upvotes: 0
Views: 4648
Reputation: 1294
If you use this instead of cat:
grep -r '' source_text/ | ./mapper.py
The input to mapper.py will then look like this:
source_text/answers.txt:42
source_text/answers.txt:42
source_text/file1.txt:Hello world
You can then retrieve the filename using:
for line in sys.stdin:
    filename, line = line.split(':', 1)
    ...
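A minimal sketch of that parsing step, wrapped in a helper so it can be tried without a pipe. The name `parse_grep_lines` and the sample lines are made up for illustration; the sketch also assumes that no filename contains a colon, since the split happens at the first one.

```python
def parse_grep_lines(lines):
    """Yield (filename, text) pairs from lines shaped like 'path:text'."""
    for raw in lines:
        # Split at the first colon only, so colons inside the text survive.
        filename, sep, text = raw.rstrip("\n").partition(":")
        if not sep:
            # No colon at all: this is not a prefixed line, so skip it.
            continue
        yield filename, text


# Hypothetical sample input, shaped like the grep output shown above.
sample = [
    "source_text/answers.txt:42\n",
    "source_text/file1.txt:Hello: world\n",
]
pairs = list(parse_grep_lines(sample))
```

In mapper.py you would pass `sys.stdin` instead of the sample list.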
However, Python is perfectly capable of iterating over the files in a directory and reading them line by line itself, for example:
for filename in os.listdir(path):
    # os.listdir returns bare names, so join them with the directory
    for line in open(os.path.join(path, filename)):
        ...
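A slightly fuller sketch of the directory approach, in case it helps. The helper name `iter_lines_with_names` and the demo files are assumptions for illustration only. It joins each name with the directory before opening it, skips anything that is not a regular file, and does not descend into subdirectories.

```python
import os
import tempfile


def iter_lines_with_names(path):
    """Yield (filename, line) for every regular file directly inside path."""
    for filename in sorted(os.listdir(path)):
        full_path = os.path.join(path, filename)
        if not os.path.isfile(full_path):
            continue  # skip subdirectories and other non-files
        with open(full_path) as f:
            for line in f:
                yield filename, line.rstrip("\n")


# Small demonstration on a throwaway directory with made-up files.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "file1.txt"), "w") as f:
        f.write("Hello world\n")
    with open(os.path.join(demo_dir, "answers.txt"), "w") as f:
        f.write("42\n42\n")
    result = list(iter_lines_with_names(demo_dir))
```

Sorting the names is a choice made here to get a stable order; `os.listdir` itself returns entries in arbitrary order.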
Upvotes: 1
Reputation: 148890
You cannot do that directly, but the fileinput module can help.
You just have to call your script this way:
./mapper.py source_text/*
And change it like this:
import fileinput
...
# Read lines from the files named on the command line
for line in fileinput.input():
    ...
The name of the file being processed is then available as fileinput.filename(), and you also have access to the line number within the current file as fileinput.filelineno(), along with other goodies.
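Here is a small self-contained sketch of the same idea. The function name `tag_lines` and the demo files are hypothetical; it passes an explicit list of paths through the `files` argument so that it can run without command-line arguments.

```python
import fileinput
import os
import tempfile


def tag_lines(paths):
    """Return (filename, line number within that file, text) per line."""
    tagged = []
    with fileinput.input(files=paths) as stream:
        for line in stream:
            tagged.append(
                (fileinput.filename(), fileinput.filelineno(), line.rstrip("\n"))
            )
    return tagged


# Small demonstration with two made-up files in a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    first = os.path.join(demo_dir, "answers.txt")
    second = os.path.join(demo_dir, "file1.txt")
    with open(first, "w") as f:
        f.write("42\n42\n")
    with open(second, "w") as f:
        f.write("Hello world\n")
    result = [
        (os.path.basename(name), lineno, text)
        for name, lineno, text in tag_lines([first, second])
    ]
```

Called with no arguments, `fileinput.input()` reads the files listed in `sys.argv[1:]` and falls back to standard input when none are given, so the script still works at the end of a pipe (without filenames in that case).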
Upvotes: 2
Reputation: 12930
That is not possible: cat concatenates the files, so your script only sees a single stream with no file boundaries. You can modify your program to read directly from the files like this:
import sys
import re
# re is for regular expressions
pattern = re.compile("[a-zA-Z][a-zA-Z0-9]*",
                     re.MULTILINE | re.DOTALL | re.IGNORECASE)
for filename in sys.argv[1:]:
    with open(filename) as f:
        for line in f:
            if pattern.search(line) is not None:
                # line keeps its newline, so suppress print's own
                print(filename, line, end="")
Then you can call it with:
$ ./grep_files.py source_text/*
Upvotes: 1