Petros Kyriakou

Reputation: 5343

How to get filename from stdin

I am writing a script and I am running it from the console like this:

cat source_text/* | ./mapper.py

and I would like to get the name of the file that is currently being read. The source_text folder contains a bunch of text files whose filenames I also need to extract in my mapper script.

Is that possible?

import sys
import re
import os


# re is for regular expressions
pattern = re.compile("[a-zA-Z][a-zA-Z0-9]*",
                     re.MULTILINE | re.DOTALL | re.IGNORECASE)


# Read pairs as lines of input from STDIN
for line in sys.stdin:
    ...  # process each line here

Upvotes: 0

Views: 4648

Answers (3)

Elijan9

Reputation: 1294

If you use this instead of cat:

grep -r '' source_text/ | ./mapper.py

The input to mapper.py will then look like this:

source_text/answers.txt:42
source_text/file1.txt:Hello world

You can then retrieve the filename using:

import sys

for line in sys.stdin:
    # grep prefixes each line with "path:", so split on the first ':' only
    filename, line = line.split(':', 1)
    ...
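A slightly more defensive sketch of that parsing step, written as two hypothetical helpers (split_grep_line and parse_grep_output are illustrative names, not part of any library). It assumes no filename contains a colon: grep separates the path from the text with the first ':', so a path such as a:b.txt would be split in the wrong place.

```python
def split_grep_line(raw):
    """Split one line of `grep -r ''` output into (filename, text).

    Assumes the filename itself contains no ':' characters.
    """
    filename, sep, text = raw.partition(":")
    if not sep:
        # No ':' at all, so this is not a filename-prefixed grep line.
        raise ValueError("unexpected input line: %r" % raw)
    # Drop the trailing newline; colons inside the text are preserved.
    return filename, text.rstrip("\n")


def parse_grep_output(lines):
    """Yield (filename, text) pairs from any iterable of grep output lines."""
    for raw in lines:
        yield split_grep_line(raw)
```

In mapper.py this would be driven by for filename, text in parse_grep_output(sys.stdin): ... Taking an iterable rather than reading sys.stdin directly keeps the parsing easy to test, and raising ValueError on a line without a colon makes malformed input fail loudly instead of being silently misparsed.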

However, Python is perfectly capable of iterating over the files in a directory (path below, e.g. 'source_text') and reading them line by line itself, for example:

import os

for filename in os.listdir(path):
    with open(os.path.join(path, filename)) as f:  # listdir returns bare names
        for line in f:
            ...
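Because os.listdir() returns bare names rather than paths, each name has to be joined back onto the directory before it is opened. Below is a small sketch that packages the loop as a generator yielding (filename, line) pairs; the function name iter_dir_lines, the sorted visiting order and the skipping of subdirectories are assumptions for illustration.

```python
import os


def iter_dir_lines(path):
    """Yield (filename, line) for every line of every regular file in path.

    Files are visited in sorted order; subdirectories are skipped.
    """
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        if not os.path.isfile(full_path):
            continue  # skip subdirectories and other non-regular entries
        with open(full_path) as f:
            for line in f:
                yield name, line.rstrip("\n")
```

Usage: for filename, line in iter_dir_lines('source_text'): ... gives the mapper the filename alongside every line without any shell pipeline.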

Upvotes: 1

Serge Ballesta

Reputation: 148890

You cannot do that directly, but the fileinput module can help you.

You just have to call your script like this:

./mapper.py source_text/*

And change it like this:

import fileinput
...

# Read lines from the files named on the command line (or STDIN if none)
for line in fileinput.input():
    ...

The name of the file currently being processed is then available as fileinput.filename(), and you can also access the line number within the current file as fileinput.filelineno(), along with a few other helpers.
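A sketch of a fuller fileinput-based mapper, assuming the mapper's job is to emit every word matched by the question's pattern together with its filename and line number. The output format and the function name emit_words are assumptions; the question does not show what the mapper actually emits.

```python
import fileinput
import re

# Same pattern as in the question.
pattern = re.compile("[a-zA-Z][a-zA-Z0-9]*",
                     re.MULTILINE | re.DOTALL | re.IGNORECASE)


def emit_words(files=None):
    """Yield (word, filename, line_number) for every matched word.

    With files=None, fileinput reads the paths in sys.argv[1:],
    or standard input if no paths were given.
    """
    with fileinput.input(files=files) as stream:
        for line in stream:
            for word in pattern.findall(line):
                # filename()/filelineno() describe the line just read.
                yield word, fileinput.filename(), fileinput.filelineno()
```

In mapper.py you would print the tuples, e.g. for word, filename, lineno in emit_words(): print(word, filename, lineno, sep="\t"), and run it as ./mapper.py source_text/*. With no arguments fileinput falls back to standard input, so the original cat pipeline still works, but fileinput.filename() then only reports '<stdin>'.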

Upvotes: 2

hochl

Reputation: 12930

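That is not possible: cat concatenates the files into a single stream, so by the time the data reaches your script's stdin the filenames are gone. You can instead modify your program to read directly from the files, like this: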

#!/usr/bin/env python3
import sys
import re

# re is for regular expressions
pattern = re.compile("[a-zA-Z][a-zA-Z0-9]*",
                     re.MULTILINE | re.DOTALL | re.IGNORECASE)

# Print every matching line of every file named on the command line,
# prefixed with the name of the file it came from.
for filename in sys.argv[1:]:
    with open(filename) as f:
        for line in f:
            if pattern.search(line) is not None:
                print(filename, line, end="")

Then you can call it with:

$ ./grep_files.py source_text/*

Upvotes: 1
