Reputation: 503
I want to sort a tab-separated file from a Python script by calling the 'sort' command. If I use this:
subprocess.Popen(["sort", r"-t$'\t'", "-k1,2", "input", "-o", "output"]).wait()
I get this error:
sort: multi-character tab `$\'t\''
If I use shell=True:
subprocess.Popen(["sort", r"-t$'\t'", "-k1,2", "input", "-o", "output"], shell=True).wait()
The process just hangs.
I would prefer using the first method, without shell=True. Any suggestions?
EDIT: The file is huge.
Upvotes: 1
Views: 1710
Reputation: 531075
Python can create a string with a tab; $'\t' is only necessary when you are working directly in the shell.
subprocess.Popen(["sort", "-t\t", "-k1,2", "input", "-o", "output"]).wait()
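To see why the original attempt fails, it can help to look at the exact strings handed to sort. The snippet below is only an illustration, assuming the question's argument was r"-t$'\t'"; the variable names are made up for this example. Without a shell, nothing interprets the $'...' quoting, so sort receives it literally.

```python
# What sort receives as its -t argument in each case.
shell_style = r"-t$'\t'"   # raw string: the characters $ ' \ t ' are passed as-is
python_style = "-t\t"      # Python turns \t into one real tab character

# Everything after the "-t" prefix is what sort treats as the separator.
separator_shell = shell_style[2:]
separator_python = python_style[2:]

print(len(separator_shell))    # more than one character, hence "multi-character tab"
print(len(separator_python))   # exactly one character: the tab
```

The second form works because Python, not the shell, builds the tab character before the argument list ever reaches sort.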
Upvotes: 2
Reputation: 110271
subprocess.call(["sort", "-t\t", "-k1,2", "input", "-o", "output"])
This looks cleaner: call is a higher-level function in the subprocess module than Popen, and it makes your code simpler to read.
Then, although calling an external "sort" may have certain advantages for large files (larger than the amount of available memory), unless you are dealing with those, you are probably doing it wrong.
Unlike shell scripts, Python is self-contained in the sense it can perform most tasks with your data internally instead of passing data through external simple posix programs.
To sort your file named "input" and have the results ready to use in memory, just do:
# read the data into a list, one line per item:
with open("input", "rt") as f:
    data = f.readlines()
# sort it, splitting each line on tab characters and taking the first two fields as the key:
data.sort(key=lambda line: line.split("\t")[:2])
# "data" now contains a sorted list of your lines
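As a quick check of the key function, here is a small sketch on in-memory sample data. The helper name and the sample rows are hypothetical. Note that this compares fields as plain Python strings, which may order things differently from sort under a non-C locale, and it strips the trailing newline so a short line's last field compares cleanly.

```python
def sort_lines_by_first_two_fields(lines):
    """Return the lines sorted by their first two tab-separated fields."""
    return sorted(lines, key=lambda line: line.rstrip("\n").split("\t")[:2])

# Hypothetical sample rows, each ending in a newline as readlines() would give.
sample = ["b\t2\tx\n", "a\t9\ty\n", "a\t1\tz\n"]
result = sort_lines_by_first_two_fields(sample)

for line in result:
    print(line, end="")
```

This only works when the whole file fits in memory; for a file that does not, the external sort call is still the practical choice.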
Upvotes: 0