Reputation: 3542
I am working with large files, and my question here is two-fold.
Bash - For testing purposes, I would like to iterate over every file in a given directory, take the head of each file (say head -n 10000), and be left with a cut-down version of each. Whether the output goes in the same directory or another doesn't matter much, though the same directory would be preferred.
Python3 - How can I do this programmatically? I imagine I need to use the os module?
Upvotes: 3
Views: 3406
Reputation: 185219
Try this using the shell:
for i in *; do
    cp "$i" "$i.tail"
    sed -i '10001,$d' "$i.tail"
done
or simply:
for i in *; do
    sed '10001,$d' "$i" > "$i.tail"
done
or:
for i in *; do
    head -n 10000 "$i" > "$i.tail"
done
For Python, see http://docs.python.org/2/library/subprocess.html if you would like to reuse the shell code.
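As a rough sketch of the subprocess route, the snippet below calls head once per file and redirects its output to a new file, without going through a shell. The function name truncate_with_head and its parameters are illustrative, not part of any library, and the sketch assumes a head binary is available on PATH.

```python
import subprocess
from pathlib import Path


def truncate_with_head(directory, lines=10000, suffix=".tail"):
    """Write the first `lines` lines of each regular file to <name><suffix>.

    Hypothetical helper: relies on an external `head` binary being on PATH.
    """
    written = []
    # sorted() materialises the listing first, so files created below
    # are not picked up by the same loop.
    for path in sorted(Path(directory).iterdir()):
        # Skip directories and outputs left over from a previous run.
        if not path.is_file() or path.name.endswith(suffix):
            continue
        target = path.with_name(path.name + suffix)
        with open(target, "wb") as out:
            # Argument list instead of shell=True: names with spaces are safe,
            # and "--" stops names starting with "-" being read as options.
            subprocess.run(
                ["head", "-n", str(lines), "--", str(path)],
                stdout=out,
                check=True,
            )
        written.append(target)
    return written
```

Calling truncate_with_head(".") would mirror the head loop above; check=True makes a failing head raise CalledProcessError rather than pass silently.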
Upvotes: 5
Reputation: 792
Bash:
The most straightforward way:
#!/usr/bin/env bash
DEST=/tmp
for i in *
do
    head -n 1000 "${i}" > "${DEST}/${i}"
done
If you have a large number of files, you can run multiple jobs by generating a list of files, splitting them out, and running the loop against each list.
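One way to picture the split-and-run idea is the Python sketch below: it divides the file list into a few sub-lists and runs the same head loop on each concurrently. The names head_files and head_in_parallel and the jobs parameter are made up for illustration. Threads are used on the assumption that the work is mostly disk I/O; this is a sketch, not a tuned implementation.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def head_files(names, dest, lines=1000):
    """Copy the first `lines` lines of each named file into `dest`."""
    for name in names:
        target = os.path.join(dest, os.path.basename(name))
        # Binary mode avoids decoding errors on non-text input.
        with open(name, "rb") as src, open(target, "wb") as dst:
            dst.writelines(islice(src, lines))
    return len(names)


def head_in_parallel(names, dest, lines=1000, jobs=4):
    """Split the list `names` into `jobs` sub-lists and process them concurrently."""
    # Round-robin split: names[0::jobs], names[1::jobs], ...
    chunks = [names[i::jobs] for i in range(jobs)]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Each worker runs the plain loop over its own sub-list.
        return sum(pool.map(lambda chunk: head_files(chunk, dest, lines), chunks))
```

The return value is the number of files processed, which gives a quick sanity check against the length of the input list.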
Python:
Assuming the goal is to not spawn shell sessions to execute external binaries, like 'head', this is how I would go about it.
#!/usr/bin/env python3
import os
from itertools import islice

destination = "/tmp/"
for name in os.listdir('.'):
    if os.path.isfile(name):
        # Copy at most the first 1000 lines; islice stops early on shorter files.
        # The with block closes both files even if an error occurs.
        with open(name, "r") as src, open(os.path.join(destination, name), "w") as dst:
            dst.writelines(islice(src, 1000))
Upvotes: 5
Reputation: 123490
To abbreviate all files in the current dir in this way, you can use:
for f in *; do [[ $f != *.small ]] && head -n 10000 "$f" > "$f".small; done
The files will be suffixed with .small.
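To do this from Python (os.system runs the command through /bin/sh, which may not support [[ ]], so call bash explicitly):
import subprocess
subprocess.run(['bash', '-c', 'for f in *; do [[ $f != *.small ]] && head -n 10000 "$f" > "$f".small; done'])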
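If avoiding the shell entirely is preferable, roughly the same behaviour can be sketched in plain Python. The function name abbreviate and its parameters are illustrative assumptions. It writes each output next to its source with the .small suffix and skips files that already carry that suffix, as the shell test does.

```python
from itertools import islice
from pathlib import Path


def abbreviate(directory=".", lines=10000, suffix=".small"):
    """Write the first `lines` lines of each file to <name><suffix> alongside it."""
    created = []
    # sorted() takes a snapshot of the listing before any output is written.
    for path in sorted(Path(directory).iterdir()):
        # Skip outputs from earlier runs, and anything that is not a regular file.
        if path.name.endswith(suffix) or not path.is_file():
            continue
        target = path.with_name(path.name + suffix)
        # Binary mode: lines are split on b"\n" and no decoding is attempted.
        with path.open("rb") as src, target.open("wb") as dst:
            dst.writelines(islice(src, lines))
        created.append(target.name)
    return created
```

Because already-suffixed files are skipped, running it twice overwrites the existing .small files rather than producing .small.small copies.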
Upvotes: -1