linux bash ubuntu python-3.x

Houdini

Reputation: 3542

Take the "Head" of every file in a directory?

I am working with large files, and my question here is two-fold.

Bash - For testing purposes, I would like to iterate over every file in a given directory, take the head of each file (say head -n 10000), and be left with a cut-down version of each. Whether the results land in the same directory or another doesn't matter much, though the same directory would be preferred.
Python3 - How can I do this programmatically? I imagine I need to use the os module?

Upvotes: 3

Answers (4)

Joseph A. Perrin

Reputation: 1

How about:

ls | xargs -I {} head {}

or just:

head *

Upvotes: 0

Gilles Quénot

Reputation: 185219

Try this using the shell:

for i in *; do
    cp "$i" "$i.tail"
    sed -i '10001,$d' "$i.tail"
done

or simply:

for i in *; do
    sed '10001,$d' "$i" > "$i.tail"
done

or:

for i in *; do
    head -n 10000 "$i" > "$i.tail"
done

For Python, see http://docs.python.org/2/library/subprocess.html if you would like to run the shell code from a script.
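As a rough sketch of the subprocess route, assuming a `head` binary is on the PATH: the function name `head_with_subprocess` and its parameters are made up for illustration, and the `.tail` suffix simply mirrors the loops above.

```python
import subprocess
from pathlib import Path


def head_with_subprocess(directory=".", n=10000, suffix=".tail"):
    """Write the first n lines of each regular file to <name><suffix> via head."""
    written = []
    # sorted() snapshots the listing, so files created below are not revisited.
    for path in sorted(Path(directory).iterdir()):
        # Skip directories and the output of an earlier run.
        if not path.is_file() or path.name.endswith(suffix):
            continue
        target = path.with_name(path.name + suffix)
        with open(target, "wb") as out:
            # An argument list avoids shell quoting issues; "--" guards
            # against file names that start with a dash.
            subprocess.run(
                ["head", "-n", str(n), "--", str(path)],
                stdout=out,
                check=True,
            )
        written.append(target)
    return written
```

Calling `head_with_subprocess("/path/to/data")` would leave the originals untouched and return the list of new files. `check=True` makes a failing `head` raise instead of silently producing an empty file.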

Upvotes: 5

Wing Tang Wong

Reputation: 792

Bash:

The most straightforward way:

#!/usr/bin/env bash
DEST=/tmp
for i in *
do
   # Skip directories and other non-regular files.
   [ -f "$i" ] || continue
   head -n 1000 "$i" > "$DEST/$i"
done

If you have a large number of files, you can run multiple jobs in parallel by generating a list of files, splitting it into chunks, and running the loop against each chunk.

Python:

Assuming the goal is to not spawn shell sessions to execute external binaries, like 'head', this is how I would go about it.

#!/usr/bin/env python3
import os
from itertools import islice

destination = "/tmp/"

for name in os.listdir('.'):
    if os.path.isfile(name):
        # Copy at most the first 1000 lines; the with block closes both files.
        with open(name, "r") as src, open(os.path.join(destination, name), "w") as dst:
            dst.writelines(islice(src, 1000))
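The multiple-jobs idea mentioned for Bash could be sketched in Python along these lines. The names `truncate_file` and `truncate_all` and the default of 4 workers are illustrative assumptions, and whether threads actually speed things up depends on the disk.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def truncate_file(src_path, dest_dir, n=1000):
    """Copy the first n lines of src_path into dest_dir under the same name."""
    dest_path = os.path.join(dest_dir, os.path.basename(src_path))
    # Binary mode avoids decoding errors on files that are not valid text.
    with open(src_path, "rb") as src, open(dest_path, "wb") as dst:
        dst.writelines(islice(src, n))
    return dest_path


def truncate_all(src_dir, dest_dir, n=1000, workers=4):
    """Truncate every regular file in src_dir, several at a time."""
    paths = [os.path.join(src_dir, name) for name in os.listdir(src_dir)]
    files = [p for p in paths if os.path.isfile(p)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any error from a worker.
        return list(pool.map(lambda p: truncate_file(p, dest_dir, n), files))
```

For example, `truncate_all(".", "/tmp", n=1000)` would behave like the loop above, but with up to four files in flight at once.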

Upvotes: 5

that other guy

Reputation: 123490

To abbreviate all files in the current dir in this way, you can use:

for f in *; do [[ $f != *.small ]] && head -n 10000 "$f" > "$f".small; done

The files will be suffixed with .small.

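To do this from Python:

import subprocess
# os.system() runs /bin/sh (dash on Ubuntu), which lacks [[ ]], so invoke bash explicitly.
subprocess.run(['bash', '-c', 'for f in *; do [[ $f != *.small ]] && head -n 10000 "$f" > "$f".small; done'])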
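If you would rather not shell out at all, the same `.small` convention can be sketched in plain Python. The function name `make_small_copies` is hypothetical, and binary mode is an assumption made here so that files which are not valid text do not raise decoding errors.

```python
from itertools import islice
from pathlib import Path


def make_small_copies(directory=".", n=10000, suffix=".small"):
    """Write the first n lines of each file to <name>.small in the same directory."""
    created = []
    # list() takes a snapshot so files created in the loop are not revisited.
    for path in list(Path(directory).iterdir()):
        # Skip directories and files that are already abbreviated copies.
        if not path.is_file() or path.name.endswith(suffix):
            continue
        small = path.with_name(path.name + suffix)
        with open(path, "rb") as src, open(small, "wb") as dst:
            dst.writelines(islice(src, n))
        created.append(small.name)
    return sorted(created)
```

Running `make_small_copies()` in the data directory would produce the same `.small` files as the loop, and running it a second time overwrites them rather than creating `.small.small` copies.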

Upvotes: -1
