Lemming

Reputation: 597

A shell command for composing a file from chunks of another file

I have a data file and a file containing a list of positions and I want to generate a file from chunks of the data file. Example:

$ cat data
abcdefghijkl
$ cat positions
0,2
5,8
$ cutter positions data
abcfghi

Is there a (Linux) shell command that works like my hypothetical "cutter"? The particular format for "positions" is not important. We can assume that the chunks specified in "positions" are in increasing order and do not overlap. There might be an additional "cutter" mode where the positions count lines, not bytes.

I could easily implement such a program myself, but I have a gut feeling that such a program already exists.

Upvotes: 3

Views: 270

Answers (3)

Chris Seymour

Reputation: 85795

As Barton Chittenden points out, this can be done with cut. Adding command substitution lets cut read its list from the positions file:

$ cut -c $(cat positions) data
abcfghi

The question says that the particular format for "positions" is not important, so I wrote the positions file in the format cut expects and no extra processing is required:

$ cat data
abcdefghijkl

$ cat positions
1-3,6-9

You can turn this into the cutter command by adding a function to your ~/.bashrc file:

function cutter ()
{
    # $1: file holding a cut list such as 1-3,6-9; $2: data file
    cut -c "$(cat "$1")" "$2"
}

Run source ~/.bashrc, then use cutter as required:

$ cutter positions data
abcfghi

Use redirection to store the output in a new file, here called newfile:

$ cut -c $(cat positions) data > newfile

$ cutter positions data > newfile
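
The question also mentions a mode where the positions count lines rather than bytes. cut cannot select lines, but sed -n can print line ranges. Here is a minimal sketch, assuming a positions file with one one-based, inclusive start,stop pair per line. The name line_cutter is made up for illustration and is not an existing command:

```shell
# Hypothetical helper: print the line ranges listed in "$1" from file "$2".
# Each line of "$1" is assumed to look like "start,stop" (one-based, inclusive).
line_cutter() {
    # Append "p" to every range, giving a sed script such as "1,2p" and "4,5p".
    sed -n "$(sed 's/$/p/' "$1")" "$2"
}
```

Because the ranges are assumed not to overlap, no line is printed twice.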

Upvotes: 2

Barton Chittenden

Reputation: 4416

cut -c lets you select fixed-width character columns, which seems to be what you're looking for:

$ echo "abcdefghijkl" | cut -c 1-3,6-9
abcfghi

Note that the character positions start at 1 rather than 0. Individual columns can be specified using commas, e.g. cut -c 1,3,5,7, and ranges can be specified using a dash, e.g. cut -c 2-8.
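
If the positions file has to keep the zero-based start,stop format from the question, the pairs can be converted to a one-based cut list first. Here is a minimal sketch using awk. The names positions_to_cut_list and cutter0 are made up for illustration, and -b is used on the assumption that the positions count bytes, as in the question:

```shell
# Hypothetical helper: turn zero-based, inclusive "start,stop" lines
# into a one-based cut list such as "1-3,6-9".
positions_to_cut_list() {
    awk -F, '{ printf "%s%d-%d", (NR > 1 ? "," : ""), $1 + 1, $2 + 1 }
             END { print "" }' "$1"
}

# Hypothetical wrapper: cutter0 POSITIONS DATA
cutter0() {
    cut -b "$(positions_to_cut_list "$1")" "$2"
}
```

With the sample files from the question, cutter0 positions data should print abcfghi. Keep in mind that cut applies the list to every line of its input separately, so this only matches the question's example when the data is a single line.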

Upvotes: 3

glenn jackman

Reputation: 246847

Just using bash's substring extraction from parameter expansion, and using the positions file format as given:

data=$(< data)    # read the entire file into a variable
while IFS=, read -r start stop; do
    # offset and length are arithmetic expressions, so no $ or (( )) is needed
    printf "%s" "${data:start:stop-start+1}"
done < positions
echo

outputs

abcfghi

If your data file spans multiple lines, you will have to take care with the positions file to account for the newline characters.

This method does not require you to read the data file into memory:

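#!/bin/bash
exec 3<data         # fd 3: the data file, read sequentially
exec 4<positions    # fd 4: zero-based, inclusive start,stop pairs
pos=0               # current offset in the data file
while IFS=, read -r start stop <&4; do
    ((nskip = start - pos))
    ((nkeep = stop - start + 1))
    ((pos += nskip + nkeep))
    ((nskip > 0)) && read -r -N "$nskip" <&3    # discard the gap before the chunk
    read -r -N "$nkeep" <&3                     # the chunk lands in $REPLY
    printf "%s" "$REPLY"
done
echo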
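
For comparison, the same zero-based positions file can be processed with tail and head instead of read -N. This is only a sketch: byte_cutter is a made-up name, head -c is assumed to be available (it is in GNU coreutils), and the data file is reopened for every chunk, so it trades speed for simplicity:

```shell
# Hypothetical helper: byte_cutter POSITIONS DATA
# POSITIONS holds zero-based, inclusive "start,stop" byte offsets, one pair per line.
byte_cutter() {
    while IFS=, read -r start stop; do
        # tail -c +N starts output at byte N (one-based); head -c keeps the chunk length.
        tail -c "+$((start + 1))" "$2" | head -c "$((stop - start + 1))"
    done < "$1"
}
```

Newlines in the data file are counted like any other byte here, so a chunk may span several lines.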

Upvotes: 4
