Reputation: 597
I have a data file and a file containing a list of positions and I want to generate a file from chunks of the data file. Example:
$ cat data
abcdefghijkl
$ cat positions
0,2
5,8
$ cutter positions data
abcfghi
Is there a (Linux) shell command that works like my hypothetical "cutter"? The particular format for "positions" is not important. We can assume that the chunks specified in "positions" are in increasing order and do not overlap. There might be an additional "cutter" mode where the positions count lines, not bytes.
I could easily implement such a program myself, but I have a gut feeling that such a program already exists.
Upvotes: 3
Views: 270
Reputation: 85795
This can be done with cut, as Barton Chittenden points out, with the addition of command substitution:
$ cut -c $(cat positions) data
abcfghi
The particular format for "positions" is not important.
I made the format of positions match what cut expects (one-based, inclusive ranges), so no extra processing is required:
$ cat data
abcdefghijkl
$ cat positions
1-3,6-9
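If you would rather keep the positions file in the question's original format, it can be converted on the fly. This is only a minimal sketch: it assumes zero-based, inclusive "start,stop" pairs, one per line, and the helper name to_cut_list is made up for illustration.

```shell
# to_cut_list (hypothetical helper): convert zero-based, inclusive
# "start,stop" lines into the one-based "N-M,N-M" list that cut expects.
to_cut_list() {
    awk -F, 'NF == 2 { printf "%s%d-%d", sep, $1 + 1, $2 + 1; sep = "," }
             END { print "" }' "$1"
}

# Demo with the question's sample files, created in a temporary directory.
tmp=$(mktemp -d)
printf 'abcdefghijkl\n' > "$tmp/data"
printf '0,2\n5,8\n' > "$tmp/positions"

to_cut_list "$tmp/positions"                              # prints: 1-3,6-9
cut -c "$(to_cut_list "$tmp/positions")" "$tmp/data"      # prints: abcfghi
```

The conversion only adds 1 to each offset, so the increasing, non-overlapping order assumed in the question carries over unchanged.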
You can turn this into the cutter command by adding a function to your ~/.bashrc file:
function cutter ()
{
    # $1 = positions file (cut list format), $2 = data file
    cut -c "$(cat "$1")" "$2"
}
Run source ~/.bashrc, then you can use cutter as required:
$ cutter positions data
abcfghi
Use redirection to store the output in newfile:
$ cut -c $(cat positions) data > newfile
$ cutter positions data > newfile
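The question also mentions a mode where the positions count lines rather than characters. cut selects within each line, so it cannot do that by itself; one possible approach is to rewrite the same list as a sed script. A rough sketch, assuming the positions file holds a single cut-style list such as 1-3,6-9 (line_cutter is a hypothetical name):

```shell
# line_cutter (hypothetical helper): print the line ranges named in a
# cut-style list file, e.g. "1-3,6-9" becomes the sed script "1,3p;6,9p".
line_cutter() {
    sed -n "$(tr ',-' ';,' < "$1" | sed 's/;/p;/g; s/$/p/')" "$2"
}

# Demo: a ten-line file with one letter per line.
tmp=$(mktemp -d)
printf '%s\n' a b c d e f g h i j > "$tmp/lines"
printf '1-3,6-9\n' > "$tmp/positions"

line_cutter "$tmp/positions" "$tmp/lines"   # prints a b c f g h i, one per line
```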
Upvotes: 2
Reputation: 4416
cut -c will allow you to specify fixed-width columns, which seems to be what you're looking for:
$ echo "abcdefghijkl" | cut -c 1-3,6-9
abcfghi
Note that the character positions start at 1 rather than 0. Individual columns may be specified using commas, e.g. cut -c 1,3,5,7, or ranges can be specified using a dash: cut -c 2-8.
Upvotes: 3
Reputation: 246847
Just using bash's substring extraction from parameter expansion, and using the positions file format as given:
data=$(< data) # read the entire file into a variable
while IFS=, read -r start stop; do
    # offset and length are evaluated as arithmetic; both ends are inclusive
    printf "%s" "${data:start:stop-start+1}"
done < positions
echo
outputs
abcfghi
If your data file spans multiple lines, you will have to take care with the positions file to account for the newline characters.
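To make the newline caveat concrete, here is a small bash sketch with a made-up two-line data file. The newline between the lines occupies an offset of its own, while the trailing newline is dropped by the command substitution.

```shell
# Demo file with two lines; the path is a throwaway temporary directory.
tmp=$(mktemp -d)
printf 'abc\ndef\n' > "$tmp/data"
data=$(< "$tmp/data")          # bash: trailing newlines are stripped here

# Offsets: a=0 b=1 c=2 newline=3 d=4 e=5 f=6
printf '%s\n' "${data:4:3}"    # prints: def
printf '%s\n' "${#data}"       # prints: 7
```

So a positions entry of 4,6 selects the second line, not 3,5.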
The following method does not require reading the data file into memory:
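#!/bin/bash
exec 3<data        # file descriptor 3: the data file
exec 4<positions   # file descriptor 4: the positions file
pos=0              # current offset in the data file
while IFS=, read -r start stop <&4; do
    ((nskip = start - pos))
    ((nkeep = stop - start + 1))
    ((pos += nskip + nkeep))
    # skip the characters before this chunk, then read the chunk itself
    # (-r keeps backslashes in the data from being treated as escapes)
    ((nskip > 0)) && read -r -N "$nskip" <&3
    read -r -N "$nkeep" <&3
    printf "%s" "$REPLY"
done
echo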
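For comparison, the same ranges can be pulled out with tail and head, which count bytes and also avoid holding the file in a shell variable, at the cost of reopening the data file for every range. This is just a sketch under the question's zero-based, inclusive format; byte_cutter is a hypothetical name, and head -c is not in POSIX, though the GNU and BSD versions both provide it.

```shell
# byte_cutter (hypothetical helper): $1 = positions file, $2 = data file.
byte_cutter() {
    while IFS=, read -r start stop; do
        # tail -c +N starts output at byte N (one-based);
        # head -c then keeps only the length of the range.
        tail -c "+$((start + 1))" "$2" | head -c "$((stop - start + 1))"
    done < "$1"
    echo
}

# Demo with the question's sample files.
tmp=$(mktemp -d)
printf 'abcdefghijkl\n' > "$tmp/data"
printf '0,2\n5,8\n' > "$tmp/positions"

byte_cutter "$tmp/positions" "$tmp/data"   # prints: abcfghi
```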
Upvotes: 4