Little Code

Reputation: 1545

Head & tail string in one line - possible?

I want to retrieve the first X and the last Y characters from a string (standard ASCII, so no worries about Unicode).

I understand that I can do this as separate actions, i.e.:

FIRST=$(echo foobar | head -c 3)
LAST=$(echo foobar | tail -c 3)
COMBINED="${FIRST}${LAST}"

But is there a cleaner way to do this?

I would prefer to use common standard utils (i.e. bash built-ins, sed, awk etc.). At a push, a Perl one-liner is OK, but no Python or anything else.

Upvotes: 3

Views: 1311

Answers (2)

F. Hauri - Give Up GitHub

Reputation: 70822

head + tail two answers, regarding -c switch

1. head + tail character based (with -c, reducing strings)

Under bash, you could:

string=foobarbaz
echo ${string::3}${string: -3}
foobaz

But to avoid repetition in case of shorter strings:

if ((${#string}>6));then
    echo ${string::3}${string: -3}
else
    echo $string
fi
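
The same guard can be wrapped in a small function that takes the head and tail lengths as arguments. This is only a sketch: headTailChars is a hypothetical name, and it assumes bash and non-negative integer lengths. The extra branch for a tail length of 0 is there because ${s: -0} would expand to the whole string.

```shell
# headTailChars STRING X Y
# Hypothetical helper: print the first X and last Y characters of STRING,
# or STRING unchanged when it is too short for the two parts not to overlap.
headTailChars() {
    local s=$1
    local -i x=$2 y=$3 len=${#1}
    if (( x + y >= len )); then
        printf '%s\n' "$s"               # nothing to cut out
    elif (( y == 0 )); then
        printf '%s\n' "${s:0:x}"         # ${s: -0} would be the whole string
    else
        printf '%s\n' "${s:0:x}${s: -y}" # head part followed by tail part
    fi
}

headTailChars foobarbaz 3 3   # prints: foobaz
headTailChars foobar 3 3      # prints: foobar
```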

Full function

shrinkStr(){
    local sep='..' opt OPTIND OPTARG string varname='' paddstr paddchr=' '
    local -i maxlen=40 lhlen=15 rhlen padd=0
    while getopts 'P:l:m:s:v:p' opt; do
        case $opt in 
            l) lhlen=$OPTARG ;;
            m) maxlen=$OPTARG ;;
            p) padd=1 ;;
            P) paddchr=$OPTARG ;;
            s) sep=$OPTARG ;;
            v) varname=$OPTARG ;;
            *) echo Wrong arg.; return 1 ;;
        esac
    done
    rhlen="maxlen-lhlen-${#sep}"
    ((rhlen<1)) && { echo bad lengths; return 1;}
    shift $((OPTIND-1))
    string="$*"
    if ((${#string}>maxlen)) ;then
        string="${string::lhlen}$sep${string: -rhlen}"
    elif ((${#string}<maxlen)) && ((padd));then
        printf -v paddstr '%*s' $((maxlen-${#string})) ''
        string+=${paddstr// /$paddchr}
    fi
    if [[ $varname ]] ;then
        printf -v "$varname" '%s' "$string"
    else
        echo "$string"
    fi
}

Then

shrinkStr -l 4 -m 10 Hello world!
Hell..rld!

shrinkStr -l 2 -m 10 Hello world!
He..world!

shrinkStr -l 3 -m 10 -s '+++' Hello world!
Hel+++rld!

This works even with UTF-8 characters:

cnt=1;for str in Généralités Language Théorème Février 'Hello world!';do
    shrinkStr -l5 -m11 -vOutstr -pP_ "$str"
    printf '  %11d:  |%s|\n' $((cnt++)) "$Outstr"
done
            1:  |Généralités|
            2:  |Language___|
            3:  |Théorème___|
            4:  |Février____|
            5:  |Hello..rld!|

cnt=1;for str in Généralités Language Théorème Février 'Hello world!';do
    shrinkStr -l5 -m10 -vOutstr -pP_ "$str"
    printf '  %11d:  |%s|\n' $((cnt++)) "$Outstr"
done
            1:  |Génér..tés|
            2:  |Language__|
            3:  |Théorème__|
            4:  |Février___|
            5:  |Hello..ld!|

2. head + tail lines based (without -c, reducing files)

By using only one fork to sed.

Here is a little function I wrote for this:

headTail() {
    local hln=${1:-10} tln=${2:-10} str;
    printf -v str '%*s' $((tln-1)) '';
    sed -ne "1,${hln}{p;\$q};$((hln+1)){${str// /\$!N;}};:a;\$!{N;D;ba};p"
}

Usage:

headTail <head lines> <tail lines>

Both arguments default to 10.
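
To make the sed program less opaque, it can help to print it instead of running it. The following is a sketch only: showHeadTailScript is a hypothetical helper that builds the same string as headTail above and echoes it.

```shell
# showHeadTailScript HEAD TAIL
# Hypothetical helper: build the same sed program as headTail, but print it.
showHeadTailScript() {
    local hln=${1:-10} tln=${2:-10} str
    printf -v str '%*s' $((tln-1)) ''    # a string of tln-1 spaces
    # every space is replaced by '$!N;' (append the next line unless at EOF)
    printf '%s\n' "1,${hln}{p;\$q};$((hln+1)){${str// /\$!N;}};:a;\$!{N;D;ba};p"
}

showHeadTailScript 3 4
# prints: 1,3{p;$q};4{$!N;$!N;$!N;};:a;$!{N;D;ba};p
```

Reading the output: 1,3{p;$q} prints the first three lines (and quits if the input ends there); on line 4, three N commands fill a four-line window; after that, each N appends one line and D drops the oldest one and restarts the cycle; on the last line, the final p prints the window.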

In practice:

headTail 3 4 < <(seq 1 1000)
1
2
3
997
998
999
1000

Seems correct. Testing border cases (where the number of lines is smaller than requested):

headTail 1 9 < <(seq 1 3)
1
2
3
headTail 9 1 < <(seq 1 3)
1
2
3

Taking more lines (I take the first 100 and the last 100 lines, but print only the top 2, middle 4 and bottom 2 lines of headTail's output):

headTail 100 100 < <(seq 1 2000)|sed -ne '1,2s/^/T /p;99,102s/^/M /p;199,$s/^/B /p'
T 1
T 2
M 99
M 100
M 1901
M 1902
B 1999
B 2000

BUG (limit): Don't use this with 0 as argument!

headTail 0 3 < <(seq 1 2000) 
1
1998
1999
2000
headTail 3 0 < <(seq 1 2000) 
1
2
3
1999
2000

BUG (limit): because of max line length:

headTail 4 32762 <<<Foo\ bar
bash: /bin/sed: Argument list too long
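
The error comes from the size of the sed script, which is passed as a single command-line argument and grows by four bytes per extra tail line. A rough sketch of the arithmetic follows; scriptLen is a hypothetical helper, and the 131072-byte cap per argument is an assumption that matches Linux with 4 KiB pages and may differ on other systems.

```shell
# scriptLen HEAD TAIL
# Hypothetical helper: length of the sed program that headTail would build.
scriptLen() {
    local hln=${1:-10} tln=${2:-10} str script
    printf -v str '%*s' $((tln-1)) ''
    script="1,${hln}{p;\$q};$((hln+1)){${str// /\$!N;}};:a;\$!{N;D;ba};p"
    echo "${#script}"                    # ASCII only, so characters == bytes
}

scriptLen 4 32761   # prints: 131069
scriptLen 4 32762   # prints: 131073
```

Under that assumption, 131069 bytes plus the terminating NUL still fit, while 131073 do not, which is consistent with 32762 being the first failing value.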

To support both cases, the function becomes:

head + tail lines, using one fork to sed

headTail() {
    local hln=${1:-10} tln=${2:-10} str sedcmd=''
    ((hln>0)) && sedcmd+="1,${hln}{p;\$q};"
    if ((tln>0)) ;then
        printf -v str '%*s' $((tln-1)) ''
        sedcmd+="$((hln+1)){${str// /\$!N;}};:a;\$!{N;D;ba};p;"
    fi
    sed -nf <(echo "$sedcmd")
}

Then

headTail 3 4 < <(seq 1 1000) |xargs
1 2 3 997 998 999 1000
headTail 3 0 < <(seq 1 1000) |xargs
1 2 3
headTail 0 4 < <(seq 1 1000) |xargs
997 998 999 1000

for i in {6..9};do printf " %3d: " $i;headTail 3 4 < <(seq 1 $i) |xargs; done
   6: 1 2 3 4 5 6
   7: 1 2 3 4 5 6 7
   8: 1 2 3 5 6 7 8
   9: 1 2 3 6 7 8 9

Stronger test with bigger numbers: reading the first 500'000 and the last 500'000 lines from an input of 3'000'000 lines:

headTail 500000 500000 < <(seq 1 3000000) | sed -ne '499999,500002p'
499999
500000
2500001
2500002

headTail 5000000 5000000 < <(seq 1 30000000) | sed -ne '4999999,5000002p'
4999999
5000000
25000001
25000002

This is as simple as it is robust!

Of course, by nature, sed could handle files far bigger than 3'000'000 lines! And clearly, needing only the first 1'000 lines AND the last 1'000 lines does not seem to be a common requirement.

Upvotes: 4

choroba

Reputation: 241898

$ perl -E '($s, $x, $y) = @ARGV; substr $s, $x, -$y, ""; say $s' abcdefgh 2 3
abfgh

The four-argument variant of substr replaces the given portion of the string with the last argument. Here, we replace from position $x to position -$y (negative numbers count from the end of the string), and use an empty string as replacement, i.e. we remove the middle part.
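
For comparison, the same "keep both ends" idea can be sketched with awk, which the question lists as acceptable. cutMiddle is a hypothetical name, and the guard for short lines is an addition of this sketch rather than something the Perl one-liner does.

```shell
# cutMiddle X Y
# Hypothetical helper: for each input line, keep the first X and last Y
# characters; lines of at most X+Y characters pass through unchanged.
cutMiddle() {
    awk -v x="$1" -v y="$2" '
        length($0) <= x + y { print; next }
        { print substr($0, 1, x) substr($0, length($0) - y + 1) }
    '
}

echo abcdefgh | cutMiddle 2 3   # prints: abfgh
```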

Upvotes: 0
