Reputation: 1545
I want to retrieve the first X and the last Y characters from a string (standard ascii, so no worries about unicode).
I understand that I can do this as seperate actions, i.e. :
FIRST=$(echo foobar | head -c 3)
LAST=$(echo foobar | tail -c 3)
COMBINED= "${FIRST}${LAST}"
But is there a cleaner way to do this ?
I would prefer to use common standard utils (i.e. bash built-ins, sed, awk etc.). At a push, a Perl one-liner is OK, but no Python or anything else.
Upvotes: 3
Views: 1311
Reputation: 70822
head
+ tail
two answers, regarding -c
switchhead
+ tail
character based (with -c
, reducing strings)Under bash, you could
string=foobarbaz
echo ${string::3}${string: -3}
foobaz
But to avoid repetion in case of shorter strings:
if ((${#string}>6));then
echo ${string::3}${string: -3}
else
echo $string
fi
shrinkStr(){
local sep='..' opt OPTIND OPTARG string varname='' paddstr paddchr=' '
local -i maxlen=40 lhlen=15 rhlen padd=0
while getopts 'P:l:m:s:v:p' opt; do
case $opt in
l) lhlen=$OPTARG ;;
m) maxlen=$OPTARG ;;
p) padd=1 ;;
P) paddchr=$OPTARG ;;
s) sep=$OPTARG ;;
v) varname=$OPTARG ;;
*) echo Wrong arg.; return 1 ;;
esac
done
rhlen="maxlen-lhlen-${#sep}"
((rhlen<1)) && { echo bad lengths; return 1;}
shift $((OPTIND-1))
string="$*"
if ((${#string}>maxlen)) ;then
string="${string::lhlen}$sep${string: -rhlen}"
elif ((${#string}<maxlen)) && ((padd));then
printf -v paddstr '%*s' $((maxlen-${#string})) ''
string+=${paddstr// /$paddchr}
fi
if [[ $varname ]] ;then
printf -v "$varname" '%s' "$string"
else
echo "$string"
fi
}
Then
shrinkStr -l 4 -m 10 Hello world!
Hell..rld!
shrinkStr -l 2 -m 10 Hello world!
He..world!
shrinkStr -l 3 -m 10 -s '+++' Hello world!
Hel+++rld!
cnt=1;for str in Généralités Language Théorème Février 'Hello world!';do
shrinkStr -l5 -m11 -vOutstr -pP_ "$str"
printf ' %11d: |%s|\n' $((cnt++)) "$Outstr"
done
1: |Généralités|
2: |Language___|
3: |Théorème___|
4: |Février____|
5: |Hello..rld!|
cnt=1;for str in Généralités Language Théorème Février 'Hello world!';do
shrinkStr -l5 -m10 -vOutstr -pP_ "$str"
printf ' %11d: |%s|\n' $((cnt++)) "$Outstr"
done
1: |Génér..tés|
2: |Language__|
3: |Théorème__|
4: |Février___|
5: |Hello..ld!|
head
+ tail
lines based (without -c
, reducing files)By using only one fork to sed
.
Here is a little function I wrote for this:
headTail() {
local hln=${1:-10} tln=${2:-10} str;
printf -v str '%*s' $((tln-1)) '';
sed -ne "1,${hln}{p;\$q};$((hln+1)){${str// /\$!N;}};:a;\$!{N;D;ba};p"
}
Usage:
headTail <head lines> <tail lines>
Both argument default are 10
.
In practice:
headTail 3 4 < <(seq 1 1000)
1
2
3
997
998
999
1000
Seem correct. Testing border case (where number of line are smaller than requested):
headTail 1 9 < <(seq 1 3)
1
2
3
headTail 9 1 < <(seq 1 3)
1
2
3
Taking more lines: (I will take 100 first and 100 last lines, but print only 2 Top lines, 4 Middle lines and 2 Bottom lines of headTail
's output.):
headTail 100 100 < <(seq 1 2000)|sed -ne '1,2s/^/T /p;99,102s/^/M /p;199,$s/^/B /p'
T 1
T 2
M 99
M 100
M 1901
M 1902
B 1999
B 2000
BUG (limit): Don't use this with 0
as argument!
headTail 0 3 < <(seq 1 2000)
1
1998
1999
2000
headTail 3 0 < <(seq 1 2000)
1
2
3
1999
2000
BUG (limit): because of max line length:
headTail 4 32762 <<<Foo\ bar bash: /bin/sed: Argument list too long
For both this to be supported, function would become:
head
+ tail
lines, using one fork to sed
headTail() {
local hln=${1:-10} tln=${2:-10} str sedcmd=''
((hln>0)) && sedcmd+="1,${hln}{p;\$q};"
if ((tln>0)) ;then
printf -v str '%*s' $((tln-1)) ''
sedcmd+="$((hln+1)){${str// /\$!N;}};:a;\$!{N;D;ba};p;"
fi
sed -nf <(echo "$sedcmd")
}
Then
headTail 3 4 < <(seq 1 1000) |xargs
1 2 3 997 998 999 1000
headTail 3 0 < <(seq 1 1000) |xargs
1 2 3
headTail 0 4 < <(seq 1 1000) |xargs
997 998 999 1000
for i in {6..9};do printf " %3d: " $i;headTail 3 4 < <(seq 1 $i) |xargs; done
6: 1 2 3 4 5 6
7: 1 2 3 4 5 6 7
8: 1 2 3 5 6 7 8
9: 1 2 3 6 7 8 9
Stronger test: With bigger numbers: Reading 500'000 first and 500'000 last lines from an input of 3'000'000 lines:
headTail 500000 500000 < <(seq 1 3000000) | sed -ne '499999,500002p'
499999
500000
2500001
2500002
headTail 5000000 5000000 < <(seq 1 30000000) | sed -ne '4999999,5000002p'
4999999
5000000
25000001
25000002
This is as simple this is robust!
Of course, by nature, sed
could support widely bigger file than 3'000'000
of lines! And clearly, having to take only first 1'000 lines AND last 1'000 line seem not to be a regular kind of need.
Upvotes: 4
Reputation: 241898
$ perl -E '($s, $x, $y) = @ARGV; substr $s, $x, -$y, ""; say $s' abcdefgh 2 3
abfgh
The four argument variant of substr replaces the given portion of the string with the last argument. Here, we replace from position $x to position -$y (negative numbers count from the end of the string), and use an empty string as replacement, i.e. we remove the middle part.
Upvotes: 0