Reputation: 73
How can I extract the leading alphabetic characters from a string? I want the letters at the beginning, up to the first non-alphabetic character.
For example, if the input string is abcd045tj56, the output should be abcd
Similarly, if the input is jkl657890 the output should be jkl
Can it be done in a shell script using awk/sed/cut?
I tried
echo "XYZ123" | awk 'sub(/[[:alpha:]]*/, "")'
But it gives 123 instead of XYZ
Then I tried
echo "XYZ123" | awk '{print (/[[:alpha:]]*/)}'
but it gives 1
I want the answer to be XYZ
Upvotes: 4
Views: 217
Reputation: 106995
You can use bash's parameter expansion to remove the first non-alphabetic character and everything after it:
s=XYZ123
echo "${s%%[^[:alpha:]]*}"
Demo: https://onlinegdb.com/OzjGf53T-
Note that this approach has the performance benefit of avoiding the overhead of spawning a separate process.
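As a minimal sketch, the same expansion can be wrapped in a function and tried against a few edge cases. The function name leading_alpha is hypothetical and not part of the answer above; bash is assumed.

```shell
# leading_alpha is a hypothetical helper name used only for this sketch.
leading_alpha() {
    local s=$1
    # %% removes the longest suffix matching the pattern: one non-alphabetic
    # character followed by anything (the trailing * is a glob, not a regex).
    printf '%s\n' "${s%%[^[:alpha:]]*}"
}

leading_alpha 'abcd045tj56'   # abcd
leading_alpha 'jkl657890'     # jkl
leading_alpha 'abc'           # abc (no non-alphabetic character, nothing removed)
leading_alpha '123abc'        # empty line (the string has no leading letters)
```

Quoting the expansion keeps the result intact even if the string contains spaces or glob characters.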
Upvotes: 1
Reputation: 163477
Using GNU awk (the three-argument form of match() is a gawk extension), you can print the leading run of one or more alphabetic characters:
echo "XYZ123" | awk 'match($0, /^[[:alpha:]]+/, a) {print a[0]}'
Output
XYZ
If the letters must be followed by at least one non-alphabetic character, you can use a capture group and print its value:
echo "XYZ123" | awk 'match($0, /^([[:alpha:]]+)[^[:alpha:]]/, a) {print a[1]}'
Upvotes: 1
Reputation: 36680
I tried
echo "XYZ123" | awk 'sub(/[[:alpha:]]*/, "")'
But it gives 123 instead of XYZ
You instructed GNU AWK to replace zero or more alphabetic characters with an empty string, which removes the leading letters and leaves 123. To do this task with sub, match the first non-alphabetic character followed by zero or more of any character instead, namely
echo "XYZ123" | awk '{sub(/[^[:alpha:]].*/, "");print}'
gives output
XYZ
(tested in GNU Awk 5.1.0)
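To see why the two attempts in the question printed 123 and 1, it may help to look at what each expression evaluates to. This is only a sketch, assuming an awk that supports POSIX character classes; the variable names sub_result and match_result are illustrative.

```shell
# sub() returns the number of substitutions it made, not the remaining text.
# Used as a bare pattern, that nonzero count makes awk print the modified line.
sub_result=$(echo "XYZ123" | awk '{ n = sub(/[[:alpha:]]*/, ""); print n ":" $0 }')
echo "$sub_result"      # 1:123

# A regex used as an expression is a match test on $0, so it yields 1 or 0.
match_result=$(echo "XYZ123" | awk '{ print (/[[:alpha:]]*/) }')
echo "$match_result"    # 1
```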
Upvotes: 1
Reputation: 106995
You can use awk with any non-alphabetic character as the field separator, so the leading letters end up in the first field:
awk -F'[^[:alpha:]]' '{print $1}'
Demo: https://awk.js.org/?snippet=g7eajb
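A quick way to check the field-separator approach on several inputs at once, assuming one string per line. The function name first_alpha_field is hypothetical.

```shell
# first_alpha_field is a hypothetical wrapper used only for this sketch.
first_alpha_field() {
    # Every non-alphabetic character splits the line, so $1 is whatever
    # precedes the first such character (empty if the line starts with one).
    awk -F'[^[:alpha:]]' '{ print $1 }'
}

printf '%s\n' 'abcd045tj56' 'jkl657890' 'XYZ' | first_alpha_field
# abcd
# jkl
# XYZ
```

A line with no separator at all, such as XYZ, is returned whole because the entire line is the first field.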
Upvotes: 1
Reputation: 133680
Converting my comment to an answer here. This works with any awk version.
awk '
  match($0, /^[a-zA-Z]+/) {
    print substr($0, RSTART, RLENGTH)
  }
' Input_file
OR:
awk '
  match($0, /[^[:alpha:]]/) {
    print substr($0, 1, RSTART - 1)
  }
' Input_file
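The two variants are not interchangeable on every input: the first prints only lines that start with a letter, while the second prints only lines that contain a non-alphabetic character. A small sketch of the difference follows; the function names variant_anchor and variant_cut are made up for this comparison and read from standard input instead of Input_file.

```shell
# Hypothetical wrappers around the two awk programs above.
variant_anchor() {
    awk 'match($0, /^[a-zA-Z]+/) { print substr($0, RSTART, RLENGTH) }'
}
variant_cut() {
    awk 'match($0, /[^[:alpha:]]/) { print substr($0, 1, RSTART - 1) }'
}

# Both handle a mixed string; only the anchored variant prints an all-letter line.
printf '%s\n' 'abcd045tj56' 'XYZ' | variant_anchor   # abcd, then XYZ
printf '%s\n' 'abcd045tj56' 'XYZ' | variant_cut      # abcd only
```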
Upvotes: 5
Reputation: 785721
You may use this sed command:
sed 's/[^[:alpha:]].*$//'
This sed command matches a non-alphabetic character and everything after it, and replaces the match with an empty string.
Examples:
sed 's/[^[:alpha:]].*$//' <<< 'abcd045tj56'
abcd
sed 's/[^[:alpha:]].*$//' <<< 'XYZ123'
XYZ
sed 's/[^[:alpha:]].*$//' <<< 'jkl657890'
jkl
If you want to do this in bash:
s='abcd045tj56'
echo "${s/[^[:alpha:]]*}"
abcd
Upvotes: 4
Reputation: 26211
Use grep:
$ grep -Eo '^[A-Za-z]+' <<<"XYZ123"
The ^ anchor restricts the match to alphabetic letters at the beginning of the string, and -o prints only the matched part.
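One thing worth knowing about the grep approach is that it prints nothing and exits with status 1 when the string has no leading letters, which a script can test for. A minimal sketch, with leading_letters as a hypothetical wrapper name:

```shell
# leading_letters is a hypothetical wrapper used only for this sketch.
leading_letters() {
    # -o prints only the matched part; ^ anchors it to the start of the line.
    grep -Eo '^[A-Za-z]+'
}

echo "XYZ123" | leading_letters            # XYZ

# With no leading letters grep prints nothing and exits with status 1.
if ! echo "123XYZ" | leading_letters >/dev/null; then
    echo "no leading letters"
fi
```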
Upvotes: 2