Reputation: 5972

capture delimited phrase with awk then remove strings

I have a file with this content:

wewe-wev123-12343s
dsf-sdfs-1238674d
xcc-asdasd-234351g
dfd-sdfs-sdfssdf-2324g
dfgeg-dfgfg-dfsdf-2344G

My desired output is:

wewe-wev123-12343
dsf-sdfs-1238674
xcc-asdasd-234351
dfd-sdfs-sdfssdf-2324
dfgeg-dfgfg-dfsdf-2344

I just want to remove the trailing letters from the last delimited field of each line.

What I tried is:

awk -F- '{print $3}' input.txt

How can I tell it to remove strings after digits at the end of lines?

Thanks

Upvotes: 0

Answers (5)

John1024

Reputation: 113834

If you just want to remove the strings from the end, sed is a good tool:

$ sed -r 's/[^0-9]+$//' input.txt
wewe-wev123-12343
dsf-sdfs-1238674
xcc-asdasd-234351
dfd-sdfs-sdfssdf-2324
dfgeg-dfgfg-dfsdf-2344

The -r option to sed is just for convenience: it enables the use of extended regex syntax so we don't need so many backslashes. The regex [^0-9]+$ matches any non-numbers at the end of the line. The substitution s/[^0-9]+$// removes all such non-numbers at the end of the line.

If the goal is to print the number in the third field, as in the first version of the question, then:

$ awk -F- '{sub("[a-z]+", "", $3); print $3}' input.txt
12343
1238674
234351
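If the whole line should be kept, as in the updated question, the same awk idea can be applied to the last field instead of the third. This is only a sketch: the function name strip_last_field is made up for illustration, and it assumes the trailing letters always sit in the last hyphen-delimited field.

```shell
# strip_last_field: hypothetical helper name, not from the question.
strip_last_field() {
    # -F- splits each line on "-"; OFS=- puts the hyphens back when awk
    # rebuilds the line after the last field ($NF) is modified.
    awk -F- -v OFS=- '{ sub(/[^0-9]+$/, "", $NF); print }' "$@"
}

# Demo on two lines from the question (three and four fields).
printf '%s\n' 'wewe-wev123-12343s' 'dfd-sdfs-sdfssdf-2324g' | strip_last_field
```

Using $NF rather than $3 means lines with four fields are handled the same way as lines with three.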

Another variation: What if, as per @BMW's comment, the third field has no numbers but we still want to preserve the field markers and the first two fields in the output? In this case, we would want to stop the letter-removal at the field marker. To achieve this behavior, we only need to add a single character to the sed command:

sed -r 's/[^-0-9]+$//' input.txt
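To see the difference between the two patterns, here is a small comparison on a made-up line whose last field contains no digits (abc-def-ghi is a hypothetical input, not from the question). It uses -E, the spelling accepted by both GNU and BSD sed; in GNU sed it is equivalent to -r.

```shell
line='abc-def-ghi'   # hypothetical input: no digits anywhere

# Without "-" in the bracket expression, the match runs across the
# hyphens, so the whole line is deleted and an empty line is printed.
printf '%s\n' "$line" | sed -E 's/[^0-9]+$//'

# With "-" excluded as well, the match stops at the last hyphen,
# so only the final field is removed.
printf '%s\n' "$line" | sed -E 's/[^-0-9]+$//'
```

The first command prints an empty line; the second prints abc-def- with the field markers intact.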

Upvotes: 2

John B

Reputation: 3646

You could use Bash.

while IFS= read -r line; do
    [[ $line =~ ([a-zA-Z])$ ]] && echo "${line%"${BASH_REMATCH[1]}"}"
done < file

# Alternative: capture everything up to the last digit
while IFS= read -r line; do
    [[ $line =~ ^(.*[0-9]) ]] && echo "${BASH_REMATCH[1]}"
done < file

Upvotes: 1

BMW

Reputation: 45243

Using GNU grep:

grep -Po '.*(?=[^0-9]+$)' file

Upvotes: 1

Jotne

Reputation: 41456

Either of these awk commands should do:

awk '{sub(/[[:alpha:]]$/,"")}8' file

awk '{sub(/[^0-9]$/,"")}8' file
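In both commands the trailing 8 is just a non-zero constant pattern, so awk runs its default action and prints the (modified) line. Both remove exactly one trailing character, which is enough for the sample data. If several trailing letters may occur, a + after the bracket expression removes the whole run. A small sketch, using a made-up input with two trailing letters:

```shell
# Hypothetical input: two letters after the digits.
input='xcc-asdasd-234351gh'

# Without "+": only the final character is removed.
printf '%s\n' "$input" | awk '{ sub(/[^0-9]$/, "") } 8'

# With "+": the whole run of trailing non-digits is removed.
printf '%s\n' "$input" | awk '{ sub(/[^0-9]+$/, "") } 8'
```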

Upvotes: 1

Martin Tournoij

Reputation: 27822

How about:

$ cat file | cut -d- -f3 | grep -Eo '^[0-9]+'
12343
1238674
234351

We use cut because that's simpler than awk. And we use grep because that's simpler than sed.

The cut command does the same as your awk command.
grep -E means extended regexps, -o means only print the matching part. We match for 1 or more numbers at the start of the line, so any non-number is ignored (and not printed).

Edit

Your new desired output is different; now we just need sed:

$ sed -E 's/[a-zA-Z]+$//' file
wewe-wev123-12343
dsf-sdfs-1238674
xcc-asdasd-234351
dfd-sdfs-sdfssdf-2324
dfgeg-dfgfg-dfsdf-2344

-E for extended regexp
s/ for substitute command
[a-zA-Z]+ to match these characters one or more times
$ to anchor the pattern to the end of the line
// to replace with nothing
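If -E is not available, a variant in basic regex syntax should behave the same on this data. This is a sketch rather than a drop-in replacement: it swaps + for *, which basic regexes support without any option, so the pattern also matches an empty run of letters and leaves lines without trailing letters unchanged.

```shell
# One line from the question plus one line that already ends in a digit.
printf '%s\n' 'dfgeg-dfgfg-dfsdf-2344G' 'dsf-sdfs-1238674' |
    sed 's/[a-zA-Z]*$//'
```

Both lines come out ending in their digits: dfgeg-dfgfg-dfsdf-2344 and dsf-sdfs-1238674.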

Upvotes: 1
