Reputation: 61
John, 1234567
Bob, 2839211
Alex, 2817821
Mary, 9371281
I am currently trying to retrieve the first column with the last 4 digits of the second column using sed, so the output should look like this:
John, 4567
Bob, 9211
Alex, 7821
Mary, 1281
This is my command: 's/\(.*,\)\(.*\)//'. I think this command matches the first column up to the comma and the second column up to the end of the line, but I am unsure how to continue.
Upvotes: 4
Views: 1627
Reputation: 2537
Similar to KamilCuk's answer, except that it uses a POSIX character class and ties the three digits to be removed to the preceding comma and space:
sed 's/, [[:digit:]]\{3\}/, /'
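As a quick check, the command can be run against the sample data from the question. This is only a sketch: the variable names `input` and `result` are made up for illustration, and it assumes every second column has exactly 7 digits, as in the question.

```shell
# Sample input copied from the question (exactly 7 digits per number).
input='John, 1234567
Bob, 2839211
Alex, 2817821
Mary, 9371281'

# Remove the three digits that directly follow ", " on each line.
result=$(printf '%s\n' "$input" | sed 's/, [[:digit:]]\{3\}/, /')

printf '%s\n' "$result"
```

Because the digits are matched only right after the comma and space, digits that happen to appear in the first column are left alone.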
Upvotes: 1
Reputation: 133428
If you are OK with awk, you could try the following. Written and tested with the shown samples in GNU awk.
awk 'BEGIN{FS=OFS=", "} {$2=substr($2,length($2)-3)} 1' Input_file
Explanation: a detailed explanation of the above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS=OFS=", " ##Setting FS and OFS to comma space here.
}
{
$2=substr($2,length($2)-3) ##Getting last 4 digits now in 2nd field here.
}
1 ##printing current edited/non-edited line.
' Input_file ##Mentioning Input_file name here.
2nd solution: in case your 2nd column can contain a mix of digits and non-digit characters, the following may help.
awk 'BEGIN{FS=OFS=", "} {gsub(/[^0-9]+/,"",$2);$2=substr($2,length($2)-3)} 1' Input_file
Explanation: a detailed explanation of the above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS=OFS=", " ##Setting FS and OFS to comma space here.
}
{
gsub(/[^0-9]+/,"",$2) ##Globally substituting everything apart from digits with an empty string in 2nd field.
$2=substr($2,length($2)-3) ##getting last 4 digits now in 2nd field here.
}
1 ##printing current edited/non-edited line.
' Input_file ##Mentioning Input_file name here.
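To see how the 2nd solution behaves on less regular data, here is a small sketch. The two input lines are invented for illustration (they are not from the question): one has separators inside the number, the other has fewer than four digits.

```shell
# Hypothetical input: a number with dashes, and a number shorter than 4 digits.
awk_input='Ann, 12-34-567
Tom, 42'

# Strip non-digits from field 2, then keep at most its last 4 characters.
awk_result=$(printf '%s\n' "$awk_input" |
  awk 'BEGIN{FS=OFS=", "} {gsub(/[^0-9]+/,"",$2); $2=substr($2,length($2)-3)} 1')

printf '%s\n' "$awk_result"
```

When the field is shorter than four characters the start position passed to substr is zero or negative, and awk then returns the string from its first character, so short values pass through unchanged.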
Upvotes: 2
Reputation: 140880
If the file format is just <text only alphanumeric characters>, <number exactly 7 digits>, you can simply remove the first 3 digits on each line (note that this relies on the text column itself containing no digits):
sed 's/[0-9][0-9][0-9]//'
Upvotes: 0
Reputation: 52336
Just capture the last four digits of each line and delete any preceding digits:
$ sed 's/[0-9]*\([0-9]\{4\}\)$/\1/' input.txt
John, 4567
Bob, 9211
Alex, 7821
Mary, 1281
If using a version of sed that supports POSIX Extended Regular Expressions, it can be cleaned up a bit to:
sed -E 's/[0-9]*([0-9]{4})$/\1/' input.txt
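The `$` anchor means this approach does not depend on the number being exactly 7 digits long. A small sketch with made-up input lines (not from the question) shows the behaviour for a longer and a shorter number, using the basic-regex form so it does not rely on `-E` being available:

```shell
# Hypothetical input: a 9-digit number and a 2-digit number.
edge_input='Eve, 123456789
Tom, 42'

# Keep only the last four digits at the end of each line.
edge_result=$(printf '%s\n' "$edge_input" | sed 's/[0-9]*\([0-9]\{4\}\)$/\1/')

printf '%s\n' "$edge_result"
```

A line that ends in fewer than four digits does not match the pattern at all, so sed prints it unchanged.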
Upvotes: 2
Reputation: 626689
You can use
sed 's/^\([^,]*\), *[0-9]*\([0-9]\{4\}\).*/\1, \2/' file
Details
^ - start of string
\([^,]*\) - Group 1: any zero or more chars other than a comma
, * - a comma and zero or more spaces
[0-9]* - zero or more digits
\([0-9]\{4\}\) - Group 2: four digits
.* - the rest of the line
\1, \2 - the replacement: the Group 1 value, a comma, a space and the Group 2 value.
Upvotes: 2