Edward

Reputation: 353

Can I delete a field in awk?

This is test.txt:

0x01,0xDF,0x93,0x65,0xF8
0x01,0xB0,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0xB2,0x00,0x76

If I run awk -F, 'BEGIN{OFS=","}{$2="";print $0}' test.txt the result is:

0x01,,0x93,0x65,0xF8
0x01,,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,,0x00,0x76

The $2 wasn't deleted, it just became empty. What I want, when printing $0, is:

0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76

Upvotes: 21

Views: 1752

Answers (12)

RARE Kpop Manifesto

Reputation: 2895

echo '
0x01,0xDF,0x93,0x65,0xF8
0x01,0xB0,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0xB2,0x00,0x76' | 

sed -e 's/,[^,]*//'

    or

awk 'sub(/,[^,]*/,_)_'

(Here _ is an uninitialized variable, i.e. an empty string: sub() deletes the first ",field" and returns 1, and concatenating the empty _ leaves a truthy value, so awk prints the modified line.)

0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76

Upvotes: 0

user2314737

Reputation: 29407

You can just pipe to tr -s ,

-s or --squeeze-repeats: replace each sequence of a repeated character that is listed in the last specified array, with a single occurrence of that character

echo "0x01,0xDF,0x93,0x65,0xF8
0x01,0xB0,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0xB2,0x00,0x76" |
awk -F, 'BEGIN{OFS=","}{$2="";print $0}'|tr -s ,
# Out:
# 0x01,0x93,0x65,0xF8
# 0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
# 0x01,0x00,0x76
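One caveat worth noting: tr -s squeezes every run of commas, not just the one created by blanking $2. A small illustration (the input here is made up for the demo):

```shell
# field 3 of this made-up input is legitimately empty
printf 'a,b,,c\n' |
awk -F, 'BEGIN{OFS=","}{$2="";print $0}' | tr -s ,
# the empty field 3 is collapsed too, giving a,c instead of a,,c
```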

Upvotes: 0

Pedro Maimere

Reputation: 214

Using awk in a regex-free way, with the option to choose which line will be deleted:

awk '{ col = 2; n = split($0,arr,","); line = ""; for (i = 1; i <= n; i++) line = line ( i == col ? "" : ( line == "" ? "" : ","  ) arr[i] ); print line }' test.txt

Step by step:

{
col = 2    # defines which column will be deleted
n = split($0,arr,",")    # each line is split into an array
                         # n is the number of elements in the array

line = ""     # this will be the new line

for (i = 1; i <= n; i++)   # roaming through all elements in the array
    line = line ( i == col ? "" : ( line == "" ? "" : "," ) arr[i] )
    # appends a comma (except if line is still empty)
    # and the current array element to the line (except when on the selected column)

print line    # prints line
}
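The same idea can be sketched with the column passed in from outside via -v instead of hard-coded (col=2 below is just for the demo):

```shell
awk -v col=2 '{
  n = split($0, arr, ",")          # split the line on commas
  line = ""
  for (i = 1; i <= n; i++)         # rebuild, skipping the chosen column
    if (i != col)
      line = line (line == "" ? "" : ",") arr[i]
  print line
}' test.txt
```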

Upvotes: 2

SLePort

Reputation: 15461

With GNU sed you can add a number modifier to substitute nth match of non-comma characters followed by comma:

sed -E 's/[^,]*,//2' file
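The numeric flag picks which match to replace, so the same pattern generalizes to other fields (though it cannot remove the last field, which has no trailing comma):

```shell
# delete the 3rd field by replacing the 3rd "field-plus-comma" match
sed -E 's/[^,]*,//3' file
```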

Upvotes: 2

Edward

Reputation: 353

My solution:

awk -F, '
{
    regex = "^"$1","$2
    sub(regex, $1, $0);
    print $0;
}'

or as a one-liner: awk -F, '{regex="^"$1","$2;sub(regex, $1, $0);print $0;}' test.txt

I found that setting OFS="," was not necessary: since no field is assigned to, $0 is never rebuilt, so OFS is irrelevant here. (Note that this approach assumes the field values contain no regex metacharacters, since $1 and $2 are interpolated into a regular expression.)

Upvotes: 1

Carlos Pascual

Reputation: 1126

Commenting on the first solution of @RavinderSingh13 using sub() function:

awk 'BEGIN{FS=OFS=","}{$2="";sub(/,,/,",");print $0}' Input_file

The gnu-awk manual: https://www.gnu.org/software/gawk/manual/html_node/Changing-Fields.html

"It is important to note that making an assignment to an existing field changes the value of $0 but does not change the value of NF, even when you assign the empty string to a field." (4.4 Changing the Contents of a Field)

So, following the first solution of RavinderSingh13, but without using sub() in this case: "The field is still there; it just has an empty value, delimited by the two colons" (the manual's example uses colons as separators; here it is commas):

awk 'BEGIN {FS=OFS=","} {$2="";print $0}' file 
0x01,,0x93,0x65,0xF8
0x01,,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,,0x00,0x76
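The manual's point about NF can be checked directly; assigning the empty string leaves the field count untouched:

```shell
echo '0x01,0xDF,0x93,0x65,0xF8' |
awk 'BEGIN{FS=OFS=","}{$2=""; print NF}'
# still prints 5: the field exists, it is just empty
```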

Upvotes: 1

stack0114106

Reputation: 8781

Another solution:

You can just pipe the awk output to sed and squeeze the doubled delimiters.

$ awk -F, 'BEGIN{OFS=","}{$2=""}1 ' edward.txt  | sed 's/,,/,/g'
0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76
$

Upvotes: 1

Daweo

Reputation: 36725

I would do it following way, let file.txt content be:

0x01,0xDF,0x93,0x65,0xF8
0x01,0xB0,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0xB2,0x00,0x76

then

awk 'BEGIN{FS=",";OFS=""}{for(i=2;i<=NF;i+=1){$i="," $i};$2="";print}' file.txt

output

0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76

Explanation: I set OFS to nothing (the empty string), then for the 2nd and following columns I prepend a , to each. Finally I set $2 (which now holds the comma and the value) to nothing. Keep in mind this solution would need rework if you wish to remove the 1st column.
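For completeness, one possible rework for the 1st column (a sketch of my own, not from the answer): blanking $1 leaves a leading comma behind, which then has to be stripped:

```shell
# prefix commas as before, blank $1, then remove the stray leading comma
awk 'BEGIN{FS=",";OFS=""}{for(i=2;i<=NF;i+=1){$i="," $i};$1="";sub(/^,/,"");print}' file.txt
```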

Upvotes: 0

anubhava

Reputation: 785866

All the existing solutions are good, though this is actually a tailor-made job for cut:

cut -d, -f 1,3- file

0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76

If you want to remove 3rd field then use:

cut -d, -f 1,2,4- file

To remove 4th field use:

cut -d, -f 1-3,5- file
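If you are on GNU coreutils, cut also has a --complement flag that names the field to drop rather than the fields to keep (not portable to BSD/macOS cut):

```shell
# drop field 2, keep everything else
cut -d, --complement -f2 file
```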

Upvotes: 27

Jonathan Leffler

Reputation: 754820

It's a bit heavy-handed, but this moves each field after field 2 down a place, and then changes NF so the unwanted field is not present:

$ awk -F, -v OFS=, '{ for (i = 2; i < NF; i++) $i = $(i+1); NF--; print }' test.txt
0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01
0x01,0x00,0x76
$

Tested with both GNU Awk 4.1.3 and BSD Awk ("awk version 20070501" on macOS Mojave 10.14.6 — don't ask; it frustrates me too, but sometimes employers are not very good at forward thinking). Setting NF may or may not work on older versions of Awk — I was a little surprised it did work, but the surprise was a pleasant one, for a change.
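The same shift-and-truncate approach can take the column from -v so it is not hard-wired (col=2 below is an assumption for the demo; decrementing NF carries the same portability caveat as above):

```shell
awk -F, -v OFS=, -v col=2 '{ for (i = col; i < NF; i++) $i = $(i+1); NF--; print }' test.txt
```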

Upvotes: 6

tripleee

Reputation: 189799

If Awk is not an absolute requirement, and the input is indeed as trivial as in your example, sed might be a simpler solution.

sed 's/,[^,]*//' test.txt

This is especially elegant if you want to remove the second field. A more generic approach to remove the nth field would require you to put in a regex which matches the first n - 1 fields followed by the nth, then replace that with just the first n - 1.

So for n = 4 you'd have

sed 's/\([^,]*,[^,]*,[^,]*,\)[^,]*,/\1/' test.txt

or more generally, if your sed dialect understands braces for specifying repetitions

sed 's/\(\([^,]*,\)\{3\}\)[^,]*,/\1/' test.txt

Some sed dialects allow you to lose all those pesky backslashes with an option like -r or -E but again, this is not universally supported or portable.

In case it's not obvious, [^,] matches a single character which is not a (newline or) comma; and \1 recalls the text from the first parenthesized group (a back reference; \2 recalls the second, etc.).

Also, this is completely unsuitable for escaped or quoted fields (though I'm not saying it can't be done). Every comma acts as a field separator, no matter what.
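If your shell is Bash (an assumption here), the repetition count can be computed from a variable, so the field number n is not baked into the regex:

```shell
n=4    # field to delete, matching the n = 4 example above
sed -E "s/(([^,]*,){$((n-1))})[^,]*,/\1/" test.txt
```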

Upvotes: 3

RavinderSingh13

Reputation: 133730

I believe the simplest would be to use the sub function to replace the first occurrence of the consecutive ,, (created after you set the 2nd field to NULL) with a single ,. But this assumes the field values themselves contain no commas, and that no earlier field is empty (an empty field would already produce ,, and sub would squeeze that one instead).

awk 'BEGIN{FS=OFS=","}{$2="";sub(/,,/,",");print $0}' Input_file

2nd solution: OR you could use the match function to catch the stretch from the first comma to the next comma's occurrence, and print the parts of the line before and after the matched string.

awk '
match($0,/,[^,]*,/){
  print substr($0,1,RSTART-1)","substr($0,RSTART+RLENGTH)
}' Input_file
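To see what match() actually captures here, RSTART and RLENGTH can be printed (the input below is just the first line of the example data):

```shell
echo '0x01,0xDF,0x93,0x65,0xF8' |
awk 'match($0,/,[^,]*,/){print RSTART, RLENGTH}'
# the match is ",0xDF,": it starts at column 5 and is 6 characters long
```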

Upvotes: 10
