Reputation: 11

Create a parser script with a delimiter

I am trying to convert this input from file.txt

a,b;c^d"e}
f;g,h!;i8j-

into this output

a,b,c,d,e,f,g,h,i,j

with awk

The best I did so far is

awk '$1=$1' FS="[!;^}8-]" OFS="," file.txt

How can I escape " so that it is not interpreted as a special character? " doesn't work.
How can I avoid duplicate ,, in the output and delete the last , ?

Upvotes: 1

Answers (9)

Supertech

Reputation: 770

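If you are OK with a Perl solution, here is a one-liner:

perl -0777 -ne 's/[^[:alpha:]]//g; print join(",", split //, $_), "\n"' file.txt

which outputs:

a,b,c,d,e,f,g,h,i,j

Simply put, -0777 reads the whole file at once, the substitution deletes every character that is not alphabetic (including the digit 8 and the newlines), and the remaining letters are joined with commas.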

Upvotes: 0

RARE Kpop Manifesto

Reputation: 2895

echo "${input_data}" |
mawk 'NF-=_==$NF' FS='[^[:alpha:]]*' OFS=, RS=

a,b,c,d,e,f,g,h,i,j

If leading separators are possible, use this instead:

echo ']a[' |
gawk 'gsub("^,|,$",_,$!(NF=NF))^_' FS='[^[:alpha:]]*' OFS=, RS=

Side note: beware that nawk has an unconventional definition of [[:alpha:]]:

reparse <[[:alpha:]]+>

cclenter   : in = | . .. |, out = 
   
|ABCDEFGHIJKLMNOPQRSTUVWXYZ
 abcdefghijklmnopqrstuvwxyz
 ªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ
   ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ|

Even though the locale is set to LANG="en_US.UTF-8", nawk's idea of [[:alpha:]] is neither ASCII nor full Unicode. It resembles, but is not necessarily identical to, a legacy 8-bit locale like ISO-8859-...
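One way to sidestep the [[:alpha:]] ambiguity, if ASCII letters are all you need, is to force the C locale and spell out the ranges. This is only a sketch: the function name ascii_fields and the sample input are made up for illustration, and it assumes the awk in use treats input as plain bytes under LC_ALL=C.

```shell
# Hypothetical helper: split on anything that is not an ASCII letter.
# LC_ALL=C plus explicit ranges avoids relying on how a given awk
# defines [[:alpha:]].
ascii_fields() {
    LC_ALL=C awk -F '[^A-Za-z]+' -v OFS=',' '{ $1 = $1; print }'
}

# The non-ASCII letter in the middle is treated as a separator here.
result=$(printf 'aªb;c\n' | ascii_fields)
printf '%s\n' "$result"
```

Unlike the [[:alpha:]] versions above, this deliberately treats letters such as ª or é as separators.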

Upvotes: 0

Gilles Quénot

Reputation: 185590

KISS:

$ grep -o '[a-z]' file | paste -sd ',' -
a,b,c,d,e,f,g,h,i,j

This should work on most GNU/Linux systems, and even on BusyBox and FreeBSD (where the - is mandatory).

Upvotes: 1

James Brown

Reputation: 37464

One in awk (not for all awks: tested successfully in gawk, mawk, busybox awk and macOS awk version 20200816, unsuccessfully in Debian's awk version 20121220, aka original-awk. There are limitations in locales as well.)

$ awk -v RS="^$" '{      # read whole file in 
    gsub(/[^a-z]+/,",")  # replace all non lowercase alphabet substrings with a comma
    sub(/,$/,"")         # remove trailing comma
}1' file                 # output

Output:

a,b,c,d,e,f,g,h,i,j

Upvotes: 2

Jetchisel

Reputation: 7831

If ed is available/acceptable.

The script.ed

%s/[^a-z]/ /g
%s/[[:blank:]]\{1,\}/,/g
g/./;j\
s/,$//
,p
Q

Now run

ed -s file.txt < script.ed

Upvotes: 1

Ed Morton

Reputation: 204381

Using any POSIX awk and assuming you want any non-alphabetic character to act as a field separator:

$ awk -F '[^[:alpha:]]+' -v OFS=',' '{printf "%s", p; $1=$1; p=$0} END{sub(OFS"$","",p); print p}' file
a,b,c,d,e,f,g,h,i,j

If you really do just want to use the specific set of characters in your question as the field separators then just change [^[:alpha:]]+ to [!;^}8"-]+
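As a sketch of that variant, and of the quoting the question asks about: because the -F value is wrapped in single quotes, the shell passes the " through untouched, so no escaping is needed. The variable names input and result are only for illustration, and the input is the two-line sample from the question.

```shell
# Sample input from the question (two lines).
input='a,b;c^d"e}
f;g,h!;i8j-'

# The double quote sits in the bracket expression like any other character.
# The hyphen is placed last so it is not read as a range, and the caret
# is not first so it does not negate the set.
result=$(printf '%s\n' "$input" |
    awk -F '[!;^}8"-]+' -v OFS=',' '
        { printf "%s", p; $1 = $1; p = $0 }
        END { sub(OFS "$", "", p); print p }')

printf '%s\n' "$result"
```

Note that this relies on every line except possibly the last ending in a separator; otherwise the last field of one line is glued to the first field of the next.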

Upvotes: 2

The fourth bird

Reputation: 163527

Using GNU sed, replace 1 or more chars other than a-z with a comma. Then remove all leading and trailing commas (the g flag is needed so that both ends are trimmed, not just the first match):

sed -Ez 's/[^a-z]+/,/g; s/^,+|,+$//g' file

Output

a,b,c,d,e,f,g,h,i,j

Upvotes: 0

knittl

Reputation: 265687

If you only want to replace non-letter characters with commas and squeeze repeated commas, tr is your friend:

tr -sc '[:alpha:]' ','

This will leave a trailing comma though. You could use sed to remove/replace it:

tr -sc '[:alpha:]' ',' | sed 's/,$/\n/'

Another possibility is to split each "item" into its own line (with tr or grep -o), then use paste to combine the lines again:

tr -sc '[:alpha:]' '\n' | paste -sd,
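To see where the trailing comma comes from, here is a small sketch run on the sample input from the question. It swaps the sed step for a POSIX parameter expansion (${var%,}), which is a different technique from the one shown above; the variable names are only for illustration.

```shell
# Sample input from the question (two lines).
input='a,b;c^d"e}
f;g,h!;i8j-'

# Every run of non-letters, including the final "-" plus newline,
# collapses into a single comma, hence the trailing one.
squeezed=$(printf '%s\n' "$input" | tr -sc '[:alpha:]' ',')

# Strip one trailing comma without calling sed.
trimmed=${squeezed%,}

printf '%s\n' "$squeezed"
printf '%s\n' "$trimmed"
```

This works when the result fits comfortably in a shell variable; for large files the tr | paste pipeline avoids holding everything in memory.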

Upvotes: 2

Daweo

Reputation: 36700

I would use GNU AWK for this task in the following way. Let the content of file.txt be

a,b;c^d"e} f;g,h!;i8j-

then

awk 'BEGIN{FPAT="[a-z]";OFS=","}{$1=$1;print}' file.txt

gives output

a,b,c,d,e,f,g,h,i,j

Explanation: using FPAT, I tell GNU AWK that a field is a single lowercase ASCII letter, and I set the output field separator (OFS) to a comma. Then, for each line, I do $1=$1 to trigger a rebuild of the line and print it.
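FPAT is specific to GNU AWK. Where it is not available, the same "describe the field, not the separator" idea can be approximated in POSIX awk with a match() loop. This is a rough sketch rather than an equivalent of FPAT, and the function name extract_letters is made up for illustration.

```shell
# Hypothetical helper: pull out each lowercase ASCII letter and join
# them with commas, one output line per input line.
extract_letters() {
    awk '{
        out = ""
        rest = $0
        # match() sets RSTART and RLENGTH for the leftmost match.
        while (match(rest, /[a-z]/)) {
            out = out (out == "" ? "" : ",") substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out
    }'
}

result=$(printf '%s\n' 'a,b;c^d"e} f;g,h!;i8j-' | extract_letters)
printf '%s\n' "$result"
```

As with the FPAT version, input spread over several lines produces one comma-separated line per input line.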

(tested in GNU Awk 5.0.1)

Upvotes: 2
