KHAN irfan

Reputation: 263

List processing, convert list to apostrophe and comma separated records, surrounded by brackets

I have a list in a file named Target_id_convert.txt

70S ribosome
ALK tyrosine kinase receptor
ATP
ATP synthase

Desired output

('70S ribosome','ALK tyrosine kinase receptor','ATP','ATP synthase')

I have written this code

sed -e "s/'/'\\\\''/g;s/\(.*\)/'\1'/" Target_id_convert.txt  > Target_id_convert1.txt
tr '\n' ',' < Target_id_convert1.txt > Target_id_convert_output.txt

I then have to edit Target_id_convert_output.txt by hand to add the surrounding (). How can I do all of this efficiently in one go? The whole process is supposed to be automated.

Upvotes: 2

Views: 123

Answers (8)

James Brown

Reputation: 37394

In awk:

$ awk 'BEGIN{q="\047";RS="";FS="\n";OFS=q","q}{$0="("q $0 q")";$1=$1}1' file

Output for single list file:

('70S ribosome','ALK tyrosine kinase receptor','ATP','ATP synthase')

Explained:

awk '
BEGIN {
    q="\047"             # q holds a single quote (octal 047)
    RS=""                # paragraph mode, see below (*
    FS="\n"              # newline is the input field separator
    OFS=q","q            # output field separator: quote, comma, quote
}
{
    $0="(" q $0 q ")"    # prepend ( and a quote, append a quote and )
    $1=$1                # rebuild the record using OFS
} 1' file                # print

*) From the GNU awk documentation: By a special dispensation, an empty string as the value of RS indicates that records are separated by one or more blank lines. When RS is set to the empty string, each record always ends at the first blank line encountered. The next record doesn’t start until the first nonblank line that follows. This allows lists separated by empty lines to be processed. For example, using @Thor's sample data, the output would be:

('70S ribosome','ALK tyrosine kinase receptor','ATP','ATP synthase')
('70S ribosome','ALK tyrosine kinase receptor','ATP','ATP synthase')
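
To see paragraph mode in isolation, here is a minimal sketch that only counts what awk treats as records and fields when RS is empty. The function name count_records is made up for illustration and is not part of the answer above.

```shell
# Hypothetical helper: report how many lines each blank-line-separated record has.
count_records() {
    awk 'BEGIN { RS = ""; FS = "\n" }
         { printf "record %d has %d lines\n", NR, NF }' "$@"
}

# Two lists separated by one blank line
printf 'a\nb\nc\n\nd\ne\n' | count_records
```

Each blank-line-separated block arrives as a single record, and with FS set to a newline each of its lines is one field.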

Upvotes: 1

Ed Morton

Reputation: 203229

Just set your field and record separators, recompile the record and print:

$ awk -v RS= -v s="('" -v ORS="')\n" -F'\n' -v OFS="','" '{$1=s$1}1' file
('70S ribosome','ALK tyrosine kinase receptor','ATP','ATP synthase')
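
The "recompile" step is the part that tends to surprise people, so here is a small sketch of it on its own: awk only re-joins $0 with OFS once a field has been assigned. The function name rebuild_demo and the two-line input are assumptions for illustration.

```shell
# Hypothetical demo: show the field count, then the record after it is rebuilt.
rebuild_demo() {
    awk -v RS= -F'\n' -v OFS="','" '{
        print NF " fields"   # the record was split on newlines
        $1 = $1              # assigning any field re-joins $0 using OFS
        print $0
    }' "$@"
}

printf 'ATP\nATP synthase\n' | rebuild_demo
```

The second line of output shows the fields joined by the quote-comma-quote separator; the one-liner above supplies the outer brackets and quotes through s and ORS.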

Upvotes: 1

Thor

Reputation: 47099

Assuming your records are double new-line separated, I would go with a sed/awk combo:

<file sed "/[^[:blank:]]/ s/.*/'&'/g" |
awk '{ $1=$1; print "(" $0 ")" }' RS= FS='\n' OFS=,

If the input is:

70S ribosome
ALK tyrosine kinase receptor
ATP
ATP synthase

70S ribosome
ALK tyrosine kinase receptor
ATP
ATP synthase

Output is:

('70S ribosome','ALK tyrosine kinase receptor','ATP','ATP synthase')
('70S ribosome','ALK tyrosine kinase receptor','ATP','ATP synthase')

Upvotes: 3

mklement0

Reputation: 437208

To offer an alternative that uses trl, a utility of mine for transforming text between single- and multi-line forms:

$ trl -S, -D\' -W'()'  <<<$'70S ribosome\nALK tyrosine kinase receptor\nATP\nATP synthase'
('70S ribosome','ALK tyrosine kinase receptor','ATP','ATP synthase')
  • Since the input is multi-line, the default output format is single-line.
  • -S, sets the output separator to , (what to place between items)
  • -D\' sets the output item delimiter to ' (what to enclose each item in)
  • -W'()' wraps (encloses) the resulting output line in ( and ).

Installation of trl from the npm registry (Linux and macOS)

Note: Even if you don't use Node.js, its package manager, npm, works across platforms and is easy to install; try
curl -L https://git.io/n-install | bash

With Node.js installed, install as follows:

[sudo] npm install trl -g

Note:

  • Whether you need sudo depends on how you installed Node.js and whether you've changed permissions later; if you get an EACCES error, try again with sudo.
  • The -g ensures global installation and is needed to put trl in your system's $PATH.

Manual installation (any Unix platform with bash)

  • Download this bash script as trl.
  • Make it executable with chmod +x trl.
  • Move it or symlink it to a folder in your $PATH, such as /usr/local/bin (OSX) or /usr/bin (Linux).

Upvotes: 2

RavinderSingh13

Reputation: 133458

Try:

awk -v s1="'" -v s2="'," -v s3="(" -v s4=")" 'NR==1{printf("%s",s3)} last{printf("%s",s1 last s2)} {last=$0} END{printf("%s\n",s1 last s1 s4)}'   Input_file

The variables s1, s2, s3 and s4 hold the quote, the quote plus comma, and the two brackets. On the very first line the script prints (. Every line is stored in the variable last, and on the following line that stored value is printed as 'value',. The END block prints the final line as 'value' followed by ). This assumes your Input_file has the same form as the sample shown.
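
The same idea laid out as a commented script may be easier to follow. This is a sketch rather than the exact one-liner: the wrapper name join_quoted is hypothetical, and it tests NR > 1 instead of the truthiness of last, which is an assumption made here so that a line such as 0 is not skipped.

```shell
# Hypothetical wrapper: quote each line, join with commas, wrap in brackets.
join_quoted() {
    awk -v q="'" '
        NR == 1 { printf "(" }                      # open the bracket once
        NR > 1  { printf "%s%s%s,", q, last, q }    # emit the previous line, quoted, plus a comma
                { last = $0 }                       # remember the current line
        END     { printf "%s%s%s)\n", q, last, q }  # final line, quoted, then close the bracket
    ' "$@"
}

printf '70S ribosome\nATP\n' | join_quoted
```

Holding each line back by one step is what lets the script leave the comma off the last item without knowing the line count in advance.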

Upvotes: 1

Kent

Reputation: 195039

This awk one-liner should do what you want:

awk -v q="'" '{$0=q $0 q;printf "%s%s", (NR==1?"(":","),$0}END{print ")"}' file

I declared a variable q holding a single quote (') to avoid a lot of escaping.

Upvotes: 5

VIPIN KUMAR

Reputation: 3137

Try this -

$ cat f
70S ribosome
ALK tyrosine kinase receptor
ATP
ATP synthase
$ awk -v line=$(wc -l < f) -v ORS="" 'BEGIN{printf "("} {if(NR < line) {print a$0b}} END {print a$0a")\n"}' b="'," a="'" f
('70S ribosome','ALK tyrosine kinase receptor','ATP','ATP synthase')

Upvotes: 1

slitvinov

Reputation: 5768

$ cat f.awk
BEGIN {
    sep = ""
    b = "'"
}

{
    ans = ans sep b $0 b
    sep = ","
}

END { print "(" ans ")" }

Usage:

awk -f f.awk file 
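
For a fully automated run with the file names from the question, something like the following sketch should work. It recreates f.awk and the sample input in a temporary directory so that it is self-contained; the temporary directory and the redirect to Target_id_convert_output.txt are assumptions for illustration.

```shell
# Work in a scratch directory so nothing existing is overwritten.
tmp=$(mktemp -d)

# Same program as f.awk above, written out via a quoted here-document.
cat > "$tmp/f.awk" <<'EOF'
BEGIN { sep = ""; b = "'" }
{ ans = ans sep b $0 b; sep = "," }
END { print "(" ans ")" }
EOF

# Sample input from the question.
printf '70S ribosome\nALK tyrosine kinase receptor\nATP\nATP synthase\n' \
    > "$tmp/Target_id_convert.txt"

# One step: read the list, write the bracketed, quoted, comma-separated line.
awk -f "$tmp/f.awk" "$tmp/Target_id_convert.txt" > "$tmp/Target_id_convert_output.txt"
cat "$tmp/Target_id_convert_output.txt"
```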

Upvotes: 2
