Manus

Reputation: 879

formatting using sed in unix

My input file is in the following format:

5470,
1875566222,
"Antigua"
6,
1588552226,
"Barbados
12,
1488899666,
"Nicaragua"

This pattern continues for a thousand records.

Every three lines form one record. The second value in each record is an ID, and only these three IDs occur throughout the file.

My objectives are:

a) To format this file so that every record is on one line, i.e.:

5470,1875566222,Antigua
6,1588552226,Barbados
12,1488899666,Nicaragua

b) As you may have noticed in the above output, I also need the double quotes on the country names removed.

c) I would like to sort this file in descending order based on the value of the first field of each record

d) Write each record into a separate file according to its ID. So I am looking at 3 files, each holding the set of records that share the same ID.

This may be a lot to ask of a UNIX script. But I would be very thankful if at least a part of this is achievable through UNIX shell scripting.

Thank you in advance for your time and help.

Upvotes: 0

Views: 260

Answers (5)

anubhava

Reputation: 785038

It is easier using awk here:

> awk 'NR%3==1{a=$0} NR%3==2{b=$0} NR%3==0{gsub(/"/, ""); print a b $0}' file
5470,1875566222,Antigua
5,1488899666,United Kingdom
6,1588552226,Barbados
12,1488899666,Nicaragua
15,1488899666,United States

EDIT: To get this output in different files:

awk 'NR%3==1{a=$0} NR%3==2{b=$0} NR%3==0{gsub(/"/, ""); p=a; sub(/,$/, ".txt", p); print a b $0 > p}' fil

Sorting each file:

mkdir _tmp
for i in [0-9]*.txt; do sort -t, -k1,1nr "$i" > "_tmp/$i"; done
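
If the goal is one file per ID (the second field, as part d of the question describes), a variant along the following lines may help. This is only a sketch, not part of the original answer: the sample data, the `sample.in` name, the temporary directory, and the `.txt` suffix are all assumptions made for illustration. Sorting before splitting means each output file is already in descending order.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# Hypothetical sample: four records of three lines each, two sharing an ID.
printf '%s\n' '5470,' '1875566222,' '"Antigua"' \
              '6,' '1588552226,' '"Barbados"' \
              '12,' '1488899666,' '"Nicaragua"' \
              '18,' '1488899666,' '"Nicaragua"' > "$workdir/sample.in"

# 1. Join every three lines into one record and strip the quotes.
# 2. Sort on the first comma-separated field, numeric, descending.
# 3. Write each record to a file named after field 2 (the ID).
awk 'NR%3==1{a=$0} NR%3==2{b=$0} NR%3==0{gsub(/"/, ""); print a b $0}' "$workdir/sample.in" |
  sort -t, -k1,1nr |
  awk -F, -v dir="$workdir" '{out = dir "/" $2 ".txt"; print > out}'

cat "$workdir/1488899666.txt"
# Remove "$workdir" when you are done with it.
```

Building the file name in a variable (`out`) before redirecting avoids any ambiguity in how different awk implementations parse an expression after `>`.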

Upvotes: 3

dogbane

Reputation: 274542

This should take care of parts a, b and c.

$ paste -d "" - - - < file | tr -d '"' | sort -t, -k1 -nr
5470,1875566222,Antigua
12,1488899666,Nicaragua
6,1588552226,Barbados

Sure, awk will be faster but IMO this solution is much more readable.
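
To see what each stage of that pipeline contributes, the steps can be run one at a time. The following is a sketch on an assumed nine-line sample (the temporary file name is arbitrary); it uses `-d '\0'`, which POSIX defines as an empty delimiter for paste, in place of `-d ""`.

```shell
# Build a small sample in a temporary directory (the path is arbitrary).
tmpdir=$(mktemp -d)
printf '%s\n' '5470,' '1875566222,' '"Antigua"' \
              '6,' '1588552226,' '"Barbados"' \
              '12,' '1488899666,' '"Nicaragua"' > "$tmpdir/sample.txt"

# Step 1: each "-" reads the next line of stdin, so three dashes glue
# three consecutive lines together; '\0' tells paste to use no delimiter.
joined=$(paste -d '\0' - - - < "$tmpdir/sample.txt")

# Step 2: delete every double quote.
unquoted=$(printf '%s\n' "$joined" | tr -d '"')

# Step 3: sort on the first comma-separated field only, numeric, descending.
sorted=$(printf '%s\n' "$unquoted" | sort -t, -k1,1nr)

printf '%s\n' "$sorted"
rm -rf "$tmpdir"
```

Restricting the key to `-k1,1` makes sort compare only the first field rather than everything from field 1 to the end of the line.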


For part d, loop over the lines and write them out:

paste -d "" - - - < file | tr -d '"' | sort -t, -k1 -nr | while IFS=, read -r a b c 
do
    echo "$a,$b,$c" >> "$b".out
done

Upvotes: 2

fedorqui

Reputation: 289545

This is quite similar to the other answers. I had it ready early on, but I am posting it now because I like answers to include explanations:

awk 'BEGIN{FS="[\",]" ;OFS=","}
     !(NR%3) {country=$2; print id, num, country}
     NR%3==1 {id=$1}
     NR%3==2 {num=$1}' file |
   sort -t"," -k1,1 -nr

Explanation

  • BEGIN{FS="[\",]" ;OFS=","} sets the field separator to either , or ". The output field separator is set to a comma.
  • The rest works with NR, the number of records read so far. Here that is the current line number.
  • !(NR%3) {country=$2; print id, num, country} if the line number is a multiple of 3 (that is, NR modulo 3 is 0), capture the country and print the whole record.
  • NR%3==1 {id=$1} on lines 3k+1, capture the id.
  • NR%3==2 {num=$1} on lines 3k+2, capture the num.
  • sort -t"," -k1,1 -nr sorts the output numerically in reverse order, based on the first column only, using a comma as the column separator.
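
The reason the country is read from $2 while the numbers are read from $1 becomes clearer when you print the fields that this separator produces. A small sketch (the variable names are just for illustration):

```shell
# A number line: the comma ends field 1 and leaves an empty field 2.
num_fields=$(printf '%s\n' '5470,' |
  awk 'BEGIN{FS="[\",]"} {print NF ":" $1 ":" $2}')

# A country line: the opening quote makes field 1 empty, so the name
# lands in field 2, and the closing quote leaves an empty field 3.
country_fields=$(printf '%s\n' '"Antigua"' |
  awk 'BEGIN{FS="[\",]"} {print NF ":" $1 ":" $2 ":" $3}')

printf '%s\n' "$num_fields" "$country_fields"
```

Each output line shows the field count followed by the field values separated by colons, so the empty fields are visible.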

Test

$ awk 'BEGIN{FS="[\",]" ;OFS=","} !(NR % 3) {print id, num, $2} NR%3==1 {id=$1} NR%3==2 {num=$1}' file | sort -t"," -k1,1 -nr
5470,1875566222,Antigua
12,1488899666,Nicaragua
6,1588552226,Barbados

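If you also want to write the records to separate files, pipe the sorted output into a second awk such as awk -F, '{print > ($3 ".dat")}'. This groups by the third field (the country); the parentheses keep the file-name expression unambiguous across awk implementations.

$ awk 'BEGIN{FS="[\",]" ;OFS=","} !(NR % 3) {print id, num, $2} NR%3==1 {id=$1} NR%3==2 {num=$1}' file | sort -t"," -k1,1 -nr | awk -F, '{print > ($3 ".dat")}'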

For a sample file like this:

5470,
1875566222,
"Antigua"
6,
1588552226,
"Barbados
12,
1488899666,
"Nicaragua"
18,
148,
"Nicaragua"

It returns

$ cat  Nicaragua.dat
18,148,Nicaragua
12,1488899666,Nicaragua

Upvotes: 1

Cole Tierney

Reputation: 10314

I agree awk is best for this:

awk -F'\"|,' '/[0-9]+/{printf "%s,", $1} /[a-zA-Z]+/{print $2}'
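
For comparison, since the question title mentions sed, parts a and b can also be sketched with sed alone. This is not part of the original answer; it assumes the number of input lines is a multiple of three, and the sample data is made up for illustration.

```shell
# N appends the next input line to the pattern space, so "N;N" collects
# three lines; the first substitution removes the embedded newlines and
# the second removes the double quotes.
sed_out=$(printf '%s\n' '5470,' '1875566222,' '"Antigua"' \
                        '6,' '1588552226,' '"Barbados"' |
  sed 'N;N;s/\n//g;s/"//g')

printf '%s\n' "$sed_out"
```

Sorting and splitting into files would still need sort and awk (or a shell loop) afterwards.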

Upvotes: 0

Jotne

Reputation: 41456

Using awk

awk '{gsub(/"/,x);printf "%s"(NR%3==0?RS:""),$1}' file
5470,1875566222,Antigua
6,1588552226,Barbados
12,1488899666,Nicaragua

If you would like to redirect the output to multiple files based on the ID:

awk '{gsub(/"/,x)} NR%3==1 {a=$0} NR%3==2 {b=$0; f=$0+0} NR%3==0 {print a b $0 > (f ".txt")}' file

Upvotes: 2
