Reputation: 879
My input file is in the following format:
5470,
1875566222,
"Antigua"
6,
1588552226,
"Barbados"
12,
1488899666,
"Nicaragua"
This pattern continues for a thousand records.
Every 3 lines is actually one record. The second value in each record is an ID, and there are only these three IDs throughout the file.
My objectives are:
a) To format this file so that every record is in one line (i.e.)
5470,1875566222,Antigua
6,1588552226,Barbados
12,1488899666,Nicaragua
b) As you may have noticed in the above output, I also need the double quotes on the country names removed.
c) I would like to sort this file in descending order based on the value of the first field of each record
d) Write each record into a separate file based on its ID. So I am looking at 3 files, each with the set of records having the same ID.
This may be a lot to ask of a UNIX script. But I would be very thankful if at least a part of this is achievable through UNIX shell scripting.
Thank you in advance for your time and help.
Upvotes: 0
Views: 260
Reputation: 785038
It is easier using awk here:
> awk 'NR%3==1{a=$0} NR%3==2{b=$0} NR%3==0{gsub(/"/, ""); print a b $0}' file
5470,1875566222,Antigua
5,1488899666,United Kingdom
6,1588552226,Barbados
12,1488899666,Nicaragua
15,1488899666,United States
EDIT: To get this output in different files:
awk 'NR%3==1{a=$0} NR%3==2{b=$0} NR%3==0{gsub(/"/, ""); p=a; sub(/,$/, ".txt", p); print a b $0 > p}' file
Sorting each file:
mkdir _tmp
for i in [0-9]*.txt; do sort -t, -nk1,1r "$i" > _tmp/"$i"; done
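If you want the sort and the per-ID split in one pass, the join, quote-stripping, sort, and split can also be chained in a single pipeline. A sketch of that idea (a compact variant, not the exact script above; it assumes the three-line layout from the question and names the output files after the ID in the second field):

```shell
# Join each 3-line record onto one line, drop the quotes,
# sort descending on the first field, then split into one file per ID.
awk 'NR%3{printf "%s",$0; next} {gsub(/"/,""); print}' file |
  sort -t, -k1,1 -nr |
  awk -F, '{print > ($2 ".txt")}'
```

With the sample input this produces `1875566222.txt`, `1588552226.txt`, and `1488899666.txt`, each already sorted because the split runs after the sort.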
Upvotes: 3
Reputation: 274542
This should take care of parts a, b and c.
$ paste -d "" - - - < file | tr -d '"' | sort -t, -k1 -nr
5470,1875566222,Antigua
12,1488899666,Nicaragua
6,1588552226,Barbados
Sure, awk will be faster, but IMO this solution is much more readable.
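Each `-` argument makes paste read one more line from standard input, so three dashes consume one whole record per output row. A quick check on a single record:

```shell
# Three "-" arguments: paste reads stdin three times per output row;
# -d "" joins the pieces with no delimiter, tr then strips the quotes.
printf '5470,\n1875566222,\n"Antigua"\n' | paste -d "" - - - | tr -d '"'
```

This prints the joined record `5470,1875566222,Antigua`, matching the first line of the output above.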
For part d, loop over the lines and write them out:
paste -d "" - - - < file | tr -d '"' | sort -t, -k1 -nr | while IFS=, read -r a b c
do
echo "$a,$b,$c" >> "$b".out
done
Upvotes: 2
Reputation: 289545
This is pretty similar to the other answers. I had it done earlier, but I post it now because I like answers to have explanations:
awk 'BEGIN{FS="[\",]" ;OFS=","}
!(NR%3) {country=$2; print id, num, country}
NR%3==1 {id=$1}
NR%3==2 {num=$1}' file |
sort -t"," -k1,1 -nr
- BEGIN{FS="[\",]" ;OFS=","} sets the field separator as , or ". The output field separator is set to comma ,.
- NR stands for number of record; in this case, number of line.
- !(NR%3) {country=$2; print id, num, country}: if the line is a multiple of 3 (that is, NR%3 is 0), then catch the value country and print the whole record.
- NR%3==1 {id=$1}: in 3k+1 lines, catch the id.
- NR%3==2 {num=$1}: in 3k+2 lines, catch the num.
- sort -t"," -k1,1 -nr: sort the output numerically, reversing it, based on the first column (and just the first) and using comma , as column separator.
All together:
$ awk 'BEGIN{FS="[\",]" ;OFS=","} !(NR % 3) {print id, num, $2} NR%3==1 {id=$1} NR%3==2 {num=$1}' file | sort -t"," -k1,1 -nr
5470,1875566222,Antigua
12,1488899666,Nicaragua
6,1588552226,Barbados
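To see how that field separator tokenises a quoted line, note that the leading quote produces an empty $1, which is why the country name lands in $2. A quick check:

```shell
# FS is the character class [",] : either a double quote or a comma splits fields.
# "Antigua" splits into three fields: "", "Antigua", "" (trailing quote
# creates a trailing empty field).
printf '"Antigua"\n' | awk 'BEGIN{FS="[\",]"} {print NF, $2}'
# → 3 Antigua
```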
In case you want a separate file per country, add a pipe after sort like this: awk -F, '{print > ($3".dat")}'
$ awk 'BEGIN{FS="[\",]" ;OFS=","} !(NR % 3) {print id, num, $2} NR%3==1 {id=$1} NR%3==2 {num=$1}' file | sort -t"," -k1,1 -nr | awk -F, '{print > ($3".dat")}'
For a sample file like this:
5470,
1875566222,
"Antigua"
6,
1588552226,
"Barbados"
12,
1488899666,
"Nicaragua"
18,
148,
"Nicaragua"
It returns
$ cat Nicaragua.dat
18,148,Nicaragua
12,1488899666,Nicaragua
Upvotes: 1
Reputation: 10314
I agree awk is best for this:
awk -F'\"|,' '/[0-9]+/{printf "%s,", $1} /[a-zA-Z]+/{print $2}' file
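Against the sample, the digit-only lines contribute $1 plus a comma, and the quoted line contributes $2 (the text between the quotes), so each record fuses into one line:

```shell
# Lines containing digits: print the first field followed by a comma.
# Lines containing letters: print the second field (the unquoted name).
printf '5470,\n1875566222,\n"Antigua"\n' |
  awk -F'"|,' '/[0-9]+/{printf "%s,", $1} /[a-zA-Z]+/{print $2}'
# → 5470,1875566222,Antigua
```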
Upvotes: 0
Reputation: 41456
Using awk
awk '{gsub(/"/,x);printf "%s"(NR%3==0?RS:""),$1}' file
5470,1875566222,Antigua
6,1588552226,Barbados
12,1488899666,Nicaragua
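The format string here is built by concatenation: the ternary appends RS (a newline by default) only when NR is a multiple of 3, otherwise the empty string, so three input lines fuse into one output line. The trick in isolation:

```shell
# printf's format is "%s" concatenated with RS on every third line,
# so a newline is emitted only at the end of each 3-line record.
printf 'a,\nb,\n"c"\n' | awk '{gsub(/"/,x); printf "%s"(NR%3==0?RS:""),$1}'
# → a,b,c
```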
If you like to redirect output to multiple files based on ID, buffer each record until its third line; the ID only arrives on the second line of a record, so printing line by line would send each record's first field to the previous ID's file:
awk 'NR%3==2{f=$0+0} {gsub(/"/,x); s=s $1} !(NR%3){print s > (f".txt"); s=""}' file
Upvotes: 2