Reputation: 2065
I have a file in the below format:
Total:89.3
User: user1
Count:3
Sum:80
departmentId: dept1
Amount by departmentId: 20
departmentId: dept1
Amount by departmentId: 35
departmentId: dept2
Amount by departmentId: 25
User: user2
Count:3
Sum:7.199999999999999
departmentId: dept1
Amount by departmentId: 2.4
departmentId: dept2
Amount by departmentId: 2.4
departmentId: dept3
Amount by departmentId: 2.4
User: user3
Count:1
Sum:0.2
departmentId: dept2
Amount by departmentId: 0.2
User: user4
Count:2
Sum:2
departmentId: dept3
Amount by departmentId: 1
departmentId: dept3
Amount by departmentId: 1
The file list basically the User dues for a department. If the same user is due to a department multiple times then that need to be merged into one row. The output file needs to be in the below format. For user1, he has 2 dues for dept1 and 1 due for dept2. So in the output file the 2 dues for dept1 need to be merged into 1 count needs to be no. of unique user per department.
Format:
count_of_uique_user_dept_rows total_sum -- note** header row-->total sum and total no. of unique user dues
userId+deptId sum for that dept
Example:
7 89.3
user1dept1 55
user1dept2 25
user2dept1 2.4
user2dept2 2.4
user2dept3 2.4
user3dept2 0.2
user4dept3 2
what i have so far,
# This awk script is used to convert the input of library credit/debit's to the required Student Accounts Load format
BEGIN { FS=": *" }
{
gsub(/^ +| +$/,"")
f[$1] = $2
}
/Amount/ {
dept = f["departmentId"]
total = f["Total"]
sum[dept] += $2
amount += $2
}
$1 == "User" {
if (NR>1) {
format()
}
user = $2
}
END { format() }
function format() {
if ( length(sum) > 0 ) {
for (dept in sum) {
printf "%-9s%-12s%10.2f\n", substr(user,1,9), substr(dept,1,12), sum[dept]
}
delete sum
amount = 0
}
}
The above script gives us the data row. I am not able to figure out how to get the header row of 7 89.3 Please help.
Upvotes: 1
Views: 58
Reputation: 37404
Using GNU awk and 2d arrays:
$ awk '
$1=="User:" { # store user
u=$NF
}
$1=="departmentId:" { # store dept
d=$NF
}
$1=="Amount" {
if(a[u][d]=="") # count uniq user/depts
c++
s+=$NF # total sum
a[u][d]+=$NF # user/dept sum
}
END {
printf "%s, %.2f\n",c,s # output count and total
for(u in a)
for(d in a[u])
printf "%s %s %.2f\n",u,d,a[u][d] # output user/dept sums
}' file
Output:
7 89.40
user1 dept1 55.00
user1 dept2 25.00
user2 dept1 2.40
user2 dept2 2.40
user2 dept3 2.40
user3 dept2 0.20
user4 dept3 2.00
Upvotes: 1
Reputation: 203324
I decided not to read the file twice but just save the outputs in an array before printing. Here's how to do that:
Step 1: fix the syntax error that you'll get from some awks when they assume sum
is a scalar due to you calling length(sum)
before it's been used as an array by adding a delete sum
array operation in the BEGIN section (you could just remove the test on length(sum) as it's not doing anything useful in your code but I wanted to explain the issue and how to solve it in general).
BEGIN { FS=": *"; delete sum }
Step 2: change the format()
function to load up an array of values to be output later instead of immediately outputting those values:
function format() {
if ( length(sum) > 0 ) {
for (dept in sum) {
vals[++numVals] = sprintf("%-9s%-12s%10.2f", substr(user,1,9), substr(dept,1,12), sum[dept])
}
delete sum
amount = 0
}
}
Step 3: add a loop in the END section to actually do the prints:
END {
format()
for (valNr=1; valNr<=numVals; valNr++) {
print vals[valNr]
}
}
At this point the output you get will be exactly the same as your existing script but it sets us up to add the new functionality you need:
Step 4: save each user+dept combinations as indices of an array usrdpt[]
:
/Amount/ {
dept = f["departmentId"]
total = f["Total"]
sum[dept] += $2
usrdpt[user,dept]
amount += $2
}
Step 5: print the number of unique indices of the new usrdpt[]
array in the END section before printing the previous values:
END {
format()
print length(usrdpt)
for (valNr=1; valNr<=numVals; valNr++) {
print vals[valNr]
}
}
The result is:
$ cat tst.awk
BEGIN { FS=": *"; delete sum }
{
gsub(/^ +| +$/,"")
f[$1] = $2
}
/Amount/ {
dept = f["departmentId"]
total = f["Total"]
sum[dept] += $2
usrdpt[user,dept]
amount += $2
}
$1 == "User" {
if (NR>1) {
format()
}
user = $2
}
END {
format()
print length(usrdpt)
for (valNr=1; valNr<=numVals; valNr++) {
print vals[valNr]
}
}
function format() {
if ( length(sum) > 0 ) {
for (dept in sum) {
vals[++numVals] = sprintf("%-9s%-12s%10.2f", substr(user,1,9), substr(dept,1,12), sum[dept])
}
delete sum
amount = 0
}
}
.
$ awk -f tst.awk file
7
user1 dept1 55.00
user1 dept2 25.00
user2 dept1 2.40
user2 dept2 2.40
user2 dept3 2.40
user3 dept2 0.20
user4 dept3 2.00
I assume you can figure out how to save and later print the Total
value.
Upvotes: 2