saadurr
saadurr

Reputation: 77

Bash: Reading a CSV file and sorting column based on a condition

I am trying read a CSV text file and print all entries of one column (sorted), based on a condition.

The input sample is as below:

Computer ID,User ID,M
Computer1,User3,5
Computer2,User5,8
computer3,User4,9
computer4,User10,3
computer5,User9,0
computer6,User1,11

The user-ID (2nd column) needs to be printed if the hours (third column) is greater than zero. However, the printed data should be sorted based on the user-id.

I have written the following script:

while IFS=, read -r col1 col2 col3 col4 col5 col6 col7 || [[ -n $col1 ]]
do
    if [ $col3 -gt 0 ] 
    then
        echo "$col2" > login.txt
    fi
done < <(tail -n+2 user-list.txt)

The output of this script is:

User3
User5
User4
User10
User1

I am expecting the following output:

User1
User3
User4
User5
User10

Any help would be appreciated. TIA

Upvotes: 1

Views: 1094

Answers (4)

James Brown
James Brown

Reputation: 37464

Using awk for condition handling and sort for ordering:

$ awk -F, '                       # comma delimiter
FNR>1 && $3 {                     # skip header and accept only non-zero hours
    a[$2]++                       # count instances for duplicates
}
END {
    for(i in a)                   # all stored usernames
        for(j=1;j<=a[i];j++)      # remove this if there are no duplicates
            print i | "sort -V"   # send output to sort -V
}' file

Output:

User1
User3
User4
User5
User10

If there are no duplicated usernames, you can replace a[$2]++ with just a[$2] and remove the latter for. Also, no real need for sort to be inside awk program, you could just as well pipe data from awk to sort, like:

$ awk -F, 'FNR>1&&$3{a[$2]++}END{for(i in a)print i}' file | sort -V 

FNR>1 && $3 skips the header and processes records where hours column is not null. If your data has records with negative hours and you only want positive hours, change it to FNR>1 && $3>0.

Or you could use grep with PCRE andsort:

$ grep -Po "(?<=,).*(?=,[1-9])" file | sort -V

Upvotes: 1

Timur Shtatland
Timur Shtatland

Reputation: 12425

Sort ASCIIbetically:

tail -n +2 user-list.txt | perl -F',' -lane 'print if $F[2] > 0;' | sort -t, -k2,2 
computer6,User1,11
computer4,User10,3
Computer1,User3,5
computer3,User4,9
Computer2,User5,8

Or sort numerically by the user number:

tail -n +2 user-list.txt | perl -F',' -lane 'print if $F[2] > 0;' | sort -t, -k2,2V
computer6,User1,11
Computer1,User3,5
computer3,User4,9
Computer2,User5,8
computer4,User10,3

Upvotes: 1

Raman Sailopal
Raman Sailopal

Reputation: 12887

awk -F, 'NR == 1 { next } $3 > 0 { match($2,/[[:digit:]]+/);map[$2]=substr($2,RSTART) } END { PROCINFO["sorted_in"]="@val_num_asc";for (i in map) { print i } }' user-list.txt > login.txt

Set the field delimiter to commas with -F, Ignore the header with NR == 1 { next } Set the index of an array (map) to the user when the 3rd delimited field is greater than 0. The value is set the number part of the User field (found with the match function) In the end block, set the sort order to value, number, ascending and loop through the map array created.

Upvotes: 2

tink
tink

Reputation: 15239

The problem with your script (and I presume with the "sorting isn't working") is the place where you redirect (and may have tried to sort) - the following variant of your own script does the job:

#!/bin/bash
while IFS=, read -r col1 col2 col3 col4 col5 col6 col7 || [[ -n $col1 ]]
do
    if [ $col3 -gt 0 ] 
    then
        echo "$col2"
    fi
done < <(tail -n+2 user-list.txt) | sort > login.txt

Edit 1: Match new requirement

Sure we can fix the sorting; sort -k1.5,1.7n > login.txt

Of course, that, too, will only work if your user IDs are all 4 alphas and n digits ...

Upvotes: 1

Related Questions