user3713876

Reputation: 41

Line count within text files having multiple-line and single-line records

I am using the UTL_FILE utility in Oracle to export data into CSV files, and my script produces a set of text files.

Case 1

A sample of the output in the test1.csv file:

"sno","name"
"1","hari is in singapore
ramesh is in USA"
"2","pong is in chaina
chang is in malaysia
vilet is in uk"

Now I count the number of records in test1.csv with a Linux command:

egrep -c "^\"[0-9]" test1.csv

This gives a record count of:

2      (ACCORDING TO LINUX)

But if I count the records in the database using select count(*) from test; I get:

 COUNT(*)
----------                 (ACCORDING TO DATA BASE)
    2

Case 2

A sample of the output in the test2.csv file:

"sno","name","p"
"","",""
"","","ramesh is in USA"
"","",""

Now I count the number of records in test2.csv with a Linux command:

egrep -c "^\"[0-9]" test2.csv

This gives a record count of:

0      (ACCORDING TO LINUX)

But if I count the records in the database using select count(*) from test; I get:

 COUNT(*)
----------                 (ACCORDING TO DATA BASE)
    2

Can anybody help me count the exact number of records in both case 1 and case 2 using a single command?

Thanks in advance.

Upvotes: 0

Views: 1060

Answers (3)

akostadinov

Reputation: 18614

In your second example the lines do not start with " followed by a number; that's why the count is 0. You can try egrep -c "^\"([0-9]|\")" test2.csv to also catch empty first-column values. But it might be simpler to count all lines and subtract 1 for the header row.

e.g. count=$(( $(wc -l < test.csv) - 1 ))
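A minimal sketch of that idea (the file name sample.csv and its contents are illustrative). Note that wc -l counts physical lines, so this matches the record count only when no field contains an embedded newline, i.e. one record per line:

```shell
# Build an illustrative file with one record per physical line.
cat > sample.csv <<'EOF'
"sno","name"
"1","hari is in singapore"
"2","ramesh is in USA"
EOF

# Redirect into wc so it prints a bare number instead of "3 sample.csv",
# then subtract 1 for the header row.
count=$(( $(wc -l < sample.csv) - 1 ))
echo "$count"    # prints 2
```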

Upvotes: 0

Sanket Parmar

Reputation: 1577

The number of columns differs between the two cases. To make it generic, I wrote a Perl script that prints the row count. It builds a regex from the header line and uses it to count the rows. I assume the first line always defines the number of columns.

#!/usr/bin/perl -w

open(FH, $ARGV[0]) or die "Failed to open file";

# Get columns from the header and use them to construct a regex
my $head = <FH>;
my @col = split(",", $head); # columns array
my $col_cnt = scalar(@col);  # column count

# Read the rest of the rows
my $rows = "";
while (<FH>) {
    $rows .= $_;
}

# Create the regex based on the number of columns.
# E.g. for 3 columns the regex should be
#   ".*?",".*?",".*?"
# where ".*?" is a non-greedy match for anything between quotes
my $i = 0;
while ($i < $col_cnt) {
    $col[$i++] = "\".*?\"";
}
my $regex = join(",", @col);

# /s treats the data as a single line (so . matches newlines)
# /g matches globally
my @row_cnt = $rows =~ m/($regex)/sg;
print "Row count: " . scalar(@row_cnt) . "\n";

Save it as row_count.pl, make it executable (chmod +x row_count.pl), and run it as ./row_count.pl filename

Upvotes: 1

TimP

Reputation: 210

egrep -c test1.csv doesn't have a search term to match, so egrep will try to use test1.csv itself as the regular expression. I have no idea how you managed to get it to return 2 for your first example.

A usable egrep command that will actually produce the number of records in the files is egrep -c '"[[:digit:]]*"' test1.csv, assuming your examples are accurate.

timp@helez:~/tmp$ cat test.txt
"sno","name"
"1","hari is in singapore
ramesh is in USA"
"2","pong is in chaina
chang is in malaysia
vilet is in uk"

timp@helez:~/tmp$ egrep -c '"[[:digit:]]*"' test.txt
2

timp@helez:~/tmp$ cat test2.txt
"sno","name"
"1","hari is in singapore"
"2","ramesh is in USA"

timp@helez:~/tmp$ egrep -c '"[[:digit:]]*"' test2.txt
2

Alternatively, you might do better to add an extra value to your SELECT statement, something like SELECT 'recmatch.,.,',sno,name FROM TABLE; instead of SELECT sno,name FROM TABLE;, and then grep for recmatch.,., though that's something of a hack.
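A sketch of how the sentinel trick would look on the command line. The file marked.csv below is hypothetical output of such a modified SELECT, using the multi-line records from case 1; since the marker appears exactly once per record, grep -c counts records even when fields contain embedded newlines:

```shell
# Hypothetical export where every data row starts with the
# 'recmatch.,.,' marker added in the SELECT list.
cat > marked.csv <<'EOF'
"sno","name"
"recmatch.,.,","1","hari is in singapore
ramesh is in USA"
"recmatch.,.,","2","pong is in chaina
chang is in malaysia
vilet is in uk"
EOF

# Each record begins exactly one physical line with the marker,
# so counting matching lines counts records. Dots are escaped so
# they match literally.
grep -c 'recmatch\.,\.,' marked.csv    # prints 2
```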

Upvotes: 0
