Reputation: 1659
I've CSV file
author,host,authority,contents
_angelsuman,http://twitter.com/_angelsuman,5,green tea piyo :( #kicktraileron6thjune
_angelsuman,http://twitter.com/_angelsuman,5,rt @121training fat burning foods: grapefruit watermelon berries hot peppers celery greek yogurt eggs fish green tea coffee water oatmeal.
_angelsuman,http://twitter.com/_angelsuman,5,rt @121training fat burning foods: â´ grapefruit â´ watermelon â´ berries â´ hot peppers â´ celery â´ greek yogurt â´ eggs â´ fish â´ green tea â´ oatmeal
anukshp,http://twitter.com/anukshp,4,rt @_angelsuman dear green tea u suck..:/ but i need to sip uh for myh rsn :( zindagi ka kdwa such :/ :(
I want to identify count of occurrences of first column :"author" in fourth column "contents"
Ex: finding "_angelsuman" in contents.
Kindly suggest; how can i achieve same?
Upvotes: 0
Views: 874
Reputation: 166
You can do it as follows (assume that there is no comma in values as you said).
one-line:
awk -F, 'NR>1 {author[$1]=0; content[NR]=$4} END {for (a in author) {for (c in content) {count[a]+=gsub(a,"",content[c])} print a, count[a]}}' file
expanded:
awk -F, ' NR>1 { author[$1]=0; content[NR]=$4 } END { for (a in author) { for (c in content) { count[a] += gsub(a,"",content[c]) } print a, count[a] } }' file
How it works
read file with a comma separator -F,
and skip the first line NR>1
awk -F, 'NR>1
store the first column in array author
as the key - so every unique value will be stored once. Store the content in array content
with the key equal of line number NR
- this results of storing content from every line.
{ author[$1]=0; content[NR]=$4 }
at the end iterate by every unique author for (a in author)
and foreach author iterate by content for (c in content)
and increase number of occurences of author in content for specific author count[a]+=gsub(a,"",content[c])
.
If it is counted for specific author
, then print results print a, count[a]
.
END { for (a in author) { for (c in content) { count[a]+=gsub(a,"",content[c]) } print a, count[a] } }' file
Output
_angelsuman 1
anukshp 0
Upvotes: 1
Reputation: 185053
Using perl :
use Text::CSV;
my $col = 4; // 4th column
my $count = 0;
my @rows;
my $csv = Text::CSV->new ( { binary => 1 } ) # should set binary attribute.
or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, "<:encoding(utf8)", "/tmp/test.csv" or die "test.csv: $!";
while ( my $row = $csv->getline( $fh ) ) {
if ($row->[$col -1] eq 'author') {
$count++;
}
}
$csv->eof or $csv->error_diag();
close $fh;
print "There's $count occurences of 'author'\n";
There's 1 occurences of 'author'
This is a proper parsing with a perl module.
Replace /tmp/test.csv
by your own file
Upvotes: 1