Reputation: 6874
I'm fairly new to ruby but this is testing me
I want to count all the lines in any file that ends in bowtie.txt in a folder
The lines have to start with a number of varying length followed by a '+' or a '-' (with or without whitespace inbetween. Sometimes the lines are wrapped but I don't know if this matters).
I want to then create a hash that stores the filename with the count associated with it.
I've got as far I think as looping through the directory to select the files out and then counting the number of lines in that file but how do I then create the hash and return it?
The file data looks like:
0 + chr12 129402816 ACACAGGGAGGGGAATAACACACACTGGGACCTGTCAGGAGAGGGTAGGGCTGGGGGCATCAGGAGAGCATCAGGAAAAATAGCTAATGCATGCTGGGCT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 0
2 - chr5 93625939 TCAACCTGTCATCTACATTAGGTATTTCTCCTAATGCTATCCCTCCCCTAGCCCCCCACCACCCAACAGACCCTGGTGTGTGATGTTCCCCTCCCTGTGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 0 5:T>C
5 + chr3 155023119 ACACAGGGAGGGGAACATCACACACCGGGGCCTGTAGTGGGGGTGAGGGGCAAGAGGAGGAATAGCATTAGGAGAAATACCTAATGTAGATGACCGGTTG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 0
7 + chr2 22818055 ACACAGGGAGGGGAAAAACACACACTGGGGCTTCTCAGGGGTGGTGGGGGGAGAGCATCAGGATAAATAGCTAATGCATGCAGGGCTTAATACCTAGGTG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 0
8 + chr3 131206106 ACACAGGGAGGGGAACATCACACACCAGGCCCTGTCAGCGGTGAGGGGCTGGGGGAGGGATAGCATTAAGAGAAATACCTAATATAAATGACGAGTTGAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 0 8:C>A
10 + chrX 108455592 ACACAGGGAGGGGAACATCACACACCAGGGCCTGTCGGGCAGTGGGGGGGCAAAGGGAGGGATTAAGTCATACACCCAATGCATGTGGGGCTTAAAACCC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 0 7:A>G
11 - chr2 31936302 ACCCATTAACTCGTCATTTACATTAGGTATATCTCCTAATGCTATCCCTCCCCCCACCCCACAACAGGCCCCCCGGTGTGTGATGTTCCCCTCCCTGTGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 0 7:T>C
This is what I am trying to get at the end
blablabla.bowtie.txt : 27998
blablafsfds.bowtie.txt : 25987
etc
This is my attempt at the code:
Dir[File.join('/Volumes/SeagateBackupPlusDriv/SequencingRawFiles/TumourOesophagealOCCAMS/SequencingScripts/3finalcounts', '*.bowtie.txt')].each |file| do
puts File.open(file) { |f| f.grep(/^[0-9]*.\+|\-/).count }
end
Upvotes: 0
Views: 77
Reputation: 121000
Untested, since I have no input files, but likely working:
# `Dir[]` expects it’s own format
# ⇓ will inject results into hash
Dir['/Volumes/.../*.bowtie.txt'].inject({}) do |memo, file|
memo[file] = File.readlines(file).select do |line|
line =~ /^[0-9]+\s*(\+|\-)/ # only those, matching
end.count
memo
end
Additional references: IO#readlines
, Enumerable#select
, Enumerable#inject
.
Upvotes: 3