Reputation: 3563
I'm trying to get a collection of distinct identifiers from a group of files. What am I doing wrong with this lambda query?
var enumDir = Directory.GetFiles(folder);
var distinctCode = enumDir.Select(s => Path.GetFileName(s).Substring(8, 4))
.GroupBy(s => s.ToString());
Thanks in advance ...
Re @empi's suggestion: I expect to get a list of the distinct 4-letter substrings from the file names. What I get is nothing. At first I had put the Path.Get.... part in the GroupBy as well, and I got an index out of range exception.
Re @Oskar Kjellin's suggestion: I should mention that every file name has a fixed length of 45 characters.
var enumDir = Directory.GetFiles(folder).Where(a => Path.GetFileName(a).Length > 12);
var distinctCode = enumDir.Select(s => Path.GetFileName(s).Substring(8, 4)).Distinct();
It's really a combination of both suggestions, so I don't know whose answer to mark as accepted.
Upvotes: 0
Views: 2431
Reputation: 134491
I think you can do this a lot better by using regular expressions for the validation. The kind of checks that you are trying to do are too complicated to express cleanly in a single query. There could be other files in that directory that don't follow your pattern, which you aren't expecting and which could mess everything up. That complexity can be captured by the regular expression instead: if your file names follow a certain pattern, a simple regex match can handle this for you.
I have no idea what your file names look like, so I'll use recorded TV from Windows Media Center as an example. All WMC file names follow a certain pattern:
[title]_[station]_[year]_[month]_[day]_[hour]_[minute]_[second].wtv
Then to group all the videos by title, you could do this:
var dir = @"C:\Users\Public\Recorded TV";
var wmcFileRe = new Regex(@"
^
(?<title>.+)
_
(?<station>.+)
_
(?<date>\d{4}_\d{2}_\d{2})
_
(?<time>\d{2}_\d{2}_\d{2})
\.wtv
$
", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);
var query =
from filePath in Directory.EnumerateFiles(dir)
let fileName = Path.GetFileName(filePath)
let match = wmcFileRe.Match(fileName)
where match.Success
orderby match.Groups["title"].Value,
match.Groups["date"].Value descending,
match.Groups["time"].Value descending
group filePath by match.Groups["title"].Value;
This yields the file paths grouped by title, with the files in each group ordered from newest to oldest.
Also, use Directory.EnumerateFiles() instead of Directory.GetFiles() so you're not creating that array of results up front; the array is not needed anywhere else.
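To illustrate the point about Directory.EnumerateFiles(), here is a minimal sketch. The class name, the helper HasFileWithMinLength, the scratch folder, and the sample file names are all hypothetical and not part of the answer above. Because the entries are streamed, a short-circuiting operator such as Any() can stop as soon as it finds a match, without a full array being built first.

```csharp
using System;
using System.IO;
using System.Linq;

static class EnumerateFilesSketch
{
    // Returns true as soon as one file name with at least minLength characters is seen.
    // Directory.EnumerateFiles streams entries lazily, so Any() can stop early.
    public static bool HasFileWithMinLength(string dir, int minLength)
    {
        return Directory.EnumerateFiles(dir)
            .Select(path => Path.GetFileName(path))
            .Any(name => name.Length >= minLength);
    }

    static void Main()
    {
        // Hypothetical scratch folder, created only for this demonstration.
        var dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        try
        {
            File.WriteAllText(Path.Combine(dir, "20110101ABCD_report.txt"), "");
            File.WriteAllText(Path.Combine(dir, "Thumbs.db"), "");
            Console.WriteLine(HasFileWithMinLength(dir, 12));
        }
        finally
        {
            Directory.Delete(dir, true);
        }
    }
}
```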
Upvotes: 1
Reputation: 10221
I suspect what you are looking for is to get the actual file names, but grouped by the substring.
var result = Directory.GetFiles(folder)
.Select(s => Path.GetFileName(s))
.Where(s => s.Length > 12)
.GroupBy(s => s.Substring(8, 4));
Now in result you have the group objects, with the Key being your substring; if you enumerate them, you get the actual file names that matched that key.
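As a rough sketch of how those groups can be consumed, the example below runs the same kind of query over an in-memory list. The class name, the helper GroupByCode, and the sample file names are hypothetical; real names would come from Directory.GetFiles(folder). It filters on Length >= 12, the minimum for which Substring(8, 4) succeeds, where the query above uses the slightly stricter > 12.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class GroupByCodeSketch
{
    // Groups file names by the 4-character code starting at index 8,
    // skipping names too short for Substring(8, 4) to succeed.
    public static IEnumerable<IGrouping<string, string>> GroupByCode(IEnumerable<string> fileNames)
    {
        return fileNames
            .Where(name => name.Length >= 12)
            .GroupBy(name => name.Substring(8, 4));
    }

    static void Main()
    {
        // Hypothetical file names, used only for illustration.
        var fileNames = new[]
        {
            "20110101ABCD_report.txt",
            "20110102ABCD_report.txt",
            "20110101WXYZ_report.txt",
            "Thumbs.db"
        };

        foreach (var group in GroupByCode(fileNames))
        {
            // Key is the extracted code; enumerating the group yields the matching names.
            Console.WriteLine(group.Key);
            foreach (var name in group)
            {
                Console.WriteLine("  " + name);
            }
        }
    }
}
```

If only the codes themselves are needed, selecting group.Key from each group gives the distinct values.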
Upvotes: 1
Reputation: 15901
enumDir.Select(s => Path.GetFileName(s).Substring(8, 4)) - this code should return IEnumerable<string>. Check whether this collection is OK; if it is, just use Distinct().
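A minimal sketch of that suggestion, run against an in-memory list so it can be tried without touching the disk. The class name, the helper DistinctCodes, and the sample file names are hypothetical, and the length filter is an added assumption to keep Substring(8, 4) from throwing on short names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DistinctCodeSketch
{
    // Extracts the 4-character code at index 8 from each name and removes duplicates.
    public static List<string> DistinctCodes(IEnumerable<string> fileNames)
    {
        return fileNames
            .Where(name => name.Length >= 12)      // assumption: skip names too short to slice
            .Select(name => name.Substring(8, 4))
            .Distinct()
            .ToList();
    }

    static void Main()
    {
        // Hypothetical file names, used only for illustration.
        var fileNames = new[]
        {
            "20110101ABCD_report.txt",
            "20110102ABCD_report.txt",
            "20110101WXYZ_report.txt"
        };

        foreach (var code in DistinctCodes(fileNames))
        {
            Console.WriteLine(code);
        }
    }
}
```

Unlike GroupBy, which returns one group object per key, Distinct() returns the codes themselves as strings.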
Upvotes: 1
Reputation: 21900
You should always check the length before calling Substring to avoid the exception:
var enumDir = Directory.GetFiles(folder);
var distinctCode = enumDir.Select(s => Path.GetFileName(s))
    .Select(s => s.Length >= 12 ? s.Substring(8, 4) : s)
    .GroupBy(s => s);
You can never really control what files are in the folder. For instance, Windows can create Thumbs.db, which is a cache of image thumbnails, or other temp files.
Perhaps you want to filter out only those with your fixed length:
var enumDir = Directory.GetFiles(folder);
var distinctCode = enumDir.Select(s => Path.GetFileName(s))
    .Where(s => s.Length == 45)
    .Select(s => s.Substring(8, 4))
    .GroupBy(s => s);
Upvotes: 1