seoul

Reputation: 864

Regular expression problem in C#

I have these strings as a response from an FTP server:

  1. 02-17-11 01:39PM <DIR> dec

  2. 04-06-11 11:17AM <DIR> Feb 2011

  3. 05-10-11 07:09PM 87588 output.xlsx

  4. 06-10-11 02:52PM 3462 output.xlsx

where the pattern is: [datetime] [length or <dir>] [filename]


Edit: my code was: @"^\d{2}-\d{2}-\d{2}(\s)+(<DIR>|(\d)+)+(\s)+(.*)+"

I need to parse these strings into this object:

class Files
{
    DateTime modifiedTime;
    bool ifTrueThenFile;
    string name;
}

Please note that the filename may contain spaces.

I am not good at regex matching, can you help?

Upvotes: 2

Views: 366

Answers (8)

ShaneBlake

Reputation: 11096

A lot of good answers, but I like regex puzzles, so I thought I'd contribute a slightly different version...

^([\d :-]{14}[AP]M)\s+(<DIR>|\d+)\s(.+)$

For help in testing, I always use this site: http://www.myregextester.com/index.php
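
As a rough illustration of how a pattern of this shape could be applied in C#, here is a minimal sketch. The class and method names (ListingMatcher, Split) are hypothetical, and the pattern is written with [AP]M and with the hyphen placed last in the character class so that it is read as a literal.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ListingMatcher
{
    // Timestamp, then <DIR> or a size, then the name (which may contain spaces).
    private static readonly Regex Line =
        new Regex(@"^([\d :-]{14}[AP]M)\s+(<DIR>|\d+)\s(.+)$");

    // Returns the three captured parts, or null when the line does not match.
    public static string[] Split(string line)
    {
        Match m = Line.Match(line);
        if (!m.Success)
        {
            return null;
        }
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
    }
}
```

Calling ListingMatcher.Split on one listing line yields the timestamp text, the size or <DIR> marker, and the name, in that order.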

Upvotes: 1

BrunoLM

Reputation: 100322

Regex method

One approach is using this regex

@"(\d{2}-\d{2}-\d{2} \d{2}:\d{2}(?:PM|AM)) (<DIR>|\d+) (.+)";

I am capturing groups, so

// Group 1 - Matches the DateTime
(\d{2}-\d{2}-\d{2} \d{2}:\d{2}(?:PM|AM))

Notice the syntax (?:xx): it marks a non-capturing group, so its content is not stored as a separate group. We need to match PM or AM, but that part does not need a group of its own.

Next I match the file size or <DIR> with

// Group 2 - Matches the file size or <DIR>
(<DIR>|\d+)

Catching the result in a group.

The last part matches directory names or file names

// Group 3 - Matches the dir/file name
(.+)

Now that we captured all groups we can parse the values:

DateTime.Parse(g[1].Value); // be careful with current culture
                            // a different culture may not work
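
One way to sidestep the culture issue is to parse with an explicit format and the invariant culture. The sketch below assumes the timestamps are month-day-year with a 12-hour clock, as the sample lines suggest (17 cannot be a month); the class name ListingDate is hypothetical.

```csharp
using System;
using System.Globalization;

public static class ListingDate
{
    // Assumed layout: MM-dd-yy, a space, then hh:mm followed directly by AM or PM.
    private const string Format = "MM-dd-yy hh:mmtt";

    public static DateTime Parse(string text)
    {
        // InvariantCulture keeps the result independent of the machine's regional settings.
        return DateTime.ParseExact(text, Format, CultureInfo.InvariantCulture);
    }
}
```

With this in place, ListingDate.Parse(g[1].Value) could stand in for DateTime.Parse(g[1].Value) in the constructor.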

To check if the captured entry is a file or not you can just check if it is <DIR> or a number.

IsFile = g[2].Value != "<DIR>"; // it is a file if it is not <DIR>

And the name is just what is left

Name = g[3].Value; // returns a string

Then you can use the groups to build the object, an example:

using System;
using System.Text.RegularExpressions;

public class Files
{
    public DateTime ModifiedTime { get; set; }
    public bool IsFile { get; set; }
    public string Name { get; set; }

    public Files(GroupCollection g)
    {
        ModifiedTime = DateTime.Parse(g[1].Value);
        IsFile = g[2].Value != "<DIR>";
        Name = g[3].Value;
    }
}

static void Main(string[] args)
{
    var p = @"(\d{2}-\d{2}-\d{2} \d{2}:\d{2}(?:PM|AM)) (<DIR>|\d+) (.+)";
    var regex = new Regex(p, RegexOptions.IgnoreCase);

    var m1 = regex.Match("02-17-11 01:39PM <DIR> dec");
    var m2 = regex.Match("05-10-11 07:09PM 87588 output.xlsx");

    // DateTime: 02-17-11 01:39PM
    // IsFile  : false
    // Name    : dec
    var file1 = new Files(m1.Groups);

    // DateTime: 05-10-11 07:09PM
    // IsFile  : true
    // Name    : output.xlsx
    var file2 = new Files(m2.Groups);
}



String manipulation method

Another way to achieve this is to split the string, which can be much faster:

public class Files
{
    public DateTime ModifiedTime { get; set; }
    public bool IsFile { get; set; }
    public string Name { get; set; }

    public Files(string line)
    {
        // Gets the date part and parse to DateTime
        ModifiedTime = DateTime.Parse(line.Substring(0, 16));

        // Gets the file information part and split
        // in two parts
        var fileBlock = line.Substring(17).Split(new char[] { ' ' }, 2);

        // first part tells if it is a file
        IsFile = fileBlock[0] != "<DIR>";

        // second part tells the name
        Name = fileBlock[1];
    }
}

static void Main(string[] args)
{
    // DateTime: 02-17-11 01:39PM
    // IsFile  : false
    // Name    : dec
    var file3 = new Files("02-17-11 01:39PM <DIR> dec");

    // DateTime: 05-10-11 07:09PM
    // IsFile  : true
    // Name    : out put.xlsx
    var file4 = new Files("05-10-11 07:09PM 87588 out put.xlsx");
}


Upvotes: 3

Amir Ismail

Reputation: 3883

Try this:

String regexTemp = @"(?<Date>\d\d-\d\d-\d\d\s*\d\d:\d\d(?:A|P)M)\s*(?<LengthOrDir><DIR>|\d+)\s*(?<Name>.*)";

// line is assumed to hold one entry of the listing, e.g. "02-17-11 01:39PM <DIR> dec"
Match mExprStatic = Regex.Match(line, regexTemp, RegexOptions.IgnoreCase | RegexOptions.Singleline);
if (mExprStatic.Success)
{
  DateTime _date = DateTime.Parse(mExprStatic.Groups["Date"].Value);
  String lengthOrDir = mExprStatic.Groups["LengthOrDir"].Value;
  String Name = mExprStatic.Groups["Name"].Value;
}

Upvotes: 1

Fadrian Sudaman

Reputation: 6465

You should be able to implement this with a simple string.Split, an if statement, and the Parse/ParseExact methods to convert the values. Because the name may contain spaces, concatenate the remaining tokens to reconstruct it.
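
A minimal sketch of that approach, under a few assumptions: the timestamp layout is MM-dd-yy hh:mmtt as in the sample lines, and the names ListingEntry and Parse are hypothetical. Note that rejoining tokens with a single space would collapse any run of several spaces inside a name.

```csharp
using System;
using System.Globalization;

public class ListingEntry
{
    public DateTime ModifiedTime;
    public bool IsFile;
    public string Name;

    public static ListingEntry Parse(string line)
    {
        // Split on spaces and drop empty tokens, in case the columns are padded.
        string[] tokens = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length < 4)
        {
            throw new FormatException("Unexpected listing line: " + line);
        }

        var entry = new ListingEntry();
        // Tokens 0 and 1 are the date and the time.
        entry.ModifiedTime = DateTime.ParseExact(
            tokens[0] + " " + tokens[1], "MM-dd-yy hh:mmtt", CultureInfo.InvariantCulture);
        // Token 2 is either <DIR> or the size in bytes.
        entry.IsFile = tokens[2] != "<DIR>";
        // Everything after that belongs to the name.
        entry.Name = string.Join(" ", tokens, 3, tokens.Length - 3);
        return entry;
    }
}
```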

Upvotes: 0

user604613

Reputation:

I like the regex Leif posted.

However, I'll give you another solution, which people will probably hate: a quick and dirty one that I am coming up with as I type:

string[] allParts = inputText.Split(' ');
DateTime modified = DateTime.Parse(allParts[0] + " " + allParts[1]); // date and time
bool isFile = allParts[2] != "<DIR>";                                 // <DIR> or size
string name = string.Join(" ", allParts, 3, allParts.Length - 3);    // the rest is the filename

There are some checks missing there, but you get the idea. Is it nice code? Probably not. Will it work? With the right amount of time, surely. Is it more readable? I tend to think "yes", but others might disagree.

Upvotes: 0

ub1k

Reputation: 1674

You don't need to use regex here. Why don't you split the string by spaces with a number_of_elements limit:

var split = yourEntryString.Split(new string []{" "}, 4, 
    StringSplitOptions.RemoveEmptyEntries);
var date = string.Join(" ", new string[] {split[0], split[1]});
var length = split[2];
var filename = split[3];

This of course assumes that every line follows the pattern and that none of the fields is empty.

Upvotes: 0

Leif

Reputation: 2160

Here you go:

(\d{2})-(\d{2})-(\d{2}) (\d{2}):(\d{2})([AP]M) (<DIR>|\d+) (.+)

I used a lot of subexpressions, so it captures all the relevant parts separately, such as year, hour, and minute. If you don't need them all, just remove the parentheses around the ones you don't need.
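
To show what the separate groups buy you, here is a hedged sketch that builds a DateTime directly from them, with no culture-dependent parsing. It assumes the years are 20xx and the order is month-day-year, adds ^ and $ anchors to the pattern, and uses hypothetical names (SubgroupDate, Read).

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubgroupDate
{
    private static readonly Regex Line = new Regex(
        @"^(\d{2})-(\d{2})-(\d{2}) (\d{2}):(\d{2})([AP]M) (<DIR>|\d+) (.+)$");

    public static DateTime Read(string line)
    {
        Match m = Line.Match(line);
        if (!m.Success)
        {
            throw new FormatException("Unexpected listing line: " + line);
        }

        int month = int.Parse(m.Groups[1].Value);
        int day = int.Parse(m.Groups[2].Value);
        int year = 2000 + int.Parse(m.Groups[3].Value); // assumption: two-digit years mean 20xx
        int hour = int.Parse(m.Groups[4].Value) % 12;   // 12 becomes 0 before the PM shift
        int minute = int.Parse(m.Groups[5].Value);
        if (m.Groups[6].Value == "PM")
        {
            hour += 12;
        }
        return new DateTime(year, month, day, hour, minute, 0);
    }
}
```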

Upvotes: 2

Mat

Reputation: 206669

You can try with something like:

^(\d\d-\d\d-\d\d)\s+(\d\d:\d\d[AP]M)\s+(\S+)\s+(.*)$

The first capture group will contain the date, the second the time, the third the size (or <DIR>), and the last everything else (which will be the filename).

(Note that this is probably not portable, the time format is locale dependent.)

Upvotes: 2
