Reputation: 13
I am new to C# and I am completely stuck on a parsing issue where I want to use recursion but my attempts to do so have gotten me nowhere.
I want to read in a file that has the following format:
root:
fileA
sub1:
fileB
fileC
sub2:
fileD
fileE
fileF
fileG
sub3:
fileH
Essentially, the lines ending the colons (:) are supposed to represent directories and the lines that do not end in colons are supposed to represent files in their parent directory as such: fileA and fileG belong in the root directory, fileB, fileC, and fileF are located within the sub1 directory, and so on (location is determined by indents/spaces).
Thus, I want to read in this file, and more complex files with a similar structure in a better way than I am currently doing it (a horrendous mess of for loops and if statements). I am using simple custom classes for the directories and files (I am not using .NET classes except for StreamReader for reading in the text file line by line)
I have done something similar in python once but for some reason I cannot wrap my head around how to do it in C#, which is silly since this is an abstract problem and the language specific implementations should not matter that much. I guess what does matter is my lack of understanding of how to best apply recursion in situations like these. Am I on the right track? I just sense that there is a much more elegant way of parsing this file into my self defined classes (retaining a tree structure in the sample text) and I think recursion is the answer. I just cannot see it.
Any help would be appreciated, even if it is less an answer and more a violent shove in the right direction. Or a gentle nudge.
Sample Code in C# trying to use Recursion (incomplete, but I hope it gets across what I am trying to do):
public void buildDirectoryFromFile(string file)
{
string line;
StreamReader data = new StreamReader(file);
int depth = 0;
while ((line = data.ReadLine()) != null)
{
depth = line.Length - line.TrimStart(' ').Length;
parseTextIntoTree(line, depth);
}
}
public void parseTextIntoTree(string line, int depth)
{
if (line.Contains(':'))
{
//Totally lost
}
else
{
//Totally lost
}
}
Depth in this case refers to the spaces/indents. The more spaces there are between the string and the left margin the 'deeper' it is in the tree.
Upvotes: 1
Views: 2178
Reputation: 14064
Try this. Since you specified the depth would be refering to the spaces/indents. Thus
TreeStruc = @"root:
fileA
sub1:
fileB
fileC
sub2:
fileD
fileE
fileF
fileG
sub3:
fileH";
TreeStruc = Regex.Replace(TreeStruc, " ", "-");
TreeStruc = Regex.Replace(TreeStruc, "---", "-");
using(StringReader reader = new StringReader(TreeStruc))
{
string line;
while((line = reader.ReadLine()) != null)
{
int cnt = line.Count(f => f == '-');
String str2Replace = new String('-', cnt);
line = Regex.Replace(line, str2Replace == "" ? "-" : str2Replace, cnt.ToString() + "~ ");
updatedTreeStruc = updatedTreeStruc + line + "\n";
}
}
It would show the result as follows:
root:
1~ fileA
1~ sub1:
2~ fileB
2~ fileC
2~ sub2:
3~ fileD
3~ fileE
2~ fileF
1~ fileG
1~ sub3:
2~ fileH
Now from here on we already know the depth that is the number in front of the string.
In the same way we can analyze whether the string is a folder or a file.
Upvotes: 0
Reputation: 14329
I do not have much experience with parsing layout-style or whitespace sensitive languages, but from what I have experienced such a task does not fall under the usual parsing solutions. You at least have to step beyond context-free languages.
The way I decided to model this example of layout rules is as a binary tree. Nodes are the line content. Descending to the left represents maintaining the same indent level whereas descending to the right represents increasing the indent level.
root
/ \
ε fileA
/ \
sub1: ε
| \___
| |
fileG fileB – ε
| |
sub3: fileC – ε
| \ |
ε fileH sub2: —+
| |
fileF fileD
|
fileE
I have modeled your source file as such a tree. You will also note that in respect to line order the tree descends to the right first, then left second.
What we have now is a way to view the source in a brace-style way which unlike layout-style can be parsed with any language parsing tool. For example, say we want to generate tokens to be consumed by a parser. This can be easily done recursively as your intuition hinted.
In terms of tree traversals this is close to a right-to-left preorder. However, we are also adding a close brace in-order.
I now follow this algorithm to tokenise the example. For simplicity I just use the names in the original source as the token names and also add {
and }
as tokens.
(recursion_depth.step_number)
(1.1) root
(1.2) root {
(2.1) root { fileA
(2.2) root { fileA {
(2.3) root { fileA { }
(3.1) root { fileA { } sub1:
(3.2) root { fileA { } sub1: {
(4.1) root { fileA { } sub1: { fileB
etc.
Finally arriving at (formatted for clarity)
root {
fileA { }
sub1: {
fileB { }
fileC { }
sub2: {
fileD { }
fileE { }
}
fileF { }
}
fileG { }
sub3: {
fileH { }
}
}
From here I hope it is clearer how you can construct an abstract syntax tree for your language. If you want to construct the tree I mention then consider keeping a stack of indentation levels. It requires a bit of thinking but I'll leave you to that.
Upvotes: 1
Reputation: 110
This code works with your posted file:
//The top-level Node
TreeNode root;
//Temp value for processing in parseTextIntoTree
TreeNode ParentNode;
//Temp value for processing in parseTextIntoTree
int actualdepth = -1;
//The depth step between two layers
int depthdifference = 3;
public void buildDirectoryFromFile(string file)
{
string line;
StreamReader data = new StreamReader(file);
int depth = 0;
while ((line = data.ReadLine()) != null)
{
depth = line.Length - line.TrimStart(' ').Length;
parseTextIntoTree(line, depth);
}
this.treeView1.Nodes.Add(root);
}
public void parseTextIntoTree(string line, int depth)
{
//At the beginning define the root node
if (root == null)
{
root = new TreeNode(line);
ParentNode = root;
actualdepth = depth;
}
else
{
//Search the parent node for the current depth
while (depth <= actualdepth)
{
//One step down
ParentNode = ParentNode.Parent;
actualdepth -= depthdifference;
}
ParentNode = ParentNode.Nodes.Add(line.Trim());
actualdepth = depth;
}
}
Upvotes: 0
Reputation: 311
Let's try this:
public void buildDirectoryFromFile(string file)
{
string line;
StreamReader data = new StreamReader(file);
List<string> lines = new List<string>();
while ((line = data.ReadLine()) != null)
{
lines.Add(line);
}
int lineProcessed = 0;
ParseTextIntoTree(lines, ref lineProcessed, 0);
}
public const int PadCount = 3; // Your padding length in text file
public void ParseTextIntoTree(List<string> lineList, ref int lineProcessed, int depth)
{
while(lineProcessed < lineList.Count)
{
string line = lineList[lineProcessed++];
int lineDepth = line.Length - line.TrimStart(' ').Length;
if(lineDepth < depth)
{
// If the current depth is lower than the desired depth
// end of directory structure is reached. Do backtrack
// and reprocess the line in upper depth
lineProcessed--;
return;
}
if(line.EndsWith(":"))
{
// Do something, perhaps create directory?
ParseTextIntoTree(lineList, ref lineProcessed, depth + PadCount);
}
else
{
// Do something, perhaps create file?
}
}
}
Upvotes: 0