Reputation: 55
I'm trying to figure out how to parse a string in this format into a tree-like data structure of arbitrary depth, and then generate random sentences from it.
"{{Hello,Hi,Hey} {world,earth},{Goodbye,farewell} {planet,rock,globe{.,!}}}"
where
, means or
{ means expand
} means collapse back up to the parent
For example, I want to get output like this:
1) hello world planet.
2) hi earth globe!
3) goodbye planet.
and so on.
Upvotes: 0
Views: 385
Reputation: 143
If the question is how to parse the text, I think you can use a stack to parse it.
"{{Hello,Hi,Hey} {world,earth},{Goodbye,farewell} {planet,rock,globe{.,!}}}"
Basically, you push each character onto the stack as long as it is not '}'. When you read a '}', you pop from the stack repeatedly until you reach the matching '{'.
There are more details to handle, because ',' adds a rule for OR.
Parsing this way is like evaluating an expression with a stack; it is the same technique used to handle parentheses in arithmetic expressions.
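A minimal C# sketch of this stack idea (C#, since the other answers use it). One deviation, stated plainly: instead of pushing raw characters and popping back to '{', it pushes one group object per '{' and pops it at the matching '}', while ',' starts a new alternative in the group currently on top of the stack. The names Node, TextNode, GroupNode, StackParser, Parse, RandomSentence and Expand are hypothetical, not from any library:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// A node is either literal text or a brace group.
public abstract class Node { }

public sealed class TextNode : Node
{
    public TextNode(string text) => Text = text;
    public string Text { get; }
}

// A brace group: alternatives separated by ',', each alternative a sequence of nodes.
public sealed class GroupNode : Node
{
    public List<List<Node>> Alternatives { get; } = new() { new List<Node>() };
}

public static class StackParser
{
    public static GroupNode Parse(string template)
    {
        var root = new GroupNode();          // implicit outermost group
        var stack = new Stack<GroupNode>();
        stack.Push(root);
        var pending = new StringBuilder();   // text characters not yet stored

        // Move pending text into the current alternative of the group on top of the stack.
        void FlushText()
        {
            if (pending.Length == 0) return;
            stack.Peek().Alternatives[^1].Add(new TextNode(pending.ToString()));
            pending.Clear();
        }

        foreach (char c in template)
        {
            switch (c)
            {
                case '{':                    // expand: open a child group
                    FlushText();
                    var group = new GroupNode();
                    stack.Peek().Alternatives[^1].Add(group);
                    stack.Push(group);
                    break;
                case ',':                    // or: start the next alternative
                    FlushText();
                    stack.Peek().Alternatives.Add(new List<Node>());
                    break;
                case '}':                    // collapse: return to the parent group
                    FlushText();
                    if (stack.Count == 1) throw new FormatException("Unmatched '}'.");
                    stack.Pop();
                    break;
                default:
                    pending.Append(c);
                    break;
            }
        }
        FlushText();
        if (stack.Count != 1) throw new FormatException("Missing '}'.");
        return root;
    }

    // Picks one alternative per group at random and concatenates the pieces.
    public static string RandomSentence(Node node, Random rng) => node switch
    {
        TextNode t => t.Text,
        GroupNode g => string.Concat(
            g.Alternatives[rng.Next(g.Alternatives.Count)].Select(n => RandomSentence(n, rng))),
        _ => throw new ArgumentException("Unknown node type.")
    };

    // Lists every sentence the template can produce (handy for checking the parse).
    public static IEnumerable<string> Expand(Node node) => node switch
    {
        TextNode t => new[] { t.Text },
        GroupNode g => g.Alternatives.SelectMany(alt => ExpandSequence(alt)),
        _ => throw new ArgumentException("Unknown node type.")
    };

    static IEnumerable<string> ExpandSequence(List<Node> sequence)
    {
        IEnumerable<string> results = new[] { "" };
        foreach (var node in sequence)
            results = results.SelectMany(prefix => Expand(node).Select(s => prefix + s));
        return results;
    }
}
```

For the example string, Expand returns 14 sentences (6 from the Hello/Hi/Hey branch and 8 from the Goodbye/farewell branch), and RandomSentence(tree, new Random()) returns one of them. Note that the randomness is uniform per group, not uniform over all 14 sentences.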
Upvotes: 0
Reputation: 112392
The input string must be parsed. Since it can contain nested braces, we need a recursive parser. But to begin with, we need a data model to represent the tree structure.
We can have three different types of items in this tree: text, a list representing a sequence and a list representing a choice. Let's derive three classes from this abstract base class:
using System;
using System.Collections.Generic;
using System.Text;

public abstract class TreeItem
{
public abstract string GetRandomSentence();
}
The TextItem class simply returns its text as the "random sentence":
public class TextItem : TreeItem
{
public TextItem(string text)
{
Text = text;
}
public string Text { get; }
public override string GetRandomSentence()
{
return Text;
}
}
The sequence item concatenates the sentences produced by its items:
public class SequenceItem : TreeItem
{
public SequenceItem(List<TreeItem> items)
{
Items = items;
}
public List<TreeItem> Items { get; }
public override string GetRandomSentence()
{
var sb = new StringBuilder();
foreach (var item in Items) {
sb.Append(item.GetRandomSentence());
}
return sb.ToString();
}
}
The choice item is the only one that uses randomness; it picks one random item from its list:
public class ChoiceItem : TreeItem
{
private static readonly Random _random = new();
public ChoiceItem(List<TreeItem> items)
{
Items = items;
}
public List<TreeItem> Items { get; }
public override string GetRandomSentence()
{
int index = _random.Next(Items.Count);
return Items[index].GetRandomSentence();
}
}
Note that the sequence and choice items both call GetRandomSentence() on their items, which descends the tree recursively.
This was the easy part. Now let's create a parser.
public class Parser
{
enum Token { Text, LeftBrace, RightBrace, Comma, EndOfString }
int _index;
string _definition;
Token _token;
string _text; // Holds the text when _token is Token.Text
public TreeItem Parse(string definition)
{
_index = 0;
_definition = definition;
GetToken();
return Choice();
}
private void GetToken()
{
if (_index >= _definition.Length) {
_token = Token.EndOfString;
return;
}
switch (_definition[_index]) {
case '{':
_index++;
_token = Token.LeftBrace;
break;
case '}':
_index++;
_token = Token.RightBrace;
break;
case ',':
_index++;
_token = Token.Comma;
break;
default:
int startIndex = _index;
do {
_index++;
// && short-circuits, so _definition[_index] is only read while _index is in range.
} while (_index < _definition.Length && !"{},".Contains(_definition[_index]));
_text = _definition[startIndex.._index];
_token = Token.Text;
break;
}
}
private TreeItem Choice()
{
var items = new List<TreeItem>();
while (_token != Token.EndOfString && _token != Token.RightBrace) {
items.Add(Sequence());
if (_token == Token.Comma) {
GetToken();
}
}
if (items.Count == 0) {
return new TextItem("");
}
if (items.Count == 1) {
return items[0];
}
return new ChoiceItem(items);
}
private TreeItem Sequence()
{
var items = new List<TreeItem>();
while (true) {
if (_token == Token.Text) {
items.Add(new TextItem(_text));
GetToken();
} else if (_token == Token.LeftBrace) {
GetToken();
items.Add(Choice());
if (_token == Token.RightBrace) {
GetToken();
}
} else {
break;
}
}
if (items.Count == 0) {
return new TextItem("");
}
if (items.Count == 1) {
return items[0];
}
return new SequenceItem(items);
}
}
The parser consists of a lexer, i.e., a low-level mechanism that splits the input text into tokens, and two parsing methods. We have four kinds of tokens: text, "{", "}" and ",". We represent these tokens as
enum Token { Text, LeftBrace, RightBrace, Comma, EndOfString }
We have also added an EndOfString token to tell the parser that the end of the input string was reached. When the token is Text, we store its text in the field _text. The lexer is implemented by the GetToken() method, which has no return value and instead sets the _token field, so that the current token is available to the two parsing methods Choice() and Sequence().
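If you prefer a lexer that returns tokens instead of setting fields, a C# iterator (yield return) is a natural fit. Here is a minimal sketch under that assumption; TokenKind, LexToken and Lexer.Tokenize are hypothetical names, and instead of an EndOfString token, the end of the enumeration signals the end of the input:

```csharp
using System.Collections.Generic;

public enum TokenKind { Text, LeftBrace, RightBrace, Comma }

// Text is only meaningful when Kind is TokenKind.Text.
public sealed record LexToken(TokenKind Kind, string Text = "");

public static class Lexer
{
    // Lazily splits the input into tokens; running out of tokens
    // plays the role of Token.EndOfString.
    public static IEnumerable<LexToken> Tokenize(string input)
    {
        int i = 0;
        while (i < input.Length)
        {
            switch (input[i])
            {
                case '{':
                    i++;
                    yield return new LexToken(TokenKind.LeftBrace);
                    break;
                case '}':
                    i++;
                    yield return new LexToken(TokenKind.RightBrace);
                    break;
                case ',':
                    i++;
                    yield return new LexToken(TokenKind.Comma);
                    break;
                default:
                    int start = i;
                    // Short-circuit && guarantees input[i] is never read past the end.
                    while (i < input.Length && "{},".IndexOf(input[i]) < 0)
                        i++;
                    yield return new LexToken(TokenKind.Text, input[start..i]);
                    break;
            }
        }
    }
}
```

A parser built on this would keep an IEnumerator<LexToken> and call MoveNext() where the answer's code calls GetToken(), treating a false return as the end of the string.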
One difficulty is that when we encounter an item, we do not know whether it is a single item or part of a sequence or a choice. We assume that the whole sentence definition is a choice consisting of sequences, which gives sequences precedence over choices (just as "*" takes precedence over "+" in arithmetic).
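To make the "*" and "+" analogy concrete, treat "," as + and a sequence as multiplication, and give every text item the value 1. Evaluating the example string then counts the sentences the parser can produce. In the first alternative, (1+1+1) is {Hello,Hi,Hey}, the middle 1 is the space text item, and (1+1) is {world,earth}; in the second, (1+1) is {Goodbye,farewell}, 1 is the space, and 1+1+1·(1+1) stands for planet, rock and globe{.,!}:

```latex
\[
\underbrace{(1+1+1)\cdot 1\cdot(1+1)}_{\text{first alternative}}
\;+\;
\underbrace{(1+1)\cdot 1\cdot\bigl(1+1+1\cdot(1+1)\bigr)}_{\text{second alternative}}
= 6 + 8 = 14
\]
```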
Both Choice and Sequence gather items in a temporary list. If this list contains only one item, that item is returned directly instead of being wrapped in a choice or sequence.
You can test this parser like this:
const string example = "{{Hello,Hi,Hey} {world,earth},{Goodbye,farewell} {planet,rock,globe{.,!}}}";
var parser = new Parser();
var tree = parser.Parse(example);
for (int i = 0; i < 20; i++) {
Console.WriteLine(tree.GetRandomSentence());
}
The output might look like this:
Goodbye rock
Hi earth
Goodbye globe.
Hey world
Goodbye rock
Hi earth
Hey earth
farewell planet
Goodbye globe.
Hey world
Goodbye planet
Hello world
Hello world
Goodbye planet
Hey earth
farewell globe!
Goodbye globe.
Goodbye globe.
Goodbye planet
farewell rock
Upvotes: 1
Reputation: 520
I think that can be a complicated job. For that I used this tutorial; I strongly advise you to read the entire page to understand how it works.
First, you have to pass this "tree" as an array. You can parse the string, set up the array manually, or whatever you prefer. This matters because there isn't a good ready-made model for that tree structure, so it's better to use one that is already available. Also, if you want to produce grammatically correct sentences, you'll need to add "weights" to those words and tell the code how to combine them and in what order.
Here is the code snippet:
using System;
using System.Text;
namespace App
{
class Program
{
static void Main(string[] args)
{
// Note: RandomText below only uses the flat word list; this string is not parsed.
string tree = "{{Hello,Hi,Hey} {world,earth},{Goodbye,farewell} {planet,rock,globe{.,!}}}";
string[] words = { "Hello", "Hi", "Hey", "world", "earth", "Goodbye", "farewell", "planet", "rock", "globe" };
RandomText text = new RandomText(words);
text.AddContentParagraphs(12, 1, 3, 3, 3); // 12 paragraphs, 1 to 3 sentences each, exactly 3 words per sentence
string content = text.Content;
Console.WriteLine(content);
}
}
public class RandomText
{
static Random _random = new Random();
StringBuilder _builder;
string[] _words;
public RandomText(string[] words)
{
_builder = new StringBuilder();
_words = words;
}
public void AddContentParagraphs(int numberParagraphs, int minSentences,
int maxSentences, int minWords, int maxWords)
{
for (int i = 0; i < numberParagraphs; i++)
{
AddParagraph(_random.Next(minSentences, maxSentences + 1),
minWords, maxWords);
_builder.Append("\n\n");
}
}
void AddParagraph(int numberSentences, int minWords, int maxWords)
{
for (int i = 0; i < numberSentences; i++)
{
int count = _random.Next(minWords, maxWords + 1);
AddSentence(count);
}
}
void AddSentence(int numberWords)
{
StringBuilder b = new StringBuilder();
// Add n words together.
for (int i = 0; i < numberWords; i++) // Number of words
{
b.Append(_words[_random.Next(_words.Length)]).Append(" ");
}
string sentence = b.ToString().Trim() + ". ";
// Uppercase sentence
sentence = char.ToUpper(sentence[0]) + sentence.Substring(1);
// Add this sentence to the class
_builder.Append(sentence);
}
public string Content
{
get
{
return _builder.ToString();
}
}
}
}
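The snippet above picks every word with equal probability (_words[_random.Next(_words.Length)]), so the "weights" mentioned earlier are not applied yet. A minimal sketch of weighted (roulette-wheel) selection, assuming you store a weight next to each word; WeightedChoice and Pick are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WeightedChoice
{
    // Returns one item, chosen with probability weight / (sum of all weights).
    // Assumes weights are non-negative and at least one is positive.
    public static T Pick<T>(IReadOnlyList<(T Item, double Weight)> options, Random rng)
    {
        if (options.Any(o => o.Weight < 0))
            throw new ArgumentException("Weights must be non-negative.", nameof(options));
        double total = options.Sum(o => o.Weight);
        if (total <= 0)
            throw new ArgumentException("At least one weight must be positive.", nameof(options));

        // Draw a point in [0, total) and walk the cumulative weights until we pass it.
        double r = rng.NextDouble() * total;
        foreach (var (item, weight) in options)
        {
            if (r < weight)
                return item;
            r -= weight;
        }

        // Floating-point rounding can leave r just past the last interval;
        // fall back to the last option that has a positive weight.
        return options.Last(o => o.Weight > 0).Item;
    }
}
```

In AddSentence you could then replace the uniform pick with WeightedChoice.Pick(weightedWords, _random), where weightedWords is a hypothetical array of (word, weight) pairs. Weights only change how often a word appears; word order would still need separate rules, for example the brace template from the question.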
Upvotes: 0