Reputation: 48
I have a text that I'm searching through for a specific phrase. To illustrate, the phrase I'm searching for is "1 Johannes 1:12". I use the String.Contains method, but it reports two matches, because I also search for another phrase: "1 Johannes 1:1". So instead of taking "1 Johannes 1:12" first and then "1 Johannes 1:1", it does the opposite. This is bad for me, because I want the correct sentence.
I have tried different options, including Substring, but I need some help with this.
Thanks in advance for any answers. Cheers!
List<string> sentences = new List<string>();
sentences.Add("1 Johannes 1:12");
sentences.Add("1 Johannes 1:1");
string fulltext = "randomtext 1 Johannes 1:12 randomtext";
foreach (string item in sentences)
{
    if (fulltext.Contains(item))
    {
        //expect the result to be 1 Johannes 1:12, but the result is 1 Johannes 1:1
        //do operation
    }
}
Upvotes: 1
Views: 672
Reputation: 11473
Parsing and recognizing Bible references is tricky, particularly because there are multiple abbreviation styles and numbers that look similar. The problem you have is that String.Contains() is a pretty big hammer, and you need something more like a set of socket wrenches. In other words, a full and proper answer is going to require more code than can fit comfortably in this format. I've written code to go through devotionals and transcripts and pull out all the references. The code is in a private repository, but I'll try to post the relevant parts.
A Bible reference is written in this format: {Book} {Chapter}:{Verse}, with some variation for verse ranges. So the first part is recognizing the book. For that purpose I created a class to represent a book and its known abbreviations (I was supporting two documented abbreviation styles). The Book class looks like this:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
public class Book
{
// The set of books we recognize
private static readonly List<Book> books;
private static readonly Dictionary<string, Book> commonMisspellings;
static Book()
{
// Initialize the set
books = new List<Book>{
// Old Testament
new Book("Genesis", "Gen.", "Ge", 50), // Gen
new Book("Exodus", "Ex.", "Ex", 40), // Exod
new Book("Leviticus", "Lev.", "Le", 27), // Lev
new Book("Numbers", "Num.", "Nu", 36), // Num
new Book("Deuteronomy", "Deut.", "De", 34), // Deut
new Book("Joshua", "Josh.", "Jos", 24), // Josh
new Book("Judges", "Judg.", "Jud", 21), // Judg
new Book("Ruth", "Ruth", "Ru", 4), // Ruth
new Book("1 Samuel", "1 Sam.", "1 S", 31), // 1Sam
new Book("2 Samuel", "2 Sam.", "2 S", 24), // 2Sam
new Book("1 Kings", "1 Kings", "1 K", 22), // 1Kgs
new Book("2 Kings", "2 Kings", "2 K", 25), // 2Kgs
new Book("1 Chronicles", "1 Chron.", "1 Chr", 29), // 1Chr
new Book("2 Chronicles", "2 Chron.", "2 Chr", 36), // 2Chr
new Book("Ezra", "Ezra", "Ezr", 10), // Ezra
new Book("Nehemiah", "Neh.", "Ne", 13), // Neh
new Book("Esther", "Est.", "Est", 10), // Esth
new Book("Job", "Job", "Jb", 42), // Job
new Book("Psalms", "Ps.", "Ps", 150), // Ps
new Book("Proverbs", "Prov.", "Pr", 31), // Prov
new Book("Ecclesiastes", "Eccl.", "Ec", 12), // Eccl
new Book("Song of Solomon", "Song", "Song", 8), // Song
new Book("Isaiah", "Isa.", "Is", 66), // Isa
new Book("Jeremiah", "Jer.", "Je", 52), // Jer
new Book("Lamentations", "Lam.", "Lam", 5), // Lam
new Book("Ezekiel", "Ezek.", "Ez", 48), // Ezek
new Book("Daniel", "Dan.", "Da", 12), // Dan
new Book("Hosea", "Hos.", "Ho", 14), // Hos
new Book("Joel", "Joel", "Joel", 3), // Joel
new Book("Amos", "Amos", "Am", 9), // Amos
new Book("Obadiah", "Obad.", "Obad", 1), // Obad
new Book("Jonah", "Jonah", "Jona", 4), // Jonah
new Book("Micah", "Mic.", "Mi", 7), // Mic
new Book("Nahum", "Nah.", "Na", 3), // Nah
new Book("Habakkuk", "Hab.", "Hab", 3), // Hab
new Book("Zephaniah", "Zeph.", "Zep", 3), // Zeph
new Book("Haggai", "Hag.", "Hag", 2), // Hag
new Book("Zechariah", "Zech.", "Zec", 14), // Zech
new Book("Malachi", "Mal.", "Mal", 4), // Mal
// New Testament
new Book("Matthew", "Matt.", "Mt", 28), // Matt
new Book("Mark", "Mark", "Mk", 16), // Mark
new Book("Luke", "Luke", "Lu", 24), // Luke
new Book("John", "John", "Jn", 21), // John
new Book("Acts", "Acts", "Ac", 28), // Acts
new Book("Romans", "Rom.", "Ro", 16), // Rom
new Book("1 Corinthians", "1 Cor.", "1 Co", 16), // 1Cor
new Book("2 Corinthians", "2 Cor.", "2 Co", 13), // 2Cor
new Book("Galatians", "Gal.", "Ga", 6), // Gal
new Book("Ephesians", "Eph.", "Ep", 6), // Eph
new Book("Philippians", "Phil.", "Ph", 4), // Phil
new Book("Colossians", "Col.", "Col", 4), // Col
new Book("1 Thessalonians", "1 Thes.", "1 Th", 5), // 1Thess
new Book("2 Thessalonians", "2 Thes.", "2 Th", 3), // 2Thess
new Book("1 Timothy", "1 Tim.", "1 Ti", 6), // 1Tim
new Book("2 Timothy", "2 Tim.", "2 Ti", 4), // 2Tim
new Book("Titus", "Titus", "Tit", 3), // Titus
new Book("Philemon", "Philem.", "Phm", 1), // Phlm
new Book("Hebrews", "Heb.", "He", 13), // Heb
new Book("James", "James", "Ja", 5), // Jas
new Book("1 Peter", "1 Peter", "1 Pe", 5), // 1Pet
new Book("2 Peter", "2 Peter", "2 Pe", 3), // 2Pet
new Book("1 John", "1 John", "1 Jn", 5), // 1John
new Book("2 John", "2 John", "2 Jn", 1), // 2John
new Book("3 John", "3 John", "3 Jn", 1), // 3John
new Book("Jude", "Jude", "Jude", 1), // Jude
new Book("Revelation", "Rev.", "Re", 22) // Rev
};
Debug.Assert(books.Count == 66);
// These are based on what I found in the set of over 6,000
// transcripts that people typed.
commonMisspellings = new Dictionary<string, Book>();
commonMisspellings.Add("song of songs", books.FirstOrDefault(b => b.ThompsonAbreviation == "Song"));
commonMisspellings.Add("psalm", books.FirstOrDefault(b => b.ThompsonAbreviation == "Ps"));
commonMisspellings.Add("like", books.FirstOrDefault(b => b.ThompsonAbreviation == "Lu"));
commonMisspellings.Add("jerimiah", books.FirstOrDefault(b => b.ThompsonAbreviation == "Je"));
commonMisspellings.Add("galations", books.FirstOrDefault(b => b.ThompsonAbreviation == "Ga"));
}
private static int numCreated = 0;
private int order;
private Book(string fullName, string abbrev, string thompsan, int chapters)
{
order = numCreated;
Name = fullName;
StandardAbreviation = abbrev;
ThompsonAbreviation = thompsan;
ChapterCount = chapters;
numCreated++;
}
/// <summary>
/// The unabbreviated name of the book.
/// </summary>
public string Name { get; private set; }
/// <summary>
/// Standard abbreviations as defined in "The Christian Writer's
/// Manual of Style", 2004 edition (ISBN: 9780310487715).
/// </summary>
public string StandardAbreviation { get; private set; }
/// <summary>
/// Thompson Chain references, pulled from the 5th edition.
/// </summary>
public string ThompsonAbreviation { get; private set; }
/// <summary>
/// The number of chapters in the book.
/// </summary>
public int ChapterCount { get; private set; }
public static bool TryParse(string inString, out Book book)
{
string potentialBook = StandardizeBookOrdinals(inString);
// Find the first book where the input string now matches one of the recognized formats.
book = books.FirstOrDefault(
b => b.ThompsonAbreviation.Equals(potentialBook, StringComparison.InvariantCultureIgnoreCase)
|| b.StandardAbreviation.Equals(potentialBook, StringComparison.InvariantCultureIgnoreCase)
|| b.Name.Equals(potentialBook, StringComparison.InvariantCultureIgnoreCase));
if (book != null)
{
return true;
}
// If we didn't find it, check to see if we just missed it because the abbreviation
// didn't have a period
book = books.FirstOrDefault((b) =>
{
string stdAbrev = b.StandardAbreviation;
if(stdAbrev.EndsWith("."))
{
stdAbrev = stdAbrev.Substring(0, stdAbrev.Length - 1);
}
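// Compare case-insensitively, like the checks above
return stdAbrev.Equals(potentialBook, StringComparison.InvariantCultureIgnoreCase);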
});
if (book != null)
{
return true;
}
// Special Case: check for common misspellings
string lowercase = potentialBook.ToLowerInvariant();
commonMisspellings.TryGetValue(lowercase, out book);
return book != null;
}
private static string StandardizeBookOrdinals(string str)
{
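// Break up on all remaining white space, dropping empty entries so that
// repeated spaces cannot survive the re-join below
string[] parts = (str ?? "").Split(new[] { ' ', '\r', '\n', '\t' }, StringSplitOptions.RemoveEmptyEntries);
if (parts.Length == 0)
{
return "";
}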
// If the first part is a roman numeral, or spelled ordinal, convert it to arabic
var number = parts[0].ToLowerInvariant();
switch (number)
{
case "first":
case "i":
parts[0] = "1";
break;
case "second":
case "ii":
parts[0] = "2";
break;
case "third":
case "iii":
parts[0] = "3";
break;
}
// Recompile the parts into one string that only has a single space separating elements
return string.Join(" ", parts);
}
public static IEnumerable<Book> List()
{
return books.ToArray();
}
}
So that lets you recognize any book if you feed that text into TryParse(). We even handle common misspellings, Roman numerals (I, II, III) vs. Arabic numerals (1, 2, 3), and more than one abbreviation style. Feel free to adapt as necessary; once we can recognize a book, the rest is going to be the same. The reason for listing the number of chapters in a book will become more apparent when you look at the next class, which deals with a Reference:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class Reference
{
private static readonly Regex RemoveHtml = new Regex("<[^>]*>", RegexOptions.Compiled);
public Book Book { get; set; }
public int Chapter { get; set; }
public int[] Verses { get; set; }
public static bool TryParse(string text, out Reference reference)
{
string errorString;
reference = InternalParse(text, out errorString);
if(errorString!=null)
{
reference = null;
return false;
}
return true;
}
private static Reference InternalParse(string text, out string errorString)
{
errorString = null;
int colon = text.LastIndexOf(':');
int chapter = -1;
string chapterSection = "1";
string verseSection = "";
if (colon > 0)
{
verseSection = text.Substring(colon + 1);
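// Look back up to three characters for the chapter, without running off the start of the string
chapter = Math.Max(0, colon - 3);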
chapterSection = text.Substring(chapter, colon - chapter);
while (!string.IsNullOrEmpty(chapterSection) && !Char.IsDigit(chapterSection[0]))
{
chapter++;
chapterSection = text.Substring(chapter, colon - chapter);
}
}
else
{
chapter = 2; // skip initial numbers for books
while(chapter < text.Length && !Char.IsDigit(text[chapter]))
{
chapter++;
}
if(chapter >= text.Length) // also covers text shorter than two characters
{
errorString = "There are no chapter or verses, can't be a reference.";
return null;
}
verseSection = text.Substring(chapter);
}
Book book;
if (!Book.TryParse(text.Substring(0, chapter), out book))
{
errorString = "There is no book, can't be a reference.";
return null;
}
if(!int.TryParse(chapterSection, out chapter))
{
errorString = "Bad chapter format";
return null;
}
Reference reference = new Reference
{
Book = book,
Chapter = chapter
};
if(colon < 0 && reference.Book.ChapterCount > 1)
{
if(!int.TryParse(verseSection, out chapter))
{
errorString = "Bad chapter format.";
return null;
}
reference.Chapter = chapter;
reference.Verses = new int[0];
return reference;
}
if (reference.Chapter > reference.Book.ChapterCount)
{
errorString = "Chapter found was too high";
return null;
}
reference.Verses = ParseRanges(verseSection, out errorString);
return reference;
}
private static int[] ParseRanges(string section, out string errorString)
{
errorString = null;
List<int> numbers = new List<int>();
string[] items = section.Split(',');
foreach (string verse in items)
{
string[] ranges = verse.Split('-');
if (ranges.Length > 2 || ranges.Length == 0)
{
errorString = "Invalid range specification";
return new int[0];
}
int start;
if(!int.TryParse(ranges[0], out start))
{
errorString = "Invalid range specification";
return new int[0];
}
int end = start;
if(ranges.Length >1 && !int.TryParse(ranges[1], out end))
{
errorString = "Invalid range specification";
return new int[0];
}
if (end < start)
{
errorString = "invalid range specification";
return new int[0];
}
for (int i = start; i <= end; i++)
{
numbers.Add(i);
}
}
return numbers.ToArray();
}
}
With all that set up, we can now scan any text for Bible references. This method was also in my Reference class:
public static ICollection<Reference> Scan(string text)
{
List<Reference> references = new List<Reference>();
if (text == null)
{
return references;
}
string[] words = RemoveHtml.Replace(text, "").Split(' ', '(', ')', ';', '\r', '\n', '\t');
for (int i = 0; i < words.Length; i++)
{
string one = words[i];
// If we are starting with a blank entry, just skip this cycle
if(string.IsNullOrWhiteSpace(one))
{
continue;
}
string two = i + 1 < words.Length ? string.Join(" ", one, words[i + 1]) : one;
string three = i + 2 < words.Length ? string.Join(" ", two, words[i + 2]) : two;
Book book;
bool match = Book.TryParse(one, out book);
match = match || Book.TryParse(two, out book);
match = match || Book.TryParse(three, out book);
if(match)
{
string four = i + 3 < words.Length ? string.Join(" ", three, words[i + 3]) : three;
string five = i + 4 < words.Length ? string.Join(" ", four, words[i + 4]) : four;
// Keep the most inclusive version of the reference
Reference found = null;
foreach(string test in new [] {two,three,four,five})
{
Reference check;
if(TryParse(test, out check))
{
found = check;
}
}
if(found != null && !references.Contains(found))
{
references.Add(found);
}
}
}
return references;
}
This is going to be the most robust way to handle what you want, and it handles corner cases you haven't considered. There's more to the code to handle sorting, equality, and reducing a set of references to the smallest set (in transcripts we commonly work through a passage of scripture bit by bit, so this lets us create the reference for the whole range after scanning the entire transcript).
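If the full classes are more than you need, the {Book} {Chapter}:{Verse} shape can also be sketched with a single regular expression. This is only a minimal illustration, not the code from my repository: the ReferenceSketch class, the FindReferences method and the pattern are hypothetical, it assumes single-word book names with an optional leading 1 to 3, and it does not validate the book name or handle verse ranges the way the classes above do.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ReferenceSketch
{
    // Hypothetical pattern: an optional ordinal (1-3), a single-word book name,
    // then chapter:verse. \d+ is greedy and is followed by \b, so the whole
    // verse number is consumed ("1:12" is never cut down to "1:1").
    private static readonly Regex Pattern = new Regex(
        @"\b(?<book>(?:[1-3]\s)?\p{L}+)\s(?<chapter>\d+):(?<verse>\d+)\b",
        RegexOptions.Compiled);

    // Returns every reference found in the text, normalized to "Book Chapter:Verse".
    public static List<string> FindReferences(string text)
    {
        var found = new List<string>();
        foreach (Match m in Pattern.Matches(text ?? ""))
        {
            found.Add(m.Groups["book"].Value + " "
                + m.Groups["chapter"].Value + ":"
                + m.Groups["verse"].Value);
        }
        return found;
    }
}
```

For the text in the question, FindReferences("randomtext 1 Johannes 1:12 randomtext") yields a single entry, "1 Johannes 1:12". Multi-word names such as "Song of Solomon" would need the book recognition shown above.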
Upvotes: 2
Reputation: 351
OK, well, this fulltext contains both of your values, so you always end up with the last value in your list. If you want to get the first value that matches, you can use something like this:
string item1 = "1 Johannes 1:12";
string item2 = "1 Johannes 1:1";
string fullText = "randomtext 1 Johannes 1:12 randomtext";
// Compare with all spaces removed
string comparedValue = fullText.Replace(" ", string.Empty);
string result = null;
List<string> sentences = new List<string>();
sentences.Add(item1.Replace(" ", string.Empty));
sentences.Add(item2.Replace(" ", string.Empty));
foreach (string item in sentences)
{
    if (comparedValue.Contains(item))
    {
        result = item;
        break; // stop at the first match
    }
}
Now you can use result.
Upvotes: 0
Reputation: 18863
If you want the results to come out in the order you expect with your code, you need to call Sort() on the sentences list:
List<string> sentences = new List<string>();
sentences.Add("1 Johannes 1:12");
sentences.Add("1 Johannes 1:1");
string fulltext = "randomtext 1 Johannes 1:12 randomtext";
sentences.Sort();
foreach (string item in sentences)
{
    if (fulltext.Contains(item))
    {
        //expect the result to be 1 Johannes 1:12, but the result is 1 Johannes 1:1
        //do operation
        Console.WriteLine(item); //try it in a Console App you will get the results in the order that you are expecting
    }
}
Console.Read();
Upvotes: 0
Reputation: 29036
Suppose the current search string is defined like this:
string searchString = "1 Johannes 1:1";
A simple change will give you the expected result: add a space at the start and the end of the search string:
string searchString = " 1 Johannes 1:1 ";
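As a rough sketch of the padding idea, the helper below pads the text as well as the search string, so that a phrase at the very start or end of the text can still match. The PaddedSearch class and ContainsWholePhrase method are hypothetical names used only for illustration, and the approach assumes phrases are separated by plain spaces.

```csharp
using System;

public static class PaddedSearch
{
    // Returns true only when the phrase appears with a space (or the
    // start/end of the text) on both sides.
    public static bool ContainsWholePhrase(string fullText, string phrase)
    {
        // Pad both strings so a phrase at the very start or end still matches.
        string paddedText = " " + fullText + " ";
        string paddedPhrase = " " + phrase + " ";
        return paddedText.Contains(paddedPhrase);
    }
}
```

With the question's text, ContainsWholePhrase(fulltext, "1 Johannes 1:1") is false and ContainsWholePhrase(fulltext, "1 Johannes 1:12") is true. Note that this does not cope with punctuation: a phrase followed directly by a period or comma will not be found.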
Upvotes: 0
Reputation: 5745
You should remove all white space from your string and from the string you are searching (Replace returns a new string, so assign the result):
searchString = searchString.Replace(" ", string.Empty);
fullText = fullText.Replace(" ", string.Empty);
bool found = fullText.Contains(searchString);
Or, if you want an exact match, you can use Regex:
bool contains = Regex.IsMatch(fullText, @"(^|\s)" + Regex.Escape(searchString) + @"(\s|$)");
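Applied to the list from the question, the exact-match idea could look roughly like the sketch below. The ExactSearch class and FindExactMatches method are hypothetical names; each candidate is passed through Regex.Escape so that any regex metacharacters in it are treated literally, and the pattern requires whitespace or the start/end of the text on both sides.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ExactSearch
{
    // Returns the candidates that occur in fullText as whole,
    // whitespace-delimited phrases, in the order they were given.
    public static List<string> FindExactMatches(string fullText, IEnumerable<string> candidates)
    {
        var matches = new List<string>();
        foreach (string candidate in candidates)
        {
            // (^|\s) and (\s|$) demand a boundary on both sides of the phrase.
            string pattern = @"(^|\s)" + Regex.Escape(candidate) + @"(\s|$)";
            if (Regex.IsMatch(fullText, pattern))
            {
                matches.Add(candidate);
            }
        }
        return matches;
    }
}
```

With the question's fulltext and sentences, only "1 Johannes 1:12" is returned, because "1 Johannes 1:1" is followed by a "2" rather than by whitespace.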
Upvotes: 0