Reputation: 48

String in contains. C#

I have a text that I'm searching through for a specific reference. To illustrate, the reference I'm searching for is "1 Johannes 1:12". I use the String.Contains method, but it returns two matches, because I also search the same text for another reference, "1 Johannes 1:1", which is a substring of the first one. So instead of taking "1 Johannes 1:12" first and then "1 Johannes 1:1", it does the opposite. This is bad for me, because I want only the correct reference.

I have tried different options, including Substring, but I need some help with this.

Thanks for all the answers in advance. Cheers!

List<string> sentences = new List<string>();
sentences.Add("1 Johannes 1:12");
sentences.Add("1 Johannes 1:1");
string fulltext = "randomtext 1 Johannes 1:12 randomtext";

foreach (string item in sentences)
{
    if (fulltext.Contains(item))
    {
        //expect the result to be 1 Johannes 1:12, but the result is 1 Johannes 1:1 
        //do operation
    }
}

Upvotes: 1

Answers (5)

Berin Loritsch

Reputation: 11473

Bible reference parsing and recognition is tricky, particularly because there are multiple abbreviation styles and numbers that look similar. The problem you have is that String.Contains() is a pretty big hammer, and you need something more like a set of socket wrenches. In other words, a full and proper answer is going to require more code than can fit comfortably in this format. I've written code to go through devotionals and transcripts and pull out all the references. The code is in a private repository, but I'll try to post the relevant parts.

A Bible reference is written in this format: {Book} {Chapter}:{Verse}, with some variation for verse ranges. So the first part is recognizing the book. For that purpose I created a class to represent a book and its known abbreviations (I was supporting two documented abbreviation styles). The Book class looks like this:

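// Namespaces used by the Book and Reference classes
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public class Book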
{
    // The set of books we recognize
    private static readonly List<Book> books;
    private static readonly Dictionary<string, Book> commonMisspellings;

    static Book()
    {
        // Initialize the set
        books = new List<Book>{
            // Old Testament
            new Book("Genesis", "Gen.", "Ge", 50), // Gen
            new Book("Exodus", "Ex.", "Ex", 40),  // Exod
            new Book("Leviticus", "Lev.", "Le", 27), // Lev
            new Book("Numbers", "Num.", "Nu", 36), // Num
            new Book("Deuteronomy", "Deut.", "De", 34), // Deut
            new Book("Joshua", "Josh.", "Jos", 24), // Josh
            new Book("Judges", "Judg.", "Jud", 21), // Judg
            new Book("Ruth", "Ruth", "Ru", 4), // Ruth
            new Book("1 Samuel", "1 Sam.", "1 S", 31), // 1Sam
            new Book("2 Samuel", "2 Sam.", "2 S", 24), // 2Sam
            new Book("1 Kings", "1 Kings", "1 K", 22), // 1Kgs
            new Book("2 Kings", "2 Kings", "2 K", 25), // 2Kgs
            new Book("1 Chronicles", "1 Chron.", "1 Chr", 29), // 1Chr
            new Book("2 Chronicles", "2 Chron.", "2 Chr", 36), // 2Chr
            new Book("Ezra", "Ezra", "Ezr", 10), // Ezra
            new Book("Nehemiah", "Neh.", "Ne", 13), // Neh
            new Book("Esther", "Est.", "Est", 10), // Esth
            new Book("Job", "Job", "Jb", 42), // Job
            new Book("Psalms", "Ps.", "Ps", 150), // Ps
            new Book("Proverbs", "Prov.", "Pr", 31), // Prov
            new Book("Ecclesiastes", "Eccl.", "Ec", 12), // Eccl
            new Book("Song of Solomon", "Song", "Song", 8), // Song
            new Book("Isaiah", "Isa.", "Is", 66), // Isa
            new Book("Jeremiah", "Jer.", "Je", 52), // Jer
            new Book("Lamentations", "Lam.", "Lam", 5), // Lam
            new Book("Ezekiel", "Ezek.", "Ez", 48), // Ezek
            new Book("Daniel", "Dan.", "Da", 12), // Dan
            new Book("Hosea", "Hos.", "Ho", 14), // Hos
            new Book("Joel", "Joel", "Joel", 3), // Joel
            new Book("Amos", "Amos", "Am", 9), // Amos
            new Book("Obadiah", "Obad.", "Obad", 1), // Obad
            new Book("Jonah", "Jonah", "Jona", 4), // Jonah
            new Book("Micah", "Mic.", "Mi", 7), // Mic
            new Book("Nahum", "Nah.", "Na", 3), // Nah
            new Book("Habakkuk", "Hab.", "Hab", 3), // Hab
            new Book("Zephaniah", "Zeph.", "Zep", 3), // Zeph
            new Book("Haggai", "Hag.", "Hag", 2), // Hag
            new Book("Zechariah", "Zech.", "Zec", 14), // Zech
            new Book("Malachi", "Mal.", "Mal", 4), // Mal

            // New Testament
            new Book("Matthew", "Matt.", "Mt", 28), // Matt
            new Book("Mark", "Mark", "Mk", 16), // Mark
            new Book("Luke", "Luke", "Lu", 24), // Luke
            new Book("John", "John", "Jn", 21), // John
            new Book("Acts", "Acts", "Ac", 28), // Acts
            new Book("Romans", "Rom.", "Ro", 16), // Rom
            new Book("1 Corinthians", "1 Cor.", "1 Co", 16), // 1Cor
            new Book("2 Corinthians", "2 Cor.", "2 Co", 13), // 2Cor
            new Book("Galatians", "Gal.", "Ga", 6), // Gal
            new Book("Ephesians", "Eph.", "Ep", 6), // Eph
            new Book("Philippians", "Phil.", "Ph", 4), // Phil
            new Book("Colossians", "Col.", "Col", 4), // Col
            new Book("1 Thessalonians", "1 Thes.", "1 Th", 5), // 1Thess
            new Book("2 Thessalonians", "2 Thes.", "2 Th", 3), // 2Thess
            new Book("1 Timothy", "1 Tim.", "1 Ti", 6), // 1Tim
            new Book("2 Timothy", "2 Tim.", "2 Ti", 4), // 2Tim
            new Book("Titus", "Titus", "Tit", 3), // Titus
            new Book("Philemon", "Philem.", "Phm", 1), // Phlm
            new Book("Hebrews", "Heb.", "He", 13), // Heb
            new Book("James", "James", "Ja", 5), // Jas
            new Book("1 Peter", "1 Peter", "1 Pe", 5), // 1Pet
            new Book("2 Peter", "2 Peter", "2 Pe", 3), // 2Pet
            new Book("1 John", "1 John", "1 Jn", 5), // 1John
            new Book("2 John", "2 John", "2 Jn", 1), // 2John
            new Book("3 John", "3 John", "3 Jn", 1), // 3John
            new Book("Jude", "Jude", "Jude", 1), // Jude
            new Book("Revelation", "Rev.", "Re", 22) // Rev
        };

        Debug.Assert(books.Count == 66);

        // These are based on what I found in the set of over 6,000
        // transcripts that people typed.
        commonMisspellings = new Dictionary<string, Book>();
        commonMisspellings.Add("song of songs", books.FirstOrDefault(b => b.ThompsonAbreviation == "Song"));
        commonMisspellings.Add("psalm", books.FirstOrDefault(b => b.ThompsonAbreviation == "Ps"));
        commonMisspellings.Add("like", books.FirstOrDefault(b => b.ThompsonAbreviation == "Lu"));
        commonMisspellings.Add("jerimiah", books.FirstOrDefault(b => b.ThompsonAbreviation == "Je"));
        commonMisspellings.Add("galations", books.FirstOrDefault(b => b.ThompsonAbreviation == "Ga"));
    }

    private static int numCreated = 0;
    private int order;

    private Book(string fullName, string abbrev, string thompson, int chapters)
    {
        order = numCreated;
        Name = fullName;
        StandardAbreviation = abbrev;
        ThompsonAbreviation = thompson;
        ChapterCount = chapters;
        numCreated++;
    }

    /// <summary>
    /// The unabbreviated name of the book.
    /// </summary>
    public string Name { get; private set; }

    /// <summary>
    /// Standard abbreviations as defined in "The Christian Writer's
    /// Manual of Style", 2004 edition (ISBN: 9780310487715).
    /// </summary>
    public string StandardAbreviation { get; private set; }

    /// <summary>
    /// Thompson Chain references, pulled from the 5th edition.
    /// </summary>
    public string ThompsonAbreviation { get; private set; }

    /// <summary>
    /// The number of chapters in the book.
    /// </summary>
    public int ChapterCount { get; private set; }

    public static bool TryParse(string inString, out Book book)
    {
        string potentialBook = StandardizeBookOrdinals(inString);

        // Find the first book where the input string now matches one of the recognized formats.
        book = books.FirstOrDefault(
            b => b.ThompsonAbreviation.Equals(potentialBook, StringComparison.InvariantCultureIgnoreCase) 
                || b.StandardAbreviation.Equals(potentialBook, StringComparison.InvariantCultureIgnoreCase)
                || b.Name.Equals(potentialBook, StringComparison.InvariantCultureIgnoreCase));

        if (book != null)
        {
            return true;
        }

        // If we didn't find it, check to see if we just missed it because the abbreviation
        // didn't have a period
        book = books.FirstOrDefault((b) =>
        {
            string stdAbrev = b.StandardAbreviation;
            if(stdAbrev.EndsWith("."))
            {
                stdAbrev = stdAbrev.Substring(0, stdAbrev.Length - 1);
            }

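            // Ignore case here too, to stay consistent with the first pass above
            return stdAbrev.Equals(potentialBook, StringComparison.InvariantCultureIgnoreCase);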
        });

        if (book != null)
        {
            return true;
        }

        // Special Case: check for common misspellings
        string lowercase = potentialBook.ToLowerInvariant();
        commonMisspellings.TryGetValue(lowercase, out book);

        return book != null;
    }

    private static string StandardizeBookOrdinals(string str)
    {
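        // Break up on all remaining white space, dropping empty entries so that
        // runs of white space collapse to a single separator
        string[] parts = (str ?? "").Split(
            new[] { ' ', '\r', '\n', '\t' }, StringSplitOptions.RemoveEmptyEntries);

        // Nothing left to standardize
        if (parts.Length == 0)
        {
            return string.Empty;
        }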

        // If the first part is a roman numeral, or spelled ordinal, convert it to arabic
        var number = parts[0].ToLowerInvariant();
        switch (number)
        {
            case "first":
            case "i":
                parts[0] = "1";
                break;

            case "second":
            case "ii":
                parts[0] = "2";
                break;

            case "third":
            case "iii":
                parts[0] = "3";
                break;
        }

        // Recompile the parts into one string that only has a single space separating elements
        return string.Join(" ", parts);
    }

    public static IEnumerable<Book> List()
    {
        return books.ToArray();
    }
}

So that lets you recognize any book if you feed its text into TryParse(). We even handle common misspellings, Roman numerals (I, II, III) vs. Arabic numerals (1, 2, 3), and more than one abbreviation style. Feel free to adapt as necessary, but once we can recognize a book, the rest is going to be the same. The reason for listing the number of chapters in a book will become more apparent when you look at the next class, which deals with a Reference:

public class Reference
{
    private static readonly Regex RemoveHtml = new Regex("<[^>]*>", RegexOptions.Compiled);

    public Book Book { get; set; }
    public int Chapter { get; set; }
    public int[] Verses { get; set; }

    public static bool TryParse(string text, out Reference reference)
    {
        string errorString;
        reference = InternalParse(text, out errorString);

        if(errorString!=null)
        {
            reference = null;
            return false;
        }

        return true;
    }

    private static Reference InternalParse(string text, out string errorString)
    {
        errorString = null;
        int colon = text.LastIndexOf(':');
        int chapter = -1;
        string chapterSection = "1";
        string verseSection = "";

        if (colon > 0)
        {
            verseSection = text.Substring(colon + 1);
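            // A chapter number has at most three digits (Psalms has 150 chapters),
            // so start looking up to three characters before the colon. Clamp to 0
            // so a colon near the start of the text cannot cause a negative index.
            chapter = Math.Max(0, colon - 3);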

            chapterSection = text.Substring(chapter, colon - chapter);
            while (!string.IsNullOrEmpty(chapterSection) && !Char.IsDigit(chapterSection[0]))
            {
                chapter++;
                chapterSection = text.Substring(chapter, colon - chapter);
            }
        }
        else
        {
            chapter = 2;  // skip initial numbers for books
            while(chapter < text.Length && !Char.IsDigit(text[chapter]))
            {
                chapter++;
            }

            if(chapter == text.Length)
            {
                errorString = "There are no chapter or verses, can't be a reference.";
                return null;
            }

            verseSection = text.Substring(chapter);
        }

        Book book;
        if (!Book.TryParse(text.Substring(0, chapter), out book))
        {
            errorString = "There is no book, can't be a reference.";
            return null;
        }

        if(!int.TryParse(chapterSection, out chapter))
        {
            errorString = "Bad chapter format";
            return null;
        }

        Reference reference = new Reference
        {
            Book = book,
            Chapter = chapter
        };

        if(colon < 0 && reference.Book.ChapterCount > 1)
        {
            if(!int.TryParse(verseSection, out chapter))
            {
                errorString = "Bad chapter format.";
                return null;
            }

            reference.Chapter = chapter;
            reference.Verses = new int[0];
            return reference;
        }

        if (reference.Chapter > reference.Book.ChapterCount)
        {
            errorString = "Chapter found was too high";
            return null;
        }

        reference.Verses = ParseRanges(verseSection, out errorString);

        return reference;
    }

    private static int[] ParseRanges(string section, out string errorString)
    {
        errorString = null;
        List<int> numbers = new List<int>();
        string[] items = section.Split(',');

        foreach (string verse in items)
        {
            string[] ranges = verse.Split('-');

            if (ranges.Length > 2 || ranges.Length == 0)
            {
                errorString = "Invalid range specification";
                return new int[0];
            }

            int start;
            if(!int.TryParse(ranges[0], out start))
            {
                errorString = "Invalid range specification";
                return new int[0];
            }

            int end = start;
            if(ranges.Length >1 && !int.TryParse(ranges[1], out end))
            {
                errorString = "Invalid range specification";
                return new int[0];
            }

            if (end < start)
            {
                errorString = "Invalid range specification";
                return new int[0];
            }

            for (int i = start; i <= end; i++)
            {
                numbers.Add(i);
            }
        }

        return numbers.ToArray();
    }
}

With all that set up, we can now scan any text for Bible references. This method was also in my Reference class:

    public static ICollection<Reference> Scan(string text)
    {
        List<Reference> references = new List<Reference>();

        if (text == null)
        {
            return references;
        }

        string[] words = RemoveHtml.Replace(text, "").Split(' ', '(', ')', ';', '\r', '\n', '\t');

        for (int i = 0; i < words.Length; i++)
        {
            string one = words[i];

            // If we are starting with a blank entry, just skip this cycle
            if(string.IsNullOrWhiteSpace(one))
            {
                continue;
            }

            string two = i + 1 < words.Length ? string.Join(" ", one, words[i + 1]) : one;
            string three = i + 2 < words.Length ? string.Join(" ", two, words[i + 2]) : two;

            Book book;
            bool match = Book.TryParse(one, out book);
            match = match || Book.TryParse(two, out book);
            match = match || Book.TryParse(three, out book);

            if(match)
            {
                string four = i + 3 < words.Length ? string.Join(" ", three, words[i + 3]) : three;
                string five = i + 4 < words.Length ? string.Join(" ", four, words[i + 4]) : four;

                // Keep the most inclusive version of the reference
                Reference found = null;
                foreach(string test in new [] {two,three,four,five})
                {
                    Reference check;
                    if(TryParse(test, out check))
                    {
                        found = check;
                    }
                }

                if(found != null && !references.Contains(found))
                {
                    references.Add(found);
                }
            }
        }

        return references;
    }

This is going to be the most robust way to handle what you want, and it handles corner cases you haven't considered. There's more to the code to handle sorting, equality, and reducing a set of references to the smallest set (in transcripts we commonly work through a passage of scripture bit by bit, so this lets us create the reference for the whole range after scanning the entire transcript).
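
For the simple case in the question, a much lighter sketch may be enough. This is not taken from the code above: the class name ReferenceExtractor and its pattern are hypothetical, and the pattern assumes a simplified format of an optional book number (1 to 3), a single-word book name, and a {Chapter}:{Verse} pair with no verse ranges. The idea is to extract each complete reference first and then compare whole references for equality, so that "1 Johannes 1:1" can no longer match inside "1 Johannes 1:12".

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ReferenceExtractor
{
    // Assumed simplified format: optional book number, one word, chapter:verse.
    // \d+ is greedy, so the verse number is always captured in full.
    private static readonly Regex ReferencePattern = new Regex(
        @"(?:[1-3]\s)?\p{L}+\s\d+:\d+",
        RegexOptions.Compiled);

    // Returns every complete reference found in the text, in order of appearance.
    public static List<string> Extract(string text)
    {
        var found = new List<string>();
        foreach (Match match in ReferencePattern.Matches(text ?? string.Empty))
        {
            found.Add(match.Value);
        }
        return found;
    }
}
```

With the text from the question, Extract("randomtext 1 Johannes 1:12 randomtext") yields the single entry "1 Johannes 1:12", so checking the returned list with Contains("1 Johannes 1:1") is false. Multi-word book names, abbreviations, and verse ranges are outside what this sketch handles; that is what the Book and Reference classes are for.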

Upvotes: 2

prime

Reputation: 351

OK, this fulltext contains both of your values, so you always end up with the last value in your list. If you want the first matching value instead, you can use something like this:

string item1 = "1 Johannes 1:12";
string item2 = "1 Johannes 1:1";
string fullText = "randomtext 1 Johannes 1:12 randomtext";

// Compare with all spaces removed from both the text and the candidates
string comparedValue = fullText.Replace(" ", string.Empty);
string result = null;

List<string> sentences = new List<string>();
sentences.Add(item1.Replace(" ", string.Empty));
sentences.Add(item2.Replace(" ", string.Empty));

foreach (string item in sentences)
{
    if (comparedValue.Contains(item))
    {
        // Stop at the first candidate that matches
        result = item;
        break;
    }
}

Now you can use result.
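
Stopping at the first match only gives the right answer when the longer candidate happens to come first in the list. A variation on the same idea, sketched here under that assumption, is to try the candidates from longest to shortest so the most specific one wins regardless of list order. The class name LongestMatch and the method Find are hypothetical names for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LongestMatch
{
    // Returns the longest candidate contained in the text, or null if none match.
    public static string Find(string text, IEnumerable<string> candidates)
    {
        return candidates
            .OrderByDescending(candidate => candidate.Length)
            .FirstOrDefault(candidate => text.Contains(candidate));
    }
}
```

Note that this still relies on plain substring matching: if the text contains "1 Johannes 1:13" and only "1 Johannes 1:1" is in the list, the shorter candidate is still reported as a match.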

Upvotes: 0

MethodMan

Reputation: 18863

If you want the results to come out in order based on your code, you need to Sort() the sentences list:

List<string> sentences = new List<string>();
sentences.Add("1 Johannes 1:12");
sentences.Add("1 Johannes 1:1");
string fulltext = "randomtext 1 Johannes 1:12 randomtext";
sentences.Sort();
foreach(string item in sentences)
{
   if(fulltext.Contains(item))
   {
      //expect the result to be 1 Johannes 1:12, but the result is 1 Johannes 1:1 
      //do operation
      Console.WriteLine(item);//try it in a Console App you will get the results in the order that you are expecting
   }
}
Console.Read();

Upvotes: 0

sujith karivelil

Reputation: 29036

Suppose the current search string is defined like this:

string searchString="1 Johannes 1:1";

A simple change will give you the expected result: add a space at the start and end of the search string:

string searchString=" 1 Johannes 1:1 ";
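
A minimal sketch of how the padded search could be wrapped up, assuming references in the text are always separated from neighbouring words by single spaces. The names PaddedSearch and ContainsWholePhrase are hypothetical. The text is padded as well, so a reference at the very start or end of the text is still found.

```csharp
public static class PaddedSearch
{
    // True when the phrase occurs in the text with a space (or the start or
    // end of the text) on both sides.
    public static bool ContainsWholePhrase(string text, string phrase)
    {
        string paddedText = " " + text + " ";
        string paddedPhrase = " " + phrase + " ";
        return paddedText.Contains(paddedPhrase);
    }
}
```

This does not cover a reference followed directly by punctuation, such as "1 Johannes 1:12." at the end of a sentence.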

Upvotes: 0

Shachaf.Gortler

Reputation: 5745

You should remove all white space from both the string you are searching for and the string you are searching in:

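// Strings are immutable, so Replace returns a new string that must be assigned
searchString = searchString.Replace(" ", string.Empty);
fullText = fullText.Replace(" ", string.Empty);

fullText.Contains(searchString)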

Or, if you want an exact match, you can use Regex:

bool contains = Regex.IsMatch(fullText, @"(^|\s)" + Regex.Escape(searchString) + @"(\s|$)");
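
As a runnable sketch of the exact-match idea, the check can be wrapped in a small helper. The names ExactMatch and ContainsExact are hypothetical. Regex.Escape is used so that any characters in the search string with special meaning in a regular expression are treated literally, and both pattern fragments are verbatim strings so that \s reaches the regex engine unchanged.

```csharp
using System.Text.RegularExpressions;

public static class ExactMatch
{
    // True when searchString occurs in fullText bounded by white space
    // or by the start or end of the text.
    public static bool ContainsExact(string fullText, string searchString)
    {
        string pattern = @"(^|\s)" + Regex.Escape(searchString) + @"(\s|$)";
        return Regex.IsMatch(fullText, pattern);
    }
}
```

Applied to the original, unmodified text (spaces intact), "1 Johannes 1:1" no longer matches inside "1 Johannes 1:12", because the character after it is a digit rather than white space.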

Upvotes: 0
