Zia Ur Rehman

Reputation: 235

Extract specific fields from a text file in C#

I have the following input file:

INPUT FILE

a    00002098    0    0.75    unable#1    (usually followed by `to') not having the necessary means or skill or know-how; "unable to get to town without a car"; "unable to obtain funds"
a    00002312    0.23    0.43    dorsal#2 abaxial#1    facing away from the axis of an organ or organism; "the abaxial surface of a leaf is the underside or side facing away from the stem"
a    00023655    0    0.5    outside#10 away#3 able#2    (of a baseball pitch) on the far side of home plate from the batter; "the pitch was away (or wide)"; "an outside pitch"    

I want the following output from this file:
OUTPUT

a,00002098,0,0.75,unable#1
a,00002312,0.23,0.43,dorsal#2
a,00002312,0.23,0.43,abaxial#1
a,00023655,0,0.5,outside#10
a,00023655,0,0.5,away#3
a,00023655,0,0.5,able#2

I wrote the following code to produce the result above:

 TextWriter tw = new StreamWriter("D:\\output.txt");

        private void button1_Click(object sender, EventArgs e)
        {
            if (textBox1.Text != null)
            {
                StreamReader reader = new StreamReader(@"C:\Users\Zia\Desktop\input.txt");
                string line;
                String lines = "";
                while ((line = reader.ReadLine()) != null)
                {
                    String[] str = line.Split('\t');
                    String[] words = str[3].Split(' ');
                    for (int k = 0; k < words.Length; k++)
                    {
                        for (int i = 0; i < str.Length; i++)
                        {
                            if (i + 1 != str.Length)
                            {
                                lines = lines + str[i] + ",";
                            }
                            else
                            {
                                lines = lines + words[k] + "\r\n";
                            }
                        }
                    }
                }
                tw.Write(lines);
                tw.Close();
                reader.Close();
            }
        }    

When I change the index, this code throws the following error and does not give the desired result.
ERROR
Index was outside the bounds of the array.
Thanks in advance.

Upvotes: 1

Views: 1855

Answers (3)

Alyafey

Reputation: 1453

        private void Extract()
        {
            char[] delimiters = new char[] { '\r', '\n' };
            using (StreamReader reader = new StreamReader(@"C:\Users\Zia\Desktop\input.txt"))
            {
                string words = reader.ReadToEnd();
                string[] lines = words.Split(delimiters);
                foreach (var item in lines)
                {
                    foreach (var i in findItems(item))
                    {
                        if (i != " ")
                            Console.WriteLine(i);
                    }
                }

            }

        }
        private static List<string> findItems(string item)
        {
            List<string> items = new List<string>();

            if (item.Length <= 0)
            {
                items.Add(" ");
            }
            else
            {
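                // The first word ends at the next space or tab after '#', so
                // multi-digit sense numbers such as "outside#10" stay whole.
                int hash = item.IndexOf('#');
                if (hash < 0)
                {
                    return items; // no "word#n" entry on this line
                }
                int end = item.IndexOfAny(new[] { ' ', '\t' }, hash);
                string temp = end < 0 ? item : item.Substring(0, end);
                temp = temp.Replace("\t", ",");

                items.Add(temp);
                // Remaining words: space-separated tokens containing '#'. The last
                // one may have the gloss attached after a tab, so cut it off there.
                List<string> names = item.Split(' ')
                    .Where(x => x.Contains('#'))
                    .Select(x => x.Split('\t')[0])
                    .ToList();
                int i = 1;
                while (i < names.Count)
                {
                    temp = items[0].Substring(0, items[0].LastIndexOf(',') + 1) + names[i];
                    items.Add(temp);
                    i++;
                }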
            }

            return items;

        }


Upvotes: 0

Mzf

Reputation: 5260

As I understand it, you want each word in the fifth column that contains # to become a separate result line. It should look something like this:

        List<string> result = new List<string>();

        // str is assumed to hold the whole file content.
        var lines = str.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var line in lines)
        {
            var words = line.Split('\t');
            if (words.Length < 5)
                continue; // skip malformed lines instead of indexing out of bounds

            string res = String.Join(",", words[0], words[1], words[2], words[3]);

            var xx = words[4].Split(' ').Where(word => word.Contains("#"));
            foreach (var s in xx)
            {
                result.Add(res + "," + s);
            }
        }

Upvotes: 1

Levi Botelho

Reputation: 25214

Why not try this, looping over each line in the text:

var elements = line.Split('\t');
var words = elements[4].Split(' ');
foreach(var word in words)
{
    Console.WriteLine(string.Concat(elements[0], ",", elements[1], ",", elements[2], ",", elements[3], ",", word));
}

This seems to output exactly what you need. Just change the Console.WriteLine to write to your file.
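
For completeness, here is a rough sketch of the same idea writing to a file instead of the console. It assumes the columns are separated by tabs (as in the question's code) and that the words are in the fifth column. The names `SynsetExpander` and `ExpandLine` are made up for this example, and the input and output paths are taken from the command line rather than hard-coded. The length check is there because a blank or short line yields fewer than five elements, and indexing `elements[4]` on such a line is one way to get "Index was outside the bounds of the array".

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class SynsetExpander
{
    // Hypothetical helper: turns one tab-separated input line into one
    // comma-separated output line per word in the fifth column.
    public static IEnumerable<string> ExpandLine(string line)
    {
        var elements = line.Split('\t');

        // Blank or malformed lines have fewer than five columns; skip them
        // rather than indexing past the end of the array.
        if (elements.Length < 5)
            yield break;

        // The first four columns are repeated for every word.
        var prefix = string.Join(",", elements[0], elements[1], elements[2], elements[3]);

        // RemoveEmptyEntries drops empty tokens caused by repeated spaces.
        var words = elements[4].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
            yield return prefix + "," + word;
    }

    static void Main(string[] args)
    {
        // Assumed usage: first argument is the input file, second the output file.
        if (args.Length < 2)
        {
            Console.Error.WriteLine("usage: SynsetExpander <input> <output>");
            return;
        }

        using (var writer = new StreamWriter(args[1]))
        {
            foreach (var line in File.ReadLines(args[0]))
                foreach (var row in ExpandLine(line))
                    writer.WriteLine(row);
        }
    }
}
```

Keeping the per-line logic in its own method means it can be checked against a single sample line without touching any files, and the `using` block closes the writer even if a line throws.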

Upvotes: 2
