Reputation: 35502
By text formatting I meant something more complicated.
At first I began manually adding the 5000 lines from the text file this question is about into my project.
The text file has 5000 lines of different lengths. For example:
1 1 ITEM_ETC_GOLD_01 골드(소) xxx xxx xxx_TT_DESC 0 0 3 3 5 0 180000 3 0 1 0 0 255 1 1 0 0 0 0 0 0 0 0 0 0 -1 0 -1 0 -1 0 -1 0 -1 0 0 0 0 0 0 0 100 0 0 0 xxx item\etc\drop_ch_money_small.bsr xxx xxx xxx 0 2 0 0 1 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1 표현할 골드의 양(param1이상) -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx 0 0
1 4 ITEM_ETC_HP_POTION_01 HP 회복 약초 xxx SN_ITEM_ETC_HP_POTION_01 SN_ITEM_ETC_HP_POTION_01_TT_DESC 0 0 3 3 1 1 180000 3 0 1 1 1 255 3 1 0 0 1 0 60 0 0 0 1 21 -1 0 -1 0 -1 0 -1 0 -1 0 0 0 0 0 0 0 100 0 0 0 xxx item\etc\drop_ch_bag.bsr item\etc\hp_potion_01.ddj xxx xxx 50 2 0 0 1 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 120 HP회복양 0 HP회복양(%) 0 MP회복양 0 MP회복양(%) -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx 0 0
1 5 ITEM_ETC_HP_POTION_02 HP 회복약 (소) xxx SN_ITEM_ETC_HP_POTION_02 SN_ITEM_ETC_HP_POTION_02_TT_DESC 0 0 3 3 1 1 180000 3 0 1 1 1 255 3 1 0 0 1 0 110 0 0 0 2 39 -1 0 -1 0 -1 0 -1 0 -1 0 0 0 0 0 0 0 100 0 0 0 xxx item\etc\drop_ch_bag.bsr item\etc\hp_potion_02.ddj xxx xxx 50 2 0 0 2 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 220 HP회복양 0 HP회복양(%) 0 MP회복양 0 MP회복양(%) -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx -1 xxx 0 0
The separator between the first value (1) and the second value (1/4/5) is not a space, it's a tab. Every field in that text file is separated by tabs, not spaces.
What I want:
I want to get the second integer (in the three lines I posted above, the second integers are 1, 4 and 5) and the string in the middle of each line that indicates the path (it starts with "item\" and ends with the file extension ".ddj").
My problem:
When I google "Text formatting C#", all I get is how to open a text file and how to write a text file in C#. I don't know how to search for text inside a text file. Also, I can't simply search for the integer, because when it's a small integer like in the three lines I posted above, I won't be able to find the correct location: "1", for example, might also appear elsewhere in the line.
My question:
It would be best if I wrote a program that deletes everything except what I need.
The other way I can think of is to search directly inside that file, but as I mentioned above, I might get the wrong location for the second integer if it's too small.
Please suggest something; I can't format all this by hand.
Upvotes: 29
Views: 182707
Reputation: 14330
OK, here's what we do: open the file, read it line by line, and split it by tabs. Then we grab the second integer and loop through the rest to find the path.
    using (StreamReader reader = File.OpenText("filename.txt"))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] items = line.Split('\t');
            int myInteger = int.Parse(items[1]); // Here's your integer.
            // Now let's find the path.
            string path = null;
            foreach (string item in items)
            {
                if (item.StartsWith("item\\") && item.EndsWith(".ddj"))
                    path = item;
            }
            // At this point, `myInteger` and `path` contain the values we want
            // for the current line. We can then store those values or print them,
            // or anything else we like.
        }
    }
Upvotes: 60
Reputation: 14330
Another solution, this time making use of regular expressions:
    using System.IO;
    using System.Text.RegularExpressions;
    ...
    Regex parts = new Regex(@"^\d+\t(\d+)\t.+?\t(item\\[^\t]+\.ddj)");
    StreamReader reader = File.OpenText("filename.txt");
    string line;
    while ((line = reader.ReadLine()) != null) {
        Match match = parts.Match(line);
        if (match.Success) {
            int number = int.Parse(match.Groups[1].Value);
            string path = match.Groups[2].Value;
            // At this point, `number` and `path` contain the values we want
            // for the current line. We can then store those values or print them,
            // or anything else we like.
        }
    }
That expression's a little complex, so here it is broken down:
    ^        Start of string.
    \d+      "\d" means "digit" - 0-9. The "+" means "one or more."
             So this means "one or more digits."
    \t       This matches a tab.
    (\d+)    This also matches one or more digits. This time, though, we capture it
             using parentheses. This means we can access it through the match's
             Groups property.
    \t       Another tab.
    .+?      "." means "anything." So "one or more of anything". In addition, it's lazy.
             This is to stop it grabbing everything in sight - it'll only grab as much
             as it needs to for the regex to work.
    \t       Another tab.
    (item\\[^\t]+\.ddj)
             Here's the meat. This matches: "item\<one or more of anything but a tab>.ddj"
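To see the expression at work, here is a minimal sketch that runs it against a single line. The sample line is a shortened, made-up stand-in for the real data (same tab-separated shape, far fewer fields), and the `RegexDemo` class and `TryParse` helper are names invented for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexDemo
{
    // Same pattern as above: skip the first number, capture the second,
    // then lazily skip ahead to the first field of the form item\....ddj.
    static readonly Regex Parts =
        new Regex(@"^\d+\t(\d+)\t.+?\t(item\\[^\t]+\.ddj)");

    // Returns true and fills the out parameters when the line matches.
    public static bool TryParse(string line, out int number, out string path)
    {
        number = 0;
        path = null;
        Match match = Parts.Match(line);
        if (!match.Success)
            return false;
        number = int.Parse(match.Groups[1].Value);
        path = match.Groups[2].Value;
        return true;
    }

    static void Main()
    {
        // Shortened, hypothetical line; the real lines have many more fields.
        string sample = "1\t4\tITEM_ETC_HP_POTION_01\txxx\t" +
                        "item\\etc\\drop_ch_bag.bsr\titem\\etc\\hp_potion_01.ddj\txxx\t0";
        int number;
        string path;
        if (TryParse(sample, out number, out path))
            Console.WriteLine(number + " " + path); // prints: 4 item\etc\hp_potion_01.ddj
    }
}
```

Note that the `.bsr` field is skipped even though it also starts with `item\`: `[^\t]+` cannot cross a tab, so only a field that ends in `.ddj` can satisfy the last group.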
Upvotes: 38
Reputation: 3764
One way that I've found really useful in situations like this is to go old-school and use the Jet OLEDB provider, together with a schema.ini file to read large tab-delimited files in using ADO.Net. Obviously, this method is really only useful if you know the format of the file to be imported.
    // Requires: using System.Data; using System.Data.OleDb; using System.IO;
    public void ImportCsvFile(string filename)
    {
        FileInfo file = new FileInfo(filename);
        using (OleDbConnection con =
            new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" +
                file.DirectoryName + "\";" +
                "Extended Properties='text;HDR=Yes;FMT=TabDelimited';"))
        {
            using (OleDbCommand cmd = new OleDbCommand(string.Format
                ("SELECT * FROM [{0}]", file.Name), con))
            {
                con.Open();
                // Using a DataReader to process the data
                using (OleDbDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // Process the current reader entry...
                    }
                }
                // Using a DataTable to process the data
                using (OleDbDataAdapter adp = new OleDbDataAdapter(cmd))
                {
                    DataTable tbl = new DataTable("MyTable");
                    adp.Fill(tbl);
                    foreach (DataRow row in tbl.Rows)
                    {
                        // Process the current row...
                    }
                }
            }
        }
    }
Once you have the data in a nice format like a datatable, filtering out the data you need becomes pretty trivial.
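As a rough sketch of that filtering step, the loop below pulls the second column and the `.ddj` path out of each row. It builds a small stand-in `DataTable` by hand so it can run without the Jet provider; the column names, the `TableFilter` class, and the `Extract` helper are all made up for the example, and a table filled by the adapter would be passed in the same way.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class TableFilter
{
    // Returns (second column, path) pairs for rows that contain an item\....ddj value.
    public static List<KeyValuePair<int, string>> Extract(DataTable table)
    {
        var results = new List<KeyValuePair<int, string>>();
        foreach (DataRow row in table.Rows)
        {
            foreach (object value in row.ItemArray)
            {
                string text = value as string;
                if (text != null &&
                    text.StartsWith(@"item\", StringComparison.Ordinal) &&
                    text.EndsWith(".ddj", StringComparison.Ordinal))
                {
                    // Column index 1 holds the second integer of each line.
                    results.Add(new KeyValuePair<int, string>(
                        Convert.ToInt32(row[1]), text));
                    break;
                }
            }
        }
        return results;
    }

    static void Main()
    {
        // Stand-in table with hypothetical column names and only four columns.
        var table = new DataTable("MyTable");
        table.Columns.Add("Service", typeof(int));
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Model", typeof(string));
        table.Columns.Add("Icon", typeof(string));
        table.Rows.Add(1, 4, @"item\etc\drop_ch_bag.bsr", @"item\etc\hp_potion_01.ddj");
        table.Rows.Add(1, 1, @"item\etc\drop_ch_money_small.bsr", "xxx");

        foreach (var pair in Extract(table))
            Console.WriteLine(pair.Key + " " + pair.Value);
    }
}
```

Rows without a `.ddj` value, like the second one here, are simply left out of the result.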
Upvotes: 1
Reputation: 6145
As already mentioned, I would highly recommend using regular expressions (in System.Text.RegularExpressions) to get this kind of job done.
Combined with a solid tool like RegexBuddy, you can handle even complex text record parsing and get results quickly. The tool makes it really easy.
Hope that helps.
Upvotes: 1
Reputation: 134167
You could open the file up and use StreamReader.ReadLine to read the file in line-by-line. Then you can use String.Split to break each line into pieces (use a \t delimiter) to extract the second number.
As the number of items is different you would need to search the string for the pattern 'item\*.ddj'.
To delete an item you could (for example) keep all of the file's contents in memory and write out a new file when the user clicks 'Save'.
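Put together, those steps might look roughly like the sketch below. It streams from one file to another instead of holding everything in memory, which is a simplification of the "Save" idea above; the file names, the `ItemFilter` class, and the output layout (integer, tab, path) are assumptions made for the example.

```csharp
using System;
using System.IO;

static class ItemFilter
{
    // Reads tab-separated lines from `input` and writes "<second integer>\t<path>"
    // to `output` for every line that has both values. Returns the lines written.
    public static int Filter(TextReader input, TextWriter output)
    {
        int written = 0;
        string line;
        while ((line = input.ReadLine()) != null)
        {
            string[] fields = line.Split('\t');

            // Skip lines that are too short or whose second field is not an integer.
            int id;
            if (fields.Length < 2 || !int.TryParse(fields[1], out id))
                continue;

            // The path's position is not assumed; scan for the item\....ddj field.
            string path = null;
            foreach (string field in fields)
            {
                if (field.StartsWith(@"item\", StringComparison.Ordinal) &&
                    field.EndsWith(".ddj", StringComparison.Ordinal))
                {
                    path = field;
                    break;
                }
            }

            if (path == null)
                continue; // e.g. the ITEM_ETC_GOLD_01 sample line has no .ddj path

            output.WriteLine(id + "\t" + path);
            written++;
        }
        return written;
    }

    static void Main(string[] args)
    {
        // Hypothetical file names; pass real ones on the command line.
        string source = args.Length > 0 ? args[0] : "items.txt";
        string target = args.Length > 1 ? args[1] : "items_filtered.txt";

        using (StreamReader reader = File.OpenText(source))
        using (StreamWriter writer = File.CreateText(target))
        {
            int count = Filter(reader, writer);
            Console.WriteLine("Wrote " + count + " lines to " + target);
        }
    }
}
```

Using `int.TryParse` rather than `int.Parse` means a malformed line is skipped instead of aborting the whole run; whether that is the right behaviour depends on how much you trust the input.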
Upvotes: 0
Reputation: 2249
Try regular expressions. You can find a certain pattern in your text and replace it with something that you want. I can't give you the exact code right now but you can test out your expressions using this.
http://www.radsoftware.com.au/regexdesigner/
Upvotes: 0
Reputation: 34391
You could do something like:
    using (TextReader rdr = OpenYourFile()) {
        string line;
        while ((line = rdr.ReadLine()) != null) {
            string[] fields = line.Split('\t'); // THIS LINE DOES THE MAGIC
            int theInt = Convert.ToInt32(fields[1]);
        }
    }
The reason you didn't find relevant results when searching for 'formatting' is that the operation you are performing is called 'parsing'.
Upvotes: 6