blackhawk11

Reputation: 11

Remove numbers from a string in a text file

I want to remove the "H" and the number after it, keeping only the "B". I know how to remove the "H", but I'm not sure how to remove the number that follows it. That number can vary from one to three digits.

H1 B
H2 B
H10 B
H11 B

I was trying this. It works if the number after "H" is a single digit. It won't work if the number after "H" has more than a single digit.

If line.Contains("H") Then
    line = line.Remove(0, 2)
End If

' ...

Dim AllFiles As String() = IO.Directory.GetFiles("C:\test")
For Each File As String In AllFiles
    Dim newfile As New List(Of String)
    For Each line As String In System.IO.File.ReadAllLines(File)
        If line.Contains("H") Then
            line = line.Remove(0, 2)
        End If
        newfile.Add(line)
    Next
    ' ...
Next

Upvotes: 1

Views: 4459

Answers (3)

dummy

Reputation: 4284


A regular expression would do the trick:

Imports System.Text.RegularExpressions

Module Module1

    Sub Main()
        Dim input = IO.File.ReadAllText("input.txt")
        Dim output = Regex.Replace(input, "H\d+", "")
        IO.File.WriteAllText("output.txt", output)
    End Sub

End Module

The magic part is "H\d+", which matches the letter "H" followed by one or more digits ("\d" is a single digit, and "+" means "repeated at least once").

Regular expressions are kind of tricky to get used to, but fortunately there are tons of documentation and examples on the web. Just google it :)

Edit: As Steven Doggart correctly notes:

  1. If you would like to get rid of the space after the number, change the expression to "H\d+ ".

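  2. If you only want to match/replace it at the beginning of each line, change it to "^H\d+" and pass RegexOptions.Multiline to Regex.Replace. The code above reads the whole file into one string, so without that option "^" matches only at the very start of the file.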
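
Combining both notes, a minimal sketch might look like the following. The module and function names (StripPrefixSketch, StripPrefixes) are made up for illustration, and the sample text stands in for the contents of the input file.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module StripPrefixSketch

    ' Removes an "H<digits> " prefix from the start of every line in the text.
    ' RegexOptions.Multiline makes ^ match at the start of each line,
    ' not only at the start of the whole string.
    Function StripPrefixes(text As String) As String
        Return Regex.Replace(text, "^H\d+ ", "", RegexOptions.Multiline)
    End Function

    Sub Main()
        ' Sample text standing in for the file contents (an assumption).
        Dim sample As String = "H1 B" & vbLf & "H10 B" & vbLf & "H123 B"
        Console.WriteLine(StripPrefixes(sample))
    End Sub

End Module
```

With the sample above, each of the three lines is reduced to "B". An "H" followed by digits in the middle of a line is left alone because of the "^" anchor.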

Upvotes: 6

Steven Doggart

Reputation: 43743

You could use the Char.IsDigit method to loop through the characters in the string and find the position of the first non-numeric character, or you could look for the first space, but it would be much simpler (and more flexible) to use Regular Expressions. For instance:

Dim match As Match = Regex.Match(line, "^H\d+ (.*)")
If match.Success Then
    Dim value As String = match.Groups(1).Value
End If

Here is the meaning of the regular expression:

  • ^ - The matching string must begin at the beginning of the line
  • H - The matching string must begin with the letter "H"
  • \d - The matching string must then contain a numeric (digit) character
  • + - There will be one or more numeric characters
  • [space] - There must be a space between the digits and what comes next
  • (...) - The parentheses create a group so that, in code, we can query the value of just the characters within the group
  • . - Any character
  • * - Any number of times (zero or more)

The match.Groups(1) property returns the first group (the part between the parentheses), which is the text that comes after the space.
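
For comparison, the Char.IsDigit alternative mentioned at the top of this answer could be sketched roughly as follows. The names IsDigitSketch and StripPrefix are hypothetical, and the sketch assumes the prefix is a capital "H", at least one digit, and an optional single space.

```vbnet
Imports System

Module IsDigitSketch

    ' Returns the text after a leading "H<digits> " prefix, or the line
    ' unchanged when it does not start with "H" followed by at least one digit.
    Function StripPrefix(line As String) As String
        If Not line.StartsWith("H", StringComparison.Ordinal) Then Return line

        ' Walk past the digits that follow the "H".
        Dim i As Integer = 1
        While i < line.Length AndAlso Char.IsDigit(line(i))
            i += 1
        End While

        ' No digits found after the "H": leave the line alone.
        If i = 1 Then Return line

        ' Skip the single separating space, if present.
        If i < line.Length AndAlso line(i) = " "c Then i += 1

        Return line.Substring(i)
    End Function

    Sub Main()
        ' Sample lines taken from the question.
        For Each line As String In New String() {"H1 B", "H2 B", "H10 B", "H11 B"}
            Console.WriteLine(StripPrefix(line))
        Next
    End Sub

End Module
```

This does the same job as the regular expression for this one format, but any change to the format means changing and recompiling the loop.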

Admittedly, regular expressions do have a fairly steep learning curve, but they are definitely worth learning. Their biggest advantage is that they are very flexible. For instance, rather than hard-coding that logic in your application, you could store the regular expression externally in a setting or database. Then you could modify it without recompiling your application. More importantly, you could customize it, as necessary, for each installation of your application.

RegEx is used in many different languages, tools, and technologies. For instance, you can use it in Visual Studio to perform advanced Find/Replace on your source code, which alone is almost worth the time it takes to learn.

Upvotes: 1

rory.ap

Reputation: 35270

Assuming the "H" and the number are always at the beginning of the line and followed by a single space and then the "B" (and whatever else you want to keep after that), you can do something like this:

line = line.Substring(line.IndexOf(" "c) + 1)
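
Dropped into the file loop from the question, that might look something like the sketch below. DropPrefix and RewriteFile are made-up helper names, and writing the result back over the original file is an assumption, since the question does not show what happens to the collected lines.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module SubstringSketch

    ' Drops everything up to and including the first space, but only for
    ' lines that start with "H"; other lines are returned unchanged.
    Function DropPrefix(line As String) As String
        Dim spaceAt As Integer = line.IndexOf(" "c)
        If line.StartsWith("H", StringComparison.Ordinal) AndAlso spaceAt >= 0 Then
            Return line.Substring(spaceAt + 1)
        End If
        Return line
    End Function

    ' Hypothetical helper: rewrites one file in place (an assumption,
    ' the question does not say where the output should go).
    Sub RewriteFile(filePath As String)
        Dim newLines As New List(Of String)
        For Each line As String In File.ReadAllLines(filePath)
            newLines.Add(DropPrefix(line))
        Next
        File.WriteAllLines(filePath, newLines)
    End Sub

    Sub Main()
        ' Sample lines taken from the question.
        For Each line As String In New String() {"H1 B", "H10 B"}
            Console.WriteLine(DropPrefix(line))
        Next
    End Sub

End Module
```

The check for "H" and for a found space keeps lines that don't fit the expected format from being altered.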

Upvotes: 1
