user3316995

Reputation: 100

How many times a specific character appears consecutively in a string

I’ve been looking for a way to count how many times a specific character appears consecutively in a string. All the approaches I found just count how many times the character “A” appears in the string in total.

Example of string:
0xAAABBC0123456789AABBCCDD0123456789ABCDEF

Each string is 43 characters long and starts with “0x”. Apart from that prefix, each string contains only the characters 0-9 and A-F in random order (16 different characters in total). A character can appear several times in a row, for example “AAA” or "111".

I’m interested in the maximum number of times each of the 16 characters appears consecutively within one string, and I want to check this across all my strings.

So far I’ve only come up with this PowerShell script, which counts how many times each character appears per line:

Get-Content "C:\Temp\strings.txt" | ForEach-Object{
    New-Object PSObject -Property @{
        Strings = $_
        Row = $_.ReadCount
        9 = [regex]::matches($_,"9").count
        D = [regex]::matches($_,"D").count
        B = [regex]::matches($_,"B").count
        C = [regex]::matches($_,"C").count
        7 = [regex]::matches($_,"7").count
        3 = [regex]::matches($_,"3").count
        1 = [regex]::matches($_,"1").count
        8 = [regex]::matches($_,"8").count
        F = [regex]::matches($_,"F").count
        2 = [regex]::matches($_,"2").count
        4 = [regex]::matches($_,"4").count
        E = [regex]::matches($_,"E").count
        6 = [regex]::matches($_,"6").count
        5 = [regex]::matches($_,"5").count
        A = [regex]::matches($_,"A").count
        0 = [regex]::matches($_,"0").count
    }
} | Sort Count -Descending | Export-Csv -Path "C:\Temp\output.csv" -NoTypeInformation

I would prefer to do this in PowerShell, but if there’s an easier way of doing it, please let me know.

Upvotes: 0

Views: 3131

Answers (5)

Tim

Reputation: 1

Try this.

$out=@()
$string="0xAAABBC0123456789AABBCCDD0123456789ABCDEF"
$out+="Character,Count"
$out+='0123456789ABCDEF'.ToCharArray()|%{"$_," + ($string.split("$_")|Where-object{$_ -eq ""}).count}
ConvertFrom-Csv $out |sort count -Descending 

This yields the following:

 Character Count
 --------- -----
 A         3    
 B         2    
 0         1    
 C         1    
 D         1    
 F         1    
 1         0    
 2         0    
 3         0    
 4         0    
 5         0    
 6         0    
 7         0    
 8         0    
 9         0    
 E         0    

You can put it into a function like this:

function count_dups ($string){
   $out=@() # null array
   $out+="Character,Count" # header
   $out+='0123456789ABCDEF'.ToCharArray()|%{"$_," + ($string.split("$_")|Where-object{$_ -eq ""}).count}
   return ConvertFrom-Csv $out | sort count -Descending
} 

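The core of what I'm doing here is this line:

'0123456789ABCDEF'.ToCharArray()|%{"$_," + ($string.split("$_")|Where-object{$_ -eq ""}).count}

I am splitting the string into an array on each character fed in from the character array '0123456789ABCDEF', then counting the empty elements in the resulting array. An empty element appears wherever two occurrences of the character are adjacent, and also when the character is the first or last one in the string, which is why 0 and F show a count of 1.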

I'm only creating the array $out so that the output can be formatted like your example.
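
To see why counting empty elements detects repeats, here is a minimal sketch on a shortened, hypothetical input (`xAAABBC` is not taken from the question's data, and the variable names are only illustrative). Splitting on a character leaves an empty string between every two adjacent occurrences, so a run of n identical characters in the middle of the string contributes n - 1 empty elements.

```powershell
# Hypothetical short input: one run of three A's and one run of two B's.
$sample = 'xAAABBC'

# Splitting on 'A' yields: 'x', '', '', 'BBC'
$partsA = $sample.Split('A')
$emptyA = @($partsA | Where-Object { $_ -eq '' }).Count

# Splitting on 'B' yields: 'xAAA', '', 'C'
$partsB = $sample.Split('B')
$emptyB = @($partsB | Where-Object { $_ -eq '' }).Count

"A: $emptyA empty elements, B: $emptyB empty elements"
```

Because the empty elements from separate runs are added together, the figure is the total number of adjacent repeats of a character across the string, not the length of its longest run.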

Upvotes: 0

user3316995

Reputation: 100

This is what I ended up with. It gives me 15 extra rows per string, but I can easily filter out the unwanted rows in Microsoft Excel.

# All "0x" prefixes were removed from the text file before running this script
$strings = Get-Content "C:\Temp\strings_without_0x.txt"
foreach ($s in $strings) {
    # Split into runs of identical characters (the prefix is already gone, so no Remove(0, 2) here)
    $repeats = $s -split '(?<=(.))(?!\1|$)'

    # Group the runs by their first character
    $groups = $repeats | Group-Object {$_[0]} -AsHashTable

    '0123456789ABCDEF'.ToCharArray() | %{
        [pscustomobject]@{
            String = "$s"
            Character = "$_"
            MaxLength = "$($groups[$_] | Sort Length -Descending | Select -First 1)".Length
        }
    } | Sort MaxLength -Descending | Export-Csv -Path "C:\Temp\output.csv" -NoTypeInformation -Append
}

Thank you for all great answers!

Upvotes: 0

Mathias R. Jessen

Reputation: 174545

You could use a lookbehind and a backreference to split the string into repeating groups:

$s = '0xAAABBC0123456789AABBCCDD0123456789ABCDEF'
$repeats = $s.Remove(0, 2) -split '(?<=(.))(?!\1|$)'

Now we can group the substrings based on the first letter of each:

$groups = $repeats |Group-Object {$_[0]} -AsHashTable

And finally grab the longest sequence of each character:

'0123456789ABCDEF'.ToCharArray() |%{
    [pscustomobject]@{
        Character = "$_"
        MaxLength = "$($groups[$_] |Sort Length -Descending |Select -First 1)".Length
    }
}

And you should end up with a list (for your example) like this:

Character MaxLength
--------- ---------
0                 1
1                 1
2                 1
3                 1
4                 1
5                 1
6                 1
7                 1
8                 1
9                 1
A                 3
B                 2
C                 2
D                 2
E                 1
F                 1
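
As a variation on the same idea, the runs can also be collected by matching instead of splitting. The following is a minimal sketch rather than the code above: the function name `Get-MaxRunLength` and its `Text` parameter are hypothetical, and it assumes the input uses uppercase hex digits with an optional "0x" prefix.

```powershell
# Hypothetical helper; the name and parameter are illustrative only.
function Get-MaxRunLength {
    param([string]$Text)

    # Start every hex digit at 0 so characters that never occur still get an entry.
    $max = @{}
    foreach ($c in '0123456789ABCDEF'.ToCharArray()) { $max["$c"] = 0 }

    # Drop the "0x" prefix if it is present.
    if ($Text.StartsWith('0x')) { $Text = $Text.Substring(2) }

    # '(.)\1*' matches one character followed by any repeats of it, i.e. one whole run.
    foreach ($m in [regex]::Matches($Text, '(.)\1*')) {
        $key = $m.Value.Substring(0, 1)
        if ($max.ContainsKey($key) -and $m.Length -gt $max[$key]) {
            $max[$key] = $m.Length
        }
    }
    $max
}

$result = Get-MaxRunLength '0xAAABBC0123456789AABBCCDD0123456789ABCDEF'
$result.GetEnumerator() | Sort-Object Name
```

For the example string, the values should agree with the table above (A is 3; B, C and D are 2; everything else is 1).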

Upvotes: 1

user6811411

Reputation:

Build each HexPair by iterating over the string one position at a time (stopping before the last character), and increment a value in a hash table that uses the HexPair as its key.

$String = '0xAAABBC0123456789AABBCCDD0123456789ABCDEF'
$Hash=@{}
for ($i=2;$i -le ($string.length-2);$i++){
    $Hash[$($String.Substring($i,2))]+=1
}
$Hash.GetEnumerator()|ForEach-Object{
   [PSCustomObject]@{HexPair = $_.Name
                     Count = $_.Value}
} |Sort Count -Descending

Sample output

HexPair Count
------- -----
BC          3
AB          3
AA          3
CD          2
BB          2
9A          2
89          2
78          2
67          2
56          2
45          2
34          2
23          2
12          2
01          2
EF          1
DE          1
DD          1
D0          1
CC          1
C0          1

Alternative output:

$Hash.GetEnumerator()|ForEach-Object{
    [PSCustomObject]@{HexPair = $_.Name
                      Count = $_.Value}
 } |Sort HexPair|group Count |%{"Count {0} {1}" -f $_.Name,($_.Group.HexPair -Join(', '))}|Sort

Count 1 C0, CC, D0, DD, DE, EF
Count 2 01, 12, 23, 34, 45, 56, 67, 78, 89, 9A, BB, CD
Count 3 AA, AB, BC

Upvotes: 0

vonPryz

Reputation: 24071

One approach is to iterate over the source string character by character and keep track of how many times each character has been seen. This is easily done with a hash table, like so:

# Hashtable initialization. Add keys for 0-9A-F:
# Each char has initial count 0
$ht = @{}
"ABCDEF0123456789".ToCharArray() | % {
    $ht.Add($($_.ToString()), 0)
}

# Test data; the 0x prefix contributes one extra zero and an x
$s = "0xAAABBC0123456789AABBCCDD0123456789ABCDEF"    

# Convert data to char array for iteration
# Increment value in hashtable by using the char as key
$s.ToCharArray() | % { $ht[$_.ToString()]+=1 }

# Check results
PS C:\> $ht

Name                           Value
----                           -----
B                              5
3                              2
5                              2
x                              1
9                              2
2                              2
8                              2
0                              3
1                              2
E                              1
7                              2
F                              1
6                              2
4                              2
D                              3
A                              6
C                              4
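
The totals above count every occurrence. If the longest consecutive run per character is wanted instead, the same character-by-character loop can be adapted to remember the previous character and the length of the current run. This is only a sketch under that assumption; the variable names (`$maxRun`, `$prev`, `$run`) are hypothetical and the "0x" prefix is skipped with `Substring(2)`.

```powershell
$s = '0xAAABBC0123456789AABBCCDD0123456789ABCDEF'

# One entry per hex digit, each starting at 0.
$maxRun = @{}
'0123456789ABCDEF'.ToCharArray() | ForEach-Object { $maxRun["$_"] = 0 }

$prev = ''   # previous character, as a string
$run  = 0    # length of the run currently being read

foreach ($ch in $s.Substring(2).ToCharArray()) {
    $key = "$ch"
    # Extend the run if the character repeats, otherwise start a new run.
    if ($key -ceq $prev) { $run++ } else { $run = 1 }
    # Keep the longest run seen so far for this character.
    if ($run -gt $maxRun[$key]) { $maxRun[$key] = $run }
    $prev = $key
}

$maxRun.GetEnumerator() | Sort-Object Name
```

This makes a single pass over the string and needs no regular expression, at the cost of a few more lines.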

Upvotes: 0
