Reputation: 100
I’ve been looking for a way to count how many times a specific character appears consecutively (in a row) in a string. All the approaches I found just count how many times a character such as “A” appears in the string in total.
Example of string:
0xAAABBC0123456789AABBCCDD0123456789ABCDEF
Each string is 43 characters long and starts with “0x”. Each string contains only the characters 0-9 and A-F (16 different characters in total), in random order. A character can appear several times in a row, for example “AAA” or "111".
I’m interested in the maximum number of times each of the 16 characters appears consecutively in one string, and I want to check this for all my strings.
So far I’ve only come up with this PowerShell script that counts how many times each character appears per line:
Get-Content "C:\Temp\strings.txt" | ForEach-Object{
New-Object PSObject -Property @{
Strings = $_
Row = $_.ReadCount
9 = [regex]::matches($_,"9").count
D = [regex]::matches($_,"D").count
B = [regex]::matches($_,"B").count
C = [regex]::matches($_,"C").count
7 = [regex]::matches($_,"7").count
3 = [regex]::matches($_,"3").count
1 = [regex]::matches($_,"1").count
8 = [regex]::matches($_,"8").count
F = [regex]::matches($_,"F").count
2 = [regex]::matches($_,"2").count
4 = [regex]::matches($_,"4").count
E = [regex]::matches($_,"E").count
6 = [regex]::matches($_,"6").count
5 = [regex]::matches($_,"5").count
A = [regex]::matches($_,"A").count
0 = [regex]::matches($_,"0").count
}
} | Sort Count -Descending | Export-Csv -Path "C:\Temp\output.csv" -NoTypeInformation
I would prefer to do this in PowerShell, but if there’s an easier way of doing this, please let me know.
Upvotes: 0
Views: 3131
Reputation: 1
Try this.
$out=@()
$string="0xAAABBC0123456789AABBCCDD0123456789ABCDEF"
$out+="Character,Count"
$out+='0123456789ABCDEF'.ToCharArray()|%{"$_," + ($string.split("$_")|Where-object{$_ -eq ""}).count}
ConvertFrom-Csv $out |sort count -Descending
This yields the following:
Character Count
--------- -----
A 3
B 2
0 1
C 1
D 1
F 1
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
E 0
You can put it into a function like this:
function count_dups ($string){
$out=@() # empty array
$out+="Character,Count" # header
$out+='0123456789ABCDEF'.ToCharArray()|%{"$_," + ($string.split("$_")|Where-object{$_ -eq ""}).count}
return ConvertFrom-Csv $out | sort count -Descending
}
The biggest part of what I'm doing here is this line.
'0123456789ABCDEF'.ToCharArray()|%{"$_," + ($string.split("$_")|Where-object{$_ -eq ""}).count}
I am splitting the string into an array on each of the characters fed in from the character array '0123456789ABCDEF'. Then I am counting the empty elements in the resulting array.
I'm only creating the array $out so that the output can be formatted like your example.
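To see what that split actually returns, here is a small illustration (the variable names $parts, $emptyA and $emptyC are made up for this sketch and are not part of the answer above). Splitting on A gives one empty element for every A that directly follows another A, so AAA contributes two and AA contributes one. As far as I can tell, that makes the number a count of adjacent repeats (plus one if the string starts or ends with the character) rather than the length of the longest run. For C, for instance, the split yields a single empty element even though CC is two characters long.

```powershell
# Illustration only: inspect what Split returns for one character.
$string = '0xAAABBC0123456789AABBCCDD0123456789ABCDEF'

# Split on 'A': "0x", "", "", "BBC0123456789", "", "BBCCDD0123456789", "BCDEF"
$parts  = $string.Split([char]'A')
$emptyA = @($parts | Where-Object { $_ -eq '' })

# Split on 'C': only the CC pair produces an empty element
$emptyC = @($string.Split([char]'C') | Where-Object { $_ -eq '' })

"Parts when splitting on A: $($parts.Count)"
"Empty elements for A:      $($emptyA.Count)"
"Empty elements for C:      $($emptyC.Count)"
```

If the longest run is what you need, the empty-element count should be treated with some care, since it only happens to match for A and B in this example string.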
Upvotes: 0
Reputation: 100
The result came out this way. Even though it gives me 15 extra rows per string, I can easily filter out the unwanted rows in Microsoft Excel.
# Removed all "0x" prefixes from the text file before running this script,
# so the strings are split as-is (no .Remove(0, 2) needed)
$strings = Get-Content "C:\Temp\strings_without_0x.txt"
foreach($s in $strings) {
$repeats = $s -split '(?<=(.))(?!\1|$)'
$groups = $repeats |Group-Object {$_[0]} -AsHashTable
'0123456789ABCDEF'.ToCharArray() |%{
[pscustomobject]@{
String = "$s"
Character = "$_"
MaxLength = "$($groups[$_] |Sort Length -Descending |Select -First 1)".Length
}
} | Sort Count -Descending | Export-Csv -Path "C:\Temp\output.csv" -NoTypeInformation -Append}
Thank you for all great answers!
Upvotes: 0
Reputation: 174545
You could use a lookbehind and a backreference to split the string into repeating groups:
$s = '0xAAABBC0123456789AABBCCDD0123456789ABCDEF'
$repeats = $s.Remove(0, 2) -split '(?<=(.))(?!\1|$)'
Now we can group the substrings based on the first character of each:
$groups = $repeats |Group-Object {$_[0]} -AsHashTable
And finally grab the longest sequence of each character:
'0123456789ABCDEF'.ToCharArray() |%{
[pscustomobject]@{
Character = "$_"
MaxLength = "$($groups[$_] |Sort Length -Descending |Select -First 1)".Length
}
}
And you should end up with a list (for your example) like this:
Character MaxLength
--------- ---------
0 1
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 1
A 3
B 2
C 2
D 2
E 1
F 1
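As a possible alternative to the split, the runs can also be collected directly with [regex]::Matches and the pattern (.)\1*, which matches one character followed by any number of repeats of that same character. This is only a sketch under the same assumptions as above (the "0x" prefix is skipped, and the hashtable name $longest is made up for illustration):

```powershell
$s = '0xAAABBC0123456789AABBCCDD0123456789ABCDEF'

# Longest run seen so far, keyed by character (as a string)
$longest = @{}

# Each match is one complete run, e.g. "AAA", "BB", "C", "0", ...
foreach ($m in [regex]::Matches($s.Substring(2), '(.)\1*')) {
    $key = $m.Groups[1].Value
    if (-not $longest.ContainsKey($key) -or $m.Length -gt $longest[$key]) {
        $longest[$key] = $m.Length
    }
}

# Emit one object per character, in the same order as the list above
'0123456789ABCDEF'.ToCharArray() | ForEach-Object {
    [pscustomobject]@{
        Character = "$_"
        MaxLength = [int]$longest["$_"]   # 0 if the character never occurs
    }
}
```

For the example string this should give the same lengths as the list above. A character that does not occur in the string has no entry in $longest, which is why the lookup is cast to [int].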
Upvotes: 1
Reputation:
Build a HexPair by walking through the string one position at a time (stopping before the last character), and increment a value in a hash table with the HexPair as the key.
$String = '0xAAABBC0123456789AABBCCDD0123456789ABCDEF'
$Hash=@{}
for ($i=2;$i -le ($string.length-2);$i++){
$Hash[$($String.Substring($i,2))]+=1
}
$Hash.GetEnumerator()|ForEach-Object{
[PSCustomObject]@{HexPair = $_.Name
Count = $_.Value}
} |Sort Count -Descending
Sample output
HexPair Count
------- -----
BC 3
AB 3
AA 3
CD 2
BB 2
9A 2
89 2
78 2
67 2
56 2
45 2
34 2
23 2
12 2
01 2
EF 1
DE 1
DD 1
D0 1
CC 1
C0 1
Alternative output:
$Hash.GetEnumerator()|ForEach-Object{
[PSCustomObject]@{HexPair = $_.Name
Count = $_.Value}
} |Sort HexPair|group Count |%{"Count {0} {1}" -f $_.Name,($_.Group.HexPair -Join(', '))}|Sort
Count 1 C0, CC, D0, DD, DE, EF
Count 2 01, 12, 23, 34, 45, 56, 67, 78, 89, 9A, BB, CD
Count 3 AA, AB, BC
Upvotes: 0
Reputation: 24071
One approach is to iterate the source string character by character and keep track of how many times each character has been seen. This is easily done with a hash table. Like so:
# Hashtable initialization. Add keys for 0-9A-F:
# Each char has initial count 0
$ht = @{}
"ABCDEF0123456789".ToCharArray() | % {
$ht.Add($($_.ToString()), 0)
}
# Test data, the 0x prefix will contain one extra zero
$s = "0xAAABBC0123456789AABBCCDD0123456789ABCDEF"
# Convert data to char array for iteration
# Increment value in hashtable by using the char as key
$s.ToCharArray() | % { $ht[$_.ToString()]+=1 }
# Check results
PS C:\> $ht
Name Value
---- -----
B 5
3 2
5 2
x 1
9 2
2 2
8 2
0 3
1 2
E 1
7 2
F 1
6 2
4 2
D 3
A 6
C 4
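The table above holds total counts per character. If the longest consecutive run is wanted instead, the same character-by-character loop could be adapted roughly as follows. This is a sketch, not a tested drop-in; the names $maxRun, $prev and $run are made up for illustration, and the "0x" prefix is skipped on the assumption that it should not be counted.

```powershell
$s = '0xAAABBC0123456789AABBCCDD0123456789ABCDEF'

$maxRun = @{}     # longest run per character
$prev   = $null   # previous character seen
$run    = 0       # length of the run currently being read

foreach ($c in $s.Substring(2).ToCharArray()) {
    # Extend the current run, or start a new one when the character changes
    if ($c -eq $prev) { $run++ } else { $run = 1 }

    $key = [string]$c
    # A missing key yields $null, which any positive run length exceeds
    if ($run -gt $maxRun[$key]) { $maxRun[$key] = $run }

    $prev = $c
}

$maxRun.GetEnumerator() | Sort-Object Name
```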
Upvotes: 0