Reputation: 142
I have an external library whose string representation is equivalent to &[char].
Some of its editing interfaces accept a range of type CharRange = Range<usize>, meaning offsets counted in char.
On the other hand, some other Rust libraries I use take type ByteRange = Range<usize>, meaning offsets counted in u8.
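To make the difference between the two offset kinds concrete, here is a small sketch (the helper name `offsets` is made up for illustration) that lists both offsets for every char of a string. For ASCII text they coincide; they diverge as soon as a multi-byte char appears.

```rust
/// Hypothetical helper: returns (char offset, byte offset, char)
/// for every char in `text`.
fn offsets(text: &str) -> Vec<(usize, usize, char)> {
    text.char_indices()
        .enumerate()
        .map(|(char_idx, (byte_idx, c))| (char_idx, byte_idx, c))
        .collect()
}
```

For "aé😀b", the chars 'a', 'é', '😀', 'b' have char offsets 0, 1, 2, 3 but byte offsets 0, 1, 3, 7, so the CharRange 1..3 corresponds to the ByteRange 1..7.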
Currently I am using an O(n) algorithm, and it has become a performance bottleneck.
Is there an efficient data structure for converting between the two?
use std::ops::Range;

type CharRange = Range<usize>;
type ByteRange = Range<usize>;

fn byte_range_to_char_range(text: &str, byte_range: ByteRange) -> CharRange {
    // Count the chars in front of each byte offset (two scans from the start).
    let start = text[..byte_range.start].chars().count();
    let end = text[..byte_range.end].chars().count();
    start..end
}

fn char_range_to_byte_range(text: &str, char_range: CharRange) -> ByteRange {
    // A char offset at or past the end of the text maps to text.len().
    let start = text.char_indices().nth(char_range.start).map(|(i, _)| i).unwrap_or(text.len());
    let end = text.char_indices().nth(char_range.end).map(|(i, _)| i).unwrap_or(text.len());
    start..end
}
Upvotes: 0
Views: 366
Reputation: 27187
You can improve it slightly by not iterating from the very start again, but it's probably not worth it unless your texts are very long:
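use std::ops::Range;

type CharRange = Range<usize>;
type ByteRange = Range<usize>;

pub fn byte_range_to_char_range(text: &str, byte_range: ByteRange) -> CharRange {
    let start = text[..byte_range.start].chars().count();
    // Only scan the bytes inside the range instead of starting over.
    let size = text[byte_range.start..byte_range.end].chars().count();
    start..start + size
}

pub fn char_range_to_byte_range(text: &str, char_range: CharRange) -> ByteRange {
    let mut iter = text.char_indices();
    // A char offset at or past the end of the text maps to text.len().
    let start = iter.nth(char_range.start).map(|(i, _)| i).unwrap_or(text.len());
    // An empty range needs no further scan; this also avoids the
    // usize underflow in `end - start - 1` below.
    if char_range.end <= char_range.start {
        return start..start;
    }
    // `iter` already sits just past `start`, so skip only the remaining chars.
    let end = iter
        .nth(char_range.end - char_range.start - 1)
        .map(|(i, _)| i)
        .unwrap_or(text.len());
    start..end
}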
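But because UTF-8 is a variable-width encoding, a plain &str cannot be indexed by char without scanning, so a single conversion can't do any better than O(n).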
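When many ranges are converted against the same unchanged text, the scan can be paid once up front instead of on every call. The following is a minimal sketch of that idea, not something taken from the libraries mentioned in the question: `CharIndex` and its methods are hypothetical names. It stores the byte offset of every char boundary, which makes char-to-byte an O(1) lookup and byte-to-char an O(log n) binary search.

```rust
use std::ops::Range;

/// Hypothetical helper: byte offset of every char boundary, built once in O(n).
pub struct CharIndex {
    /// `starts[i]` is the byte offset of char `i`; the last entry is `text.len()`.
    starts: Vec<usize>,
}

impl CharIndex {
    pub fn new(text: &str) -> Self {
        let mut starts: Vec<usize> = text.char_indices().map(|(i, _)| i).collect();
        starts.push(text.len());
        CharIndex { starts }
    }

    /// O(1). Char offsets past the end clamp to the end of the text.
    pub fn char_to_byte(&self, char_offset: usize) -> usize {
        let last = self.starts.len() - 1;
        self.starts[char_offset.min(last)]
    }

    /// O(log n). Assumes `byte_offset` lies on a char boundary; an offset
    /// inside a char rounds up to the next boundary, and offsets past the
    /// end clamp to the char count.
    pub fn byte_to_char(&self, byte_offset: usize) -> usize {
        let last = self.starts.len() - 1;
        self.starts.partition_point(|&b| b < byte_offset).min(last)
    }

    pub fn char_range_to_byte_range(&self, r: Range<usize>) -> Range<usize> {
        self.char_to_byte(r.start)..self.char_to_byte(r.end)
    }

    pub fn byte_range_to_char_range(&self, r: Range<usize>) -> Range<usize> {
        self.byte_to_char(r.start)..self.byte_to_char(r.end)
    }
}
```

The trade-off is memory (one usize per char) and the need to rebuild or patch the index whenever the text is edited, so this only pays off when conversions greatly outnumber edits.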
Upvotes: 1