Reputation: 13626
I'm trying to convert a word to an array of one-hot encoded arrays using a simple vocabulary. The dictionary I've constructed is keyed off of characters.
vocab = "abc"
char_id = Dict(char => index for (index, char) in enumerate(vocab))
# Dict{Char,Int64} with 3 entries:
# 'a' => 1
# 'c' => 3
# 'b' => 2
function char_to_one_hot(char, char_id, max_length)
    one_hot = zeros(max_length)
    setindex!(one_hot, 1.0, char_id[char])
end
function word_to_one_hot(word, char_id, max_length)
    map((char) -> char_to_one_hot(char, char_id, max_length), split(word, ""))
end
word_to_one_hot(word, char_id, max_length)
Unfortunately, this returns an error because the char_id Dict uses Char keys, while split produces strings. How can I convert either the dictionary to use strings as keys, or the strings to chars, so that the lookup matches?
ERROR: KeyError: key "a" not found
Stacktrace:
[1] getindex at ./dict.jl:467 [inlined]
[2] char_to_one_hot(::SubString{String}, ::Dict{Char,Int64}, ::Int64) at ./REPL[456]:3
[3] #78 at ./REPL[457]:2 [inlined]
[4] iterate at ./generator.jl:47 [inlined]
[5] _collect(::Array{SubString{String},1}, ::Base.Generator{Array{SubString{String},1},var"#78#79"{Dict{Char,Int64},Int64}}, ::Base.EltypeUnknown, ::Base.HasShape{1}) at ./array.jl:699
[6] collect_similar at ./array.jl:628 [inlined]
[7] map at ./abstractarray.jl:2162 [inlined]
[8] word_to_one_hot(::String, ::Dict{Char,Int64}, ::Int64) at ./REPL[457]:2
[9] top-level scope at REPL[458]:1
Upvotes: 4
Views: 561
Reputation: 6086
To convert a length-1 string to a char, index the string's first character with [1]. To convert a char to a string, use string().
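Applied to the question, either direction works. The following is a minimal sketch, assuming the `char_id` table from the question; `str_id` and the sample word "acb" are hypothetical names and values used only for illustration.

```julia
vocab = "abc"

# Char-keyed lookup table, as in the question
char_id = Dict(char => index for (index, char) in enumerate(vocab))

# Option 1: rebuild the dictionary with String keys (str_id is a made-up name)
str_id = Dict(string(char) => index for (char, index) in char_id)

# split(word, "") yields 1-character SubStrings, not Chars
pieces = split("acb", "")

# Look the pieces up directly in the String-keyed dictionary...
ids_from_strings = [str_id[p] for p in pieces]

# ...or (Option 2) keep the Char keys and take the first Char of each piece
ids_from_chars = [char_id[first(p)] for p in pieces]
```

Both lookups should give the same ids, so which one to use is mostly a question of whether the rest of the code works with strings or with chars.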
julia> s = "c"
"c"
julia> s[1]
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
julia> string(s[1])
"c"
Upvotes: 2
Reputation: 20248
A string can already be seen as a collection of characters, so you shouldn't need to split the word.
However, map is specialized in such a way that on strings you can only map functions which return chars. And strings are also treated as scalars by the broadcasting system. This leaves us with a few options: a simple for loop or maybe a generator/comprehension.
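The behaviour described above can be checked with a small sketch. The word "acb" and the use of `Int(c)` as a non-Char-returning function are assumptions chosen only for illustration.

```julia
word = "acb"

# map over a String works when the function returns a Char...
upper = map(uppercase, word)

# ...but throws an ArgumentError when the function returns anything else
map_failed = try
    map(c -> Int(c), word)
    false
catch err
    err isa ArgumentError
end

# Broadcasting treats the string as a scalar: length sees the whole word
whole = length.(word)

# Alternatives that iterate character by character:
codes_map = map(c -> Int(c), collect(word))   # map over a Vector{Char}
codes_gen = collect(Int(c) for c in word)     # generator
codes_loop = Int[]                            # plain for loop
for c in word
    push!(codes_loop, Int(c))
end
```

All three alternatives should produce the same vector of code points; the comprehension form is simply the shortest to write.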
I think in this case I'd go with the comprehension:
function char_to_one_hot(char, char_id, max_length)
    one_hot = zeros(max_length)
    setindex!(one_hot, 1.0, char_id[char])
end
function word_to_one_hot(word, char_id, max_length)
    [char_to_one_hot(char, char_id, max_length) for char in word]
end
which I think gives what you'd expect:
julia> vocab = "abc"
"abc"
julia> char_id = Dict(char => index for (index, char) in enumerate(vocab))
Dict{Char,Int64} with 3 entries:
'a' => 1
'c' => 3
'b' => 2
julia> word_to_one_hot("acb", char_id, 5)
3-element Array{Array{Float64,1},1}:
[1.0, 0.0, 0.0, 0.0, 0.0]
[0.0, 0.0, 1.0, 0.0, 0.0]
[0.0, 1.0, 0.0, 0.0, 0.0]
If you still want to convert between 1-character strings and characters, you can do it this way:
julia> str="a"; first(str)
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
julia> chr='a'; string(chr)
"a"
Upvotes: 4