webmagnets

Reputation: 2306

Why are these two variables ending up equal?

I am trying to build a 2D array of sentences and words, and then build a second 2D array that mirrors it but holds the English translations.

Here is the callback from the Lesson model that happens when I create a new lesson:

before_create do |lesson|
  require 'rmmseg'
  require "to_lang"
  require "bing_translator"

  lesson.parsed_content = []
  lesson.html_content = []
  RMMSeg::Dictionary.load_dictionaries

  text = lesson.content
  text = text.gsub("。","^^.")
  text = text.gsub("?","~~?")
  text = text.gsub("!", "||!")

  text = text.split(/[.?!]/u) #convert to an array
  text.each do |s|
    s.gsub!("^^","。")
    s.gsub!("~~","?")
    s.gsub!("||","!")
  end

  text.each_with_index do |val, index|
    algor = RMMSeg::Algorithm.new(text[index])
    splittext = []
    loop do
      tok = algor.next_token
      break if tok.nil?
      tex = tok.text.force_encoding('UTF-8')
      splittext << tex
      text[index] = splittext
    end
  end

  lesson.parsed_content = text
  textarray = text
  translator = BingTranslator.new(BING_CLIENT_ID, BING_API_KEY)
  ToLang.start(GOOGLE_TRANSLATE_API)
  textarray.each_with_index do |sentence, si| #iterate array of sentence
    textarray[si] = []
    sentence.each_with_index do |word,wi| #iterate sentence's array of words
      entry = DictionaryEntry.find_by_simplified(word) #returns a DictionaryEntry object hash
      if entry == nil #for cases where there is no DictionaryEntry
        textarray[si] << word
      else
        textarray[si] << entry.definition
      end
    end
    lesson.html_content = textarray
  end
end

Why are my variables lesson.parsed_content and lesson.html_content ending up equal to each other?

I was expecting lesson.parsed_content to be Chinese and lesson.html_content to be English, but they both end up being English. I am probably too tired, but I can't see why lesson.parsed_content ends up English too.

Upvotes: 0

Views: 74

Answers (1)

mu is too short

Reputation: 434945

You're referencing the same array in both of them:

lesson.parsed_content = text
textarray = text
# Various in-place modifications of textarray...
lesson.html_content = textarray

Just doing lesson.parsed_content = text doesn't duplicate text; it only copies the reference, so you end up with four things pointing at the same piece of data:

text ------------------=-+--+--+----> [ ... ]
lesson.parsed_content -=-/  |  |
lesson.html_content ---=----/  |
textarray -------------=-------/

Each assignment simply adds another pointer to the same underlying array.
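A minimal sketch of that aliasing, using made-up sample data and plain local variables (parsed stands in for lesson.parsed_content; none of this is the asker's real model):

```ruby
# Sample data standing in for the segmented lesson text (hypothetical).
text = [["你", "好"], ["谢谢"]]

parsed    = text   # no copy is made: parsed is another name for the same Array
textarray = text   # and so is textarray

# Changing the array through one name...
textarray[0] = ["you", "good"]

# ...is visible through every other name, because there is only one object.
same_object  = parsed.equal?(textarray)  # equal? compares object identity
parsed_first = parsed[0]
```

Here same_object is true and parsed_first is the English array, which is the same effect the question describes.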

You can't fix this problem with a simple lesson.parsed_content = text.dup, because dup only makes a shallow copy: the outer array is new, but it still holds references to the same inner arrays. Since you know that you have an array-of-arrays, you could dup the outer and inner arrays by hand to get a full copy, or you could use one of the standard deep copying approaches such as a round trip through Marshal. Or skip the copying altogether: iterate over textarray, but write the results into a separate array.
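The options above could be sketched roughly as follows. The sample data and the GLOSSARY hash are hypothetical stand-ins (GLOSSARY takes the place of the DictionaryEntry lookup), so treat this as an illustration rather than a drop-in fix for the callback:

```ruby
text = [["你", "好"], ["谢谢"]]

# Shallow copy: a new outer array whose slots still reference the same inner arrays.
shallow = text.dup
shallow_shares_inner = shallow[0].equal?(text[0])

# Copy by hand: dup the outer array (via map) and each inner array.
# The strings themselves are still shared, which is fine if they are never mutated.
by_hand = text.map(&:dup)

# Deep copy via a Marshal round trip: inner arrays and strings are all duplicated.
marshaled = Marshal.load(Marshal.dump(text))

# No copying at all: read from text, build the translation in a fresh array.
# GLOSSARY is a hypothetical stand-in for DictionaryEntry.find_by_simplified.
GLOSSARY = { "你" => "you", "好" => "good", "谢谢" => "thanks" }
translated = text.map do |sentence|
  # Fall back to the original word when there is no entry, as in the question.
  sentence.map { |word| GLOSSARY.fetch(word, word) }
end
```

With the last approach, text (and anything assigned from it, such as lesson.parsed_content) is never modified, so the Chinese and English arrays stay separate without any explicit copy.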

Upvotes: 4
