Miles Robinson

Reputation: 21

Extract first word from a line in a file using Ruby

How do I get the first word from each line? Thanks to help from someone on Stack Overflow, I am working with the code below:

File.open("pastie.rb", "r") do |file|
  while (line = file.gets)
    next if (line[0,1] == " ")
    labwords = line.split.first
    print labwords.join(' ')
  end
end

It extracts the first word from each line, but it has problems with spaces. I need help adjusting it. I think I need to use the first method (Array#first), but I don't know how to use it.

Upvotes: 0

Views: 4240

Answers (2)

Jikku Jose

Reputation: 18804

Consider this:

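def first_words_from_file(file_name)
  # readlines keeps each trailing "\n", so empty? alone would never match a
  # blank line; strip first so blank and whitespace-only lines are dropped.
  lines = File.readlines(file_name).reject { |line| line.strip.empty? }
  lines.map do |line|
    line.split.first # split with no argument ignores leading whitespace
  end
end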

puts first_words_from_file('pastie.rb')
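If you would rather stay close to the question's line-by-line loop, here is a minimal sketch using File.foreach, which yields one line at a time instead of reading the whole file. The method name print_first_words and its out parameter are hypothetical, added only so the output can be redirected. It relies on String#split with no argument, which ignores leading whitespace, and it skips blank lines, for which split.first returns nil.

```ruby
require "tempfile"

# Hypothetical loop-based variant of the question's code: File.foreach yields
# one line at a time, so the whole file never has to be held in memory.
def print_first_words(file_name, out = $stdout)
  File.foreach(file_name) do |line|
    word = line.split.first # nil when the line is blank or only whitespace
    out.puts(word) if word  # skip blank lines instead of printing an empty line
  end
end

# Demo on a temporary file (used only so no existing file is needed).
Tempfile.create("words") do |f|
  f.write("  indented line\n\nplain line\n")
  f.flush
  print_first_words(f.path) # prints "indented" and "plain" on separate lines
end
```

Unlike the original loop, lines that start with a space are handled rather than skipped.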

Upvotes: 2

Brandon Buck

Reputation: 7181

If you want the first word from each line from a file:

first_words = File.read(file_name).lines.map { |l| l.split(/\s+/).first }

It's pretty simple. Let's break it apart:

File.read(file_name)

Reads the entire contents of the file and returns it as a string.
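A minimal sketch of what File.read hands back. The helper read_back is hypothetical, and Tempfile (Ruby's standard library) is used only so the example does not depend on an existing file.

```ruby
require "tempfile"

# Hypothetical helper for this demo: writes text to a temporary file and
# returns whatever File.read gives back for that file.
def read_back(text)
  Tempfile.create("demo") do |f|
    f.write(text)
    f.flush
    File.read(f.path) # the whole file as a single String, newlines included
  end
end

contents = read_back("first line\nsecond line\n")
p contents.class # prints String
p contents       # prints "first line\nsecond line\n"
```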

.lines

Splits the string after each newline character (\n) and returns an array of strings, one per line. Each string keeps its trailing \n.
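A short sketch of String#lines on an in-memory string, showing that the newline stays attached to each element and how chomp removes it if you need the bare text.

```ruby
# String#lines splits after each "\n" and keeps the newline on every element.
text = "This is line one\nAnd line two here\n"
lines = text.lines
p lines # prints ["This is line one\n", "And line two here\n"]

# chomp removes the trailing newline if you need the bare text of each line.
bare = lines.map(&:chomp)
p bare # prints ["This is line one", "And line two here"]
```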

.map { |l| ... }

Array#map calls the block once for each element and builds a new array from the block's return values, which it returns when it finishes. This is how you transform every value in an array. In this block, |l| is the parameter list: the block takes one argument, which we refer to as l.
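A minimal illustration of Array#map on its own, using made-up sample words: the block's return value for each element goes into the new array, and the original array is left unchanged.

```ruby
# Array#map runs the block once per element and collects the block's
# return values into a new array; the original array is not modified.
words = ["apple", "banana", "cherry"]
lengths = words.map { |w| w.length }
p lengths # prints [5, 6, 6]
p words   # prints ["apple", "banana", "cherry"]
```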

|l| l.split(/\s+/).first

This is the block body; I've included the block parameters here too for completeness. Here we split the line on /\s+/. This is a regular expression: \s matches any whitespace character (space, tab, newline, and so on), and the + after it means "one or more", so \s+ matches a run of one or more whitespace characters, taking as many consecutive ones as possible. Passing it to String#split returns an array of the substrings that appear between matches of that separator. Since our separator is any run of whitespace, we get everything between the whitespace: the string "A list of words" becomes ["A", "list", "of", "words"]. Finally, we call .first, which returns the first element of the array (in this case, the first word).

One caveat: if a line starts with whitespace, split(/\s+/) returns an empty string as the first element (" indented".split(/\s+/) gives ["", "indented"]), so .first returns "". Calling split with no argument ignores leading whitespace and avoids this. On a blank line either form produces an empty array, so .first returns nil.
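A small sketch comparing split(/\s+/) with argument-less split on an indented line and on a blank line, which is where the two behave differently.

```ruby
# split(/\s+/) splits on runs of whitespace but keeps a leading empty field
# when the line starts with whitespace; split with no argument skips
# leading whitespace instead.
line = "   indented line\n"
with_regex = line.split(/\s+/)
no_arg     = line.split

p with_regex       # prints ["", "indented", "line"]
p no_arg           # prints ["indented", "line"]
p with_regex.first # prints ""
p no_arg.first     # prints "indented"

# A blank line yields an empty array, so first returns nil.
p "\n".split.first # prints nil
```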

In Ruby, a block returns the value of its last evaluated expression, so this block returns the line's first word. Because the block is passed to map, the end result is an array containing the first word of each line in the file. To demonstrate, assume the file contains:

This is line one
And line two here
Don't forget about line three
Line four is very bored
Line five is the best
It all ends with line six

Running this input through the one-liner above gives:

["This", "And", "Don't", "Line", "Line", "It"]

That is the first word from each line.
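To check this end to end, here is a minimal sketch that writes the six sample lines to a temporary file and runs the same one-liner on it. The wrapper name first_words is hypothetical; the expression inside it is the one from this answer.

```ruby
require "tempfile"

# The six sample lines from the example input.
SAMPLE = <<~TEXT
  This is line one
  And line two here
  Don't forget about line three
  Line four is very bored
  Line five is the best
  It all ends with line six
TEXT

# Hypothetical wrapper around the one-liner so it can be called with a path.
def first_words(file_name)
  File.read(file_name).lines.map { |l| l.split(/\s+/).first }
end

result = Tempfile.create("sample") do |f|
  f.write(SAMPLE)
  f.flush
  first_words(f.path)
end

p result # prints ["This", "And", "Don't", "Line", "Line", "It"]
```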

Upvotes: 5
