user3072054
user3072054

Reputation: 349

Remove spaces inbetween fields in a CSV file in UNIX

CSV input file:

"18","Agent","To identify^M
","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"^M
"1078","Repeat","Identify
it has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"^M
"621","Com Dot Com","Identify

","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"^M

In the above input file, I have 3 different type of records.

1) Record No 18 (first 2 lines), even though it should be one line it comes in as 2 lines. The ^M is placed incorrectly at the end of first line.

Expected Output (^M removed from first line and make it one line)

"18","Agent","To identify","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"^M

2) Record No 1078 (Line no 3 & 4) - Here i dont have the ^M at the end of line 3. I want to combine Line 3 & 4 and make it one line.

Expected Output

"1078","Repeat","Identify it has ","0164f3eb-beeb-47dd-b9b99b762f430e14","1"^M

3) Record No 621 (Line 4, 5 & 6) - This has ^M only at the end of the line, but it has a blank line inbetween. i want to remove the blankline and make it one line.

Expected Output

"621","Com Dot Com","Identify","7fc9e73e-3470-4b31 8524fcb97a4dadee","1"^M

Upvotes: 1

Views: 896

Answers (3)

Ed Morton
Ed Morton

Reputation: 203209

Using GNU awk for multi-char RS:

$ awk -v RS='^$' -v ORS= 'BEGIN{FS=OFS="\""} {for (i=2;i<=NF;i+=2) gsub(/\n/,"",$i) }1' file
"18","Agent","To identify^M","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"^M
"1078","Repeat","Identifyit has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"^M
"621","Com Dot Com","Identify","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"^M

Since it's not clear if you really have control-Ms or not, I left them as the characters "^M" for now. If you have them just gsub() them out.

Upvotes: 0

ooga
ooga

Reputation: 15501

This might work:

awk -F \",\" '
  /^[[:space:]]*$/ { next }
  {
    line = line $0
    if (split(line, a) == 10) {
      print line
      line = ""
    }
  }
' file

I have a feeling there will still be some problems (like missing spaces).

Upvotes: 0

konsolebox
konsolebox

Reputation: 75458

Using Ruby:

ruby -e 'require "csv"; CSV.parse(File.read(ARGV.shift)).each{ |e| e.map!{ |f| f.strip.gsub(/[[:space:]]+/, " ") }; puts CSV.generate_line(e, {:force_quotes => true}); }' csv_file

Output:

"18","Agent","To identify","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"
"1078","Repeat","Identify it has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"
"621","Com Dot Com","Identify","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"

A little more readable form:

ruby -e 'require "csv"
    CSV.parse(File.read(ARGV.shift)).each{ |e|
        e.map!{ |f|
            f.strip.gsub(/[[:space:]]+/, " ")
        }
        puts CSV.generate_line(e, {:force_quotes => true})
    }' csv_file
  • Bash's history expansion may affect the command, so just you can disable it if you want: shopt -u -o histexpand

Script version:

#!/usr/bin/env ruby
require 'csv'
CSV.parse(File.read(ARGV.shift)).each{ |e|
  e.map!{ |f|
    f.strip.gsub(/[[:space:]]+/, " ")
  }
  puts CSV.generate_line(e, {:force_quotes => true})
}

Run with

ruby script.rb csv_file

See Ruby-Doc.org for everything.

Upvotes: 1

Related Questions