Reputation: 349
CSV input file:
"18","Agent","To identify^M
","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"^M
"1078","Repeat","Identify
it has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"^M
"621","Com Dot Com","Identify
","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"^M
In the above input file, I have 3 different type of records.
1) Record No 18 (first 2 lines), even though it should be one line it comes in as 2 lines. The ^M is placed incorrectly at the end of first line.
Expected Output (^M removed from first line and make it one line)
"18","Agent","To identify","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"^M
2) Record No 1078 (Line no 3 & 4) - Here i dont have the ^M at the end of line 3. I want to combine Line 3 & 4 and make it one line.
Expected Output
"1078","Repeat","Identify it has ","0164f3eb-beeb-47dd-b9b99b762f430e14","1"^M
3) Record No 621 (Line 4, 5 & 6) - This has ^M only at the end of the line, but it has a blank line inbetween. i want to remove the blankline and make it one line.
Expected Output
"621","Com Dot Com","Identify","7fc9e73e-3470-4b31 8524fcb97a4dadee","1"^M
Upvotes: 1
Views: 896
Reputation: 203209
Using GNU awk for multi-char RS:
$ awk -v RS='^$' -v ORS= 'BEGIN{FS=OFS="\""} {for (i=2;i<=NF;i+=2) gsub(/\n/,"",$i) }1' file
"18","Agent","To identify^M","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"^M
"1078","Repeat","Identifyit has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"^M
"621","Com Dot Com","Identify","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"^M
Since it's not clear if you really have control-Ms or not, I left them as the characters "^M" for now. If you have them just gsub() them out.
Upvotes: 0
Reputation: 15501
This might work:
awk -F \",\" '
/^[[:space:]]*$/ { next }
{
line = line $0
if (split(line, a) == 10) {
print line
line = ""
}
}
' file
I have a feeling there will still be some problems (like missing spaces).
Upvotes: 0
Reputation: 75458
Using Ruby:
ruby -e 'require "csv"; CSV.parse(File.read(ARGV.shift)).each{ |e| e.map!{ |f| f.strip.gsub(/[[:space:]]+/, " ") }; puts CSV.generate_line(e, {:force_quotes => true}); }' csv_file
Output:
"18","Agent","To identify","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"
"1078","Repeat","Identify it has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"
"621","Com Dot Com","Identify","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"
A little more readable form:
ruby -e 'require "csv"
CSV.parse(File.read(ARGV.shift)).each{ |e|
e.map!{ |f|
f.strip.gsub(/[[:space:]]+/, " ")
}
puts CSV.generate_line(e, {:force_quotes => true})
}' csv_file
shopt -u -o histexpand
Script version:
#!/usr/bin/env ruby
require 'csv'
CSV.parse(File.read(ARGV.shift)).each{ |e|
e.map!{ |f|
f.strip.gsub(/[[:space:]]+/, " ")
}
puts CSV.generate_line(e, {:force_quotes => true})
}
Run with
ruby script.rb csv_file
See Ruby-Doc.org for everything.
Upvotes: 1