Reputation: 4216
So I have a CSV that contains a filename and a file's contents. The file is full of newlines and all sorts of other characters. I need to have each row contain the filename in one column and the file contents in the next column, and then I need a new row, until the end of the file.
The data looks like this:
"filename.txt","hey there
buddy
how are you
doing"
"filename2.txt","and so on..."
According to http://creativyst.com/Doc/Articles/CSV/CSV01.htm#EmbedBRs, my CSV importer should read those line-breaky values as single values, not new rows. However, I have tried OO Calc, KSpread, Gnumeric, and even Microsoft Excel 2007 and none of them read it that way; they all consider each newline a new row.
Anyone know how to fix this? I've looked through other related questions but none of them seem to say.
If this doesn't work, I'll have to write the file directly with a Python Excel-writing module or something. Anyone know what to do here?
Upvotes: 3
Views: 17857
Reputation: 1
Thank you Alex P! This provides a way better way for me to get text exported from Scrivener into Excel. Here's my variation on you VBA script:
Sub importScriv()
Dim recFields As Variant
Dim rec, rec2 As String
Dim index, row, col, numcols
Dim FileName
FileName = ThisWorkbook.Path & "/scriv.txt"
Open FileName For Input As #1
Line Input #1, rec
' Get next line
Line Input #1, rec2
row = 1
While Not EOF(1)
' The Scrivner compile needs to put a % + Tab character as the first thing in the Section Layout Prefix
' The % is used to demark Scrivener documents
Do While Left(rec2, 1) <> "%"
rec = rec & rec2 & Chr(13)
' Get next line
Line Input #1, rec2
If EOF(1) Then GoTo Finish
Loop
' Split the lines at the tab seprators
recFields = Split(rec, vbTab)
numcols = UBound(recFields) - LBound(recFields) + 1
' put the data in the row
For col = 1 To numcols
' Remove any leading carriage returns
If Left(recFields(col - 1), 1) = CHR(13) Then
recFields(col - 1) = Right(recFields(col - 1), Len(recFields(col - 1)) - 1)
End If
Cells(row, col) = recFields(col - 1)
Next col
' We got a % in rec2 so set rec to rec2
rec = rec2
' increment the row
row = row + 1
' Get next line
Line Input #1, rec2
Wend
Finish: Close #1
' Finally, delete the first column that contains the % document separator characters
Columns(1).EntireColumn.Delete
End Sub
Upvotes: 0
Reputation: 1602
If your .csv file has it's quoted multi-line fields with just chr(10) (0x0A)
to do the break (which seems to be the default for Excel 2007 exported .csv files), not chr(13)+chr(10)
, then standard VBA file processing works fine using line input #1
and split
. The result is you get a single dynamic array element of the field with its newline inside it.
Example for a csv file that might have newlines in field 3 (4):
dim recFields as variant
dim rec as string
open "testfile.csv" for input as #1
line input #1, rec
while not eof(1)
recFields = split(rec,vbTab)
recFields(3) = replace(recFields(3),chr(10),"|")
' May want to remove quotes as well
recFields(3) = replace(recFields(3),chr(34),"")
' Do some stuff here.
' Then read next record
line input #1,rec
wend
close #1
The key is having newlines represented as just 0x0A
in the field, not 0x0C+0x0A
, and using a dynamic (variant) array to split the record. I was dreading having to handle this situation, until I realised under good old VBA, it just worked. The dreaded alternative of reading-ahead, then backwards (eg. using PHP's ftell
and fseek
was) avoided!
Upvotes: 0
Reputation: 1325
The file has to have a .csv extension. Doesn't seem to work otherwise. Bug in Excel as far as I'm concerned.
Upvotes: 2
Reputation: 11223
Well, if the input file may contain anything, it might as well contain quotes, commas, etc., anything that would break the whole table.
I'd suggest one of these:
use different format: XML or HTML table, for example, can be imported to Excel directly
escape newlines in the files and split them in Excel, using an Excel function (or VBasic)
I believe that using CSV is a reasonable option only if you have predictable and simple data in a simple form so you can keep the rule "one line = one row". (And then you can do all the work with printf, which is just cool. :D)
Upvotes: 1
Reputation: 4216
I had to switch to something that did Excel format I/O directly, I did not get Excel to parse newlines from CSV as expected.
Even then it didn't end up working out because Excel has a hard character-per-cell limit. The client decided not to continue to pursue this and worked out an alternate submission process with the developer of the software they needed the Excel file for.
Upvotes: 0
Reputation: 6302
Excel 2007 will read them fine, but you will still have the carriage returns in the column.
You'll need to remove them before importing the file.
If your creating the file yourself from SQL Server you can remove them easily.
REPLACE(REPLACE(Field, CHAR(13),' '), CHAR(10), ' ') AS FixedField
Upvotes: 1
Reputation: 150108
The free LumenWorks CSV reader handles newlines in the data if you set an appropriate flag.
Upvotes: 1
Reputation: 3360
Specify quote (") as text qualifier when importing. Newlines will be ignored then.
Upvotes: 0