Reputation: 153
I recently received a tab separated file that has 60 fields. Each field can have any character in it. The export I received also has linefeeds and carriage returns in some of the fields. This is causing the tab separated file to not import correctly. Is there a way to remove linebreaks and carriage returns if the line does not have 59 tabs on it? There may or may not be data between each tab.
Line 3,4,5 is the issue I'm trying to fix.
Upvotes: 0
Views: 150
Reputation: 91498
I'd do:
Find what: ^((?:[^\t]*\t[^\t]*){1,58})[\r\n]+
Replace with: $1
This will replace line break with nothing if there are less than 59 occurrence of \t
character in a line.
Upvotes: 0
Reputation: 1211
Warning: I'm assuming that there are no tabs within a column's data. If there is, then you need something far more capable that what I have here.
The following works with the sample input provided:
First, replace all of the line breaks with a character that doesn't occur anywhere in your file. You can even use characters that you can't type with your keyboard.
Find what: (\r\n?|\n)
Replace with: \xB6
Then, match your 60-field rows and give them line-breaks (I'm going with Windows-style):
Find what: ^(([^\t]*\t){59}[^\t\xB6]*)\xB6
Replace with: $1\r\n
I'm making one huge assumption here: that column 60 never contains a line break. If this is false, then you're going to have some of column 60's data ending up in column 1 of the next record.
Now, if you don't like that paragraph symbol showing up in your data, you can either purge it or replace it with whatever you like:
Find what: \xB6
Replace with:
Explanation of matching patterns:
(\r\n?|\n)
matches any of the three kinds of line breaks, which are single \r
, a single \n
, or the Windows-style \r\n
. Wikipedia has a whole article about this.
See http://regex101.com/r/iB6fK9 to explore the ^(([^\t]*\t){59}[^\t\xB6]*)\xB6
pattern.
^
at the start.([^\t]*\t){59}
. That gets us the first 59 tab-separated columns. Only column 59 is captured by this group.[^\t\xB6]*
.\r\n
that we insert with the $1\r\n
replacement.Upvotes: 1
Reputation: 31045
What I understand from your question is that you want to remove the windows \r\n
from your file, to do this you can use replace dialog ctrl+h
.
On the Search Mode
select Extended (\n, \r,...
, then on the "Find What" look for \r\n
and in "Replace" leave it empty (or replace it with what you want).
Upvotes: 0