Reputation: 1091
I'm a Java developer and I'm using Ubuntu to develop. The project was created in Windows with Eclipse and it's using the Windows-1252 encoding.
To convert to UTF-8 I've used the recode program:
find Web -iname \*.java | xargs recode CP1252...UTF-8
This command gives this error:
recode: Web/src/br/cits/projeto/geral/presentation/GravacaoMessageHelper.java failed: Ambiguous output in step `CR-LF..data
I've searched for it and found a solution in Bash and Windows, Recode: Ambiguous output in step `data..CR-LF', which says:
Convert line endings from CR/LF to a single LF: Edit the file with Vim, give the command
:set ff=unix
and save the file. Recode now should run without errors.
Nice, but I have many files to remove the CR/LF characters from, and I can't open each one to do it. Vi doesn't seem to offer a command-line option for this kind of batch operation from Bash.
Can sed be used to do this? How?
Upvotes: 109
Views: 189749
Reputation: 311
In IntelliJ (and most other IDEs), line endings can be changed from the main menu: File > File Properties > Line Separators, then select a line-ending style from the list.
Upvotes: 1
Reputation: 1104
Use the command below to convert the line endings of a file to Unix format using sed:
sed -i 's/\r$//' file_name.sh
This command removes the carriage return (CR) character at the end of each line, replacing it with nothing.
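To apply this to many files at once, as in the question, a hedged example (assuming GNU sed, whose -i option edits files in place):
find Web -iname '*.java' -exec sed -i 's/\r$//' {} +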
Upvotes: 1
Reputation: 664
In order to overcome
Ambiguous output in step `CR-LF..data'
the simple solution might be to add the -f flag to force the conversion.
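Applied to the command from the question, that would look like this (hedged: recode's -f forces the conversion, but I haven't verified the resulting files myself):
find Web -iname \*.java | xargs recode -f CP1252...UTF-8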
Upvotes: 6
Reputation: 640
I'll take a little exception to jichao's answer. You can actually do everything he just talked about fairly easily. Instead of looking for a \n, just look for a carriage return at the end of the line.
sed -i 's/\r$//' "${FILE_NAME}"
To change from Unix back to DOS, simply look for the last character on the line and add a carriage return after it. (I'll add -r so sed uses extended regular expressions, which makes this easier to write.)
sed -ri 's/(.)$/\1\r/' "${FILE_NAME}"
Theoretically, the file could be changed to Mac style by adding code to the last example that also appends the next line of input to the first line until all lines have been processed. I won't try to make that example here, though.
Warning: -i changes the actual file in place. If you want a backup to be made, add a string of characters immediately after -i; the existing file will then be kept under the same name with those characters appended to the end.
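For example (assuming GNU sed, which accepts a backup suffix attached directly to -i):
sed -i.bak 's/\r$//' "${FILE_NAME}"   # edits in place, keeping the original as ${FILE_NAME}.bak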
Update: The Unix to DOS conversion can be simplified and made more efficient by not bothering to look for the last character. This also allows us to not require using -r for it to work:
sed -i 's/$/\r/' "${FILE_NAME}"
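A quick way to sanity-check either direction is to inspect the result with file(1); a small sketch, assuming GNU sed and file are available:
printf 'one\ntwo\n' > sample.txt
sed -i 's/$/\r/' sample.txt    # Unix -> DOS
file sample.txt                # should report "with CRLF line terminators"
sed -i 's/\r$//' sample.txt    # DOS -> Unix
file sample.txt                # should report plain ASCII text again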
Upvotes: 13
Reputation: 767
Actually, Vim does allow what you're looking for. Enter Vim, and type the following commands:
:args **/*.java
:argdo set ff=unix | update | next
The first of these commands sets the argument list to every file matching **/*.java, which is all Java files, recursively. The second of these commands does the following to each file in the argument list, in turn: it sets the line endings to Unix (set ff=unix), saves the file if it has changed (update), and moves on to the next file (next).
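If you would rather not open Vim interactively at all, the same thing can be driven from the shell. This is only a sketch, assuming bash with the globstar option enabled (so ** recurses) and a Vim that accepts +{command} arguments:
shopt -s globstar
vim +'argdo set ff=unix | update' +'qa' **/*.java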
Upvotes: 18
Reputation: 1831
sed cannot match \n because the trailing newline is removed before the line is put into the pattern space, but it can match \r, so you can convert \r\n (DOS) to \n (Unix) by removing \r:
sed -i 's/\r//g' file
Warning: this will change the original file in place.
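To see the effect without touching a file, a small check (assuming GNU sed and od are available) that pipes a CRLF sample through the same substitution:
printf 'a\r\nb\r\n' | sed 's/\r//g' | od -c   # the \r bytes are gone, only \n remains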
However, you cannot convert from Unix EOL to DOS or old Mac (\r) line endings this way. More reading here:
How can I replace a newline (\n) using sed?
Upvotes: 115
Reputation: 24377
Try the Python script by Bryan Maupin found here (I've modified it a little bit to be more generic; note that it uses Python 2 syntax):
#!/usr/bin/env python
import sys

input_file_name = sys.argv[1]
output_file_name = sys.argv[2]

input_file = open(input_file_name)
output_file = open(output_file_name, 'w')

line_number = 0
for input_line in input_file:
    line_number += 1
    try:  # first try to decode it using cp1252 (Windows, Western Europe)
        output_line = input_line.decode('cp1252').encode('utf8')
    except UnicodeDecodeError, error:  # if there's an error
        sys.stderr.write('ERROR (line %s):\t%s\n' % (line_number, error))  # write to stderr
        try:  # then if that fails, try to decode using latin1 (ISO 8859-1)
            output_line = input_line.decode('latin1').encode('utf8')
        except UnicodeDecodeError, error:  # if there's an error
            sys.stderr.write('ERROR (line %s):\t%s\n' % (line_number, error))  # write to stderr
            sys.exit(1)  # and give up
    output_file.write(output_line)

input_file.close()
output_file.close()
You can use that script with
$ ./cp1252_utf8.py file_cp1252.sql file_utf8.sql
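Since the question is about converting many files, here is a hedged sketch of driving the script over a whole tree from the shell (this assumes the script was saved as cp1252_utf8.py, as in the usage line above, and made executable):
find Web -iname '*.java' | while read -r f; do
    ./cp1252_utf8.py "$f" "$f.utf8" && mv "$f.utf8" "$f"
done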
Upvotes: 0
Reputation: 1150
The tr command can also do this:
tr -d '\15\32' < winfile.txt > unixfile.txt
and should be available to you.
You'll need to run tr from within a script, since it cannot work with file names directly (it only reads standard input and writes standard output). For example, create a file myscript.sh:
#!/bin/bash
for f in `find -iname \*.java`; do
    echo "$f"
    tr -d '\15\32' < "$f" > "$f.tr"   # strip CR (octal 15) and DOS EOF (octal 32) characters
    mv "$f.tr" "$f"
    recode CP1252...UTF-8 "$f"        # then convert the encoding in place
done
Running myscript.sh would process all the Java files in the current directory and its subdirectories.
Upvotes: 9
Reputation: 86525
There should be a program called dos2unix that will fix line endings for you. If it's not already on your Linux box, it should be available via the package manager.
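To handle all the files from the question in one go (assuming dos2unix is installed and converts the files named on its command line in place, as the common implementations do):
find Web -iname '*.java' -exec dos2unix {} +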
Upvotes: 148
Reputation: 13624
Go back to Windows, tell Eclipse to change the encoding to UTF-8, then go back to Unix and run d2u on the files.
Upvotes: -1