Reputation: 672
I have a dataframe with 5 columns. While cleaning the data, I ran into a problem caused by carriage returns in the text file, as shown in the example below.
Input:
001|Baker St.
London|3|4|7
002|Penny Lane
Liverpool|88|5|7
Desired output:
001|Baker St. London|3|4|7
002|Penny Lane Liverpool|88|5|7
Any suggestions are welcome.
Upvotes: 0
Views: 1837
Reputation: 494
The built-in strip() method of string objects does this. You can call it on each line as you iterate over the file:
cleaned_up_line = line.strip()
As the Python str.strip() docs explain, it removes whitespace, including newlines and carriage returns, from the beginning and end of a string. Characters in the middle of the string are left alone.
For example:
In [7]: with open('file', 'r') as f:
   ...:     a = f.readlines()
   ...:     print(a)
   ...:
['the\n', 'file\n\r', 'is\n\r', 'here\n', '\n']
In [8]: with open('file', 'r') as f:
   ...:     a = [line.strip() for line in f.readlines()]
   ...:     print(a)
   ...:
['the', 'file', 'is', 'here', '']
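Note that strip() alone only cleans each line; it does not glue a broken record back together. A minimal sketch of one way to do that, assuming every complete record has exactly 5 pipe-separated fields as in the question (the helper name merge_broken_records and its parameters are made up for illustration):

```python
def merge_broken_records(lines, n_fields=5, sep="|"):
    """Join physical lines until each record holds n_fields fields."""
    records = []
    buffer = ""
    for raw in lines:
        piece = raw.strip()  # drop the trailing '\n' / '\r'
        if not piece:
            continue  # skip blank lines
        # Append the continuation with a single space, as in the desired output.
        buffer = f"{buffer} {piece}" if buffer else piece
        # A record with n_fields fields contains n_fields - 1 separators.
        if buffer.count(sep) >= n_fields - 1:
            records.append(buffer)
            buffer = ""
    if buffer:
        records.append(buffer)  # keep an incomplete trailing record
    return records


sample = [
    "001|Baker St.\n",
    "London|3|4|7\n",
    "002|Penny Lane\n",
    "Liverpool|88|5|7\n",
]
merged = merge_broken_records(sample)
print("\n".join(merged))
```

This relies on the separator never appearing inside a field; if it can, counting separators is not a reliable way to detect a complete record.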
Upvotes: 1
Reputation: 9197
You can remove the \r like this:
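with open("your.csv", "r", newline="") as myfile:
    # newline="" turns off newline translation, so a stray '\r' is still
    # present in the text and can be removed explicitly
    data = myfile.read().replace('\r', '')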
Example:
from io import StringIO
# second entry contains a carriage return \r
s = """91|AAA|2010|3
92|BB\rB|2011|4
93|CCC|2012|5
"""
# StringIO simulates a loaded csv file:
# carriage return still there
StringIO(s).read()
# '91|AAA|2010|3\n92|BB\rB|2011|4\n93|CCC|2012|5\n'
# carriage return gone
StringIO(s).read().replace('\r', '')
# '91|AAA|2010|3\n92|BBB|2011|4\n93|CCC|2012|5\n'
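One caveat when moving from StringIO to a real file: in Python 3, open() in text mode translates a lone '\r' into '\n' by default, so the stray character shows up as an extra line break instead. A small sketch of the difference, using a temporary file with made-up contents; passing newline="" keeps the '\r' intact so it can be replaced:

```python
import os
import tempfile

# Write raw bytes so the stray carriage return reaches the disk unchanged.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"91|AAA|2010|3\n92|BB\rB|2011|4\n")
    path = tmp.name

# Default text mode: universal newlines turn the lone '\r' into '\n'.
with open(path, "r") as f:
    translated = f.read()

# newline="" disables translation, so the '\r' is still there to remove.
with open(path, "r", newline="") as f:
    preserved = f.read()

os.remove(path)

cleaned = preserved.replace("\r", "")
print(repr(translated))
print(repr(preserved))
print(repr(cleaned))
```

With the default mode, the replace('\r', '') call would find nothing to replace and the record would stay split across two lines.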
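With Pandas:
import pandas as pd
data = StringIO(StringIO(s).read().replace('\r', ''))
# the sample has no header row, so the first record ends up as the column names
pd.read_csv(data, sep='|')
Out[35]:
   91  AAA  2010  3
0  92  BBB  2011  4
1  93  CCC  2012  5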
Upvotes: 1
Reputation: 16
You could match it with a regex and remove it, e.g. re.sub('[\r\n]', '', inputline).
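Be careful where you apply it: that pattern also removes the real line endings, so it only makes sense on a single record. A rough sketch of a narrower pattern, assuming the stray break is a lone '\r' and real record ends are '\n' or '\r\n' (the sample text is adapted from the question):

```python
import re

text = "001|Baker St.\rLondon|3|4|7\n002|Penny Lane\rLiverpool|88|5|7\n"

# The character class removes every break, so the records run together.
flattened = re.sub(r"[\r\n]", "", text)

# Replace only a '\r' that is not part of a '\r\n' line ending.
repaired = re.sub(r"\r(?!\n)", " ", text)

print(repr(flattened))
print(repaired, end="")
```

The negative lookahead (?!\n) leaves Windows-style '\r\n' endings untouched, so the same substitution can be run over the whole file contents.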
Upvotes: 0