Reputation: 756
I have a DataFrame with the columns id, country_name, location and total_deaths. During data cleaning, I came across a value in one row that has '\r' attached. Once the cleaning is done, I store the resulting DataFrame in a destination.csv file. Because that particular row has \r attached, it always creates a new row in the file. The affected row looks like this:
id 29
location Uttar Pradesh\r
country_name India
total_deaths 20
I want to remove the \r. I tried df.replace({'\r': ''}, regex=True), but it isn't working for me.
Is there any other solution? Can somebody help?
In the process above, I iterate over df to check whether \r is present and, if it is, replace it. Here row.replace() or row.str.strip() doesn't seem to work, or I may be doing it the wrong way.
I don't want to specify the column name or row number when using replace(), because I can't be certain that only the 'location' column will contain \r. Please find the code below.
import re

count = 0
for row_index, row in df.iterrows():
    if re.search(r"\\r", str(row)):
        print(type(row))  # pandas.Series
        # replace() returns a new Series; this result is discarded
        row.replace({r'\\r': ''}, regex=True)
        print(row)
        count += 1
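A minimal sketch of why the loop above has no effect, using a small hypothetical frame built to mimic the question's row: iterrows() yields row copies and replace() returns a new object, so nothing is written back to df. Replacing over the whole frame and assigning the result avoids naming any column or row.

```python
import pandas as pd

# Hypothetical data mimicking the question: one value ends with a carriage return
df = pd.DataFrame({
    "id": [29],
    "country_name": ["India"],
    "location": ["Uttar Pradesh\r"],
    "total_deaths": [20],
})

# Calling replace() on each row builds new Series that are thrown away
for _, row in df.iterrows():
    row.replace({r"\r": ""}, regex=True)
print(repr(df.loc[0, "location"]))  # still ends with '\r'

# Replace across every column at once and keep the result
df = df.replace({r"\r": ""}, regex=True)
print(repr(df.loc[0, "location"]))
```

Non-string columns such as total_deaths are left untouched by the regex replacement.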
Upvotes: 12
Views: 42121
Reputation: 1932
Somehow the accepted answer did not work for me. In the end, I solved it like this:
df["29"] = df["29"].replace(r'\r', '', regex=True)
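The difference is that I use \r instead of \\r. With regex=True, the raw pattern r'\r' matches an actual carriage-return character, while r'\\r' matches a literal backslash followed by the letter r.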
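To see why the single backslash matters, here is a small sketch with Python's re module, whose pattern syntax is what regex=True relies on (the sample strings are made up): one string ends with a real carriage return, the other with the two characters backslash and r.

```python
import re

real_cr = "Uttar Pradesh\r"    # ends with an actual carriage-return character
literal = "Uttar Pradesh\\r"   # ends with a backslash followed by the letter r

# r'\r' is the regex escape for a carriage return
print(re.sub(r"\r", "", real_cr) == "Uttar Pradesh")   # True
print(re.sub(r"\r", "", literal) == "Uttar Pradesh")   # False

# r'\\r' matches a literal backslash followed by r
print(re.sub(r"\\r", "", real_cr) == "Uttar Pradesh")  # False
print(re.sub(r"\\r", "", literal) == "Uttar Pradesh")  # True
```

So which pattern works depends on what is actually stored in the cell, which is worth checking with repr() before choosing.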
Upvotes: 1
Reputation: 21
Just assign the result of df.replace back to df and then print df:
df=df.replace({'\r': ''}, regex=True)
print(df)
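A minimal sketch of the full round trip the question describes, assuming a hypothetical three-row frame and an in-memory io.StringIO buffer standing in for destination.csv: once the cleaned frame is assigned back to df, writing and re-reading the CSV gives back the same number of rows.

```python
import io

import pandas as pd

# Hypothetical data: only the middle row carries a carriage return
df = pd.DataFrame({
    "id": [28, 29, 30],
    "location": ["Punjab", "Uttar Pradesh\r", "Orissa"],
    "total_deaths": [0, 20, 69],
})

# replace() returns a new DataFrame, so assign it back
df = df.replace({"\r": ""}, regex=True)

# Stand-in for df.to_csv('destination.csv', index=False)
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
round_trip = pd.read_csv(buf)
print(len(round_trip))  # 3
print(round_trip["location"].tolist())
```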
Upvotes: 1
Reputation: 1019
The code below removes \t tabs, \n newlines and \r carriage returns, which is useful for condensing each record into one row. The answer was taken from https://gist.github.com/smram/d6ded3c9028272360eb65bcab564a18a
df.replace(to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"], value=["",""], regex=True, inplace=<INPLACE>)
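A sketch of how the list form behaves, using a made-up single-column frame: each pattern in to_replace is paired with the value at the same position, so the first pattern removes the two-character escape sequences (a backslash followed by t, n or r) and the second removes actual tab, newline and carriage-return characters. Here the result is assigned to a new name instead of relying on inplace.

```python
import pandas as pd

df = pd.DataFrame({"text": ["a\\tb", "c\r\nd", "e\tf"]})

cleaned = df.replace(
    to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"],  # escaped sequences, then real control characters
    value=["", ""],
    regex=True,
)
print(cleaned["text"].tolist())  # ['ab', 'cd', 'ef']
```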
Upvotes: 3
Reputation: 862471
Another solution is to use str.strip:
df['29'] = df['29'].str.strip(r'\\r')
print(df)
id 29
0 location Uttar Pradesh
1 country_name India
2 total_deaths 20
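Note that str.strip treats its argument as a set of characters to remove from both ends, not as a substring. The sketch below (with made-up values) compares stripping r'\\r' against calling strip() with no argument, which removes leading and trailing whitespace, including an actual carriage return.

```python
import pandas as pd

s = pd.Series(["Uttar Pradesh\r", "Kanpur\\r"])

# No argument: strips whitespace, so a trailing carriage return goes away
print(s.str.strip().tolist())        # ['Uttar Pradesh', 'Kanpur\\r']

# With r'\\r': strips backslashes and the letter r from both ends,
# which also eats the final 'r' of 'Kanpur'
print(s.str.strip(r"\\r").tolist())  # ['Uttar Pradesh\r', 'Kanpu']
```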
If you want to use replace, add the r prefix and one more \:
print(df.replace({r'\\r': ''}, regex=True))
id 29
0 location Uttar Pradesh
1 country_name India
2 total_deaths 20
In replace you can also specify which column to replace in, like:
print(df)
id 29
0 location Uttar Pradesh\r
1 country_name India
2 total_deaths\r 20
print(df.replace({'29': {r'\\r': ''}}, regex=True))
id 29
0 location Uttar Pradesh
1 country_name India
2 total_deaths\r 20
print(df.replace({r'\\r': ''}, regex=True))
id 29
0 location Uttar Pradesh
1 country_name India
2 total_deaths 20
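A small sketch of the nested-dict form, with a made-up frame that has a carriage return in both columns: the outer key names the column and the inner dict maps pattern to replacement, so only that column is cleaned, whereas the flat dict applies to every column.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["location", "total_deaths\r"],
    "29": ["Uttar Pradesh\r", "20"],
})

# Only column '29' is cleaned; the '\r' in column 'id' is left alone
only_29 = df.replace({"29": {r"\r": ""}}, regex=True)

# A flat dict applies the pattern to every column
everywhere = df.replace({r"\r": ""}, regex=True)

print(only_29)
print(everywhere)
```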
EDIT (based on a comment):
import pandas as pd
df = pd.read_csv('data_source_test.csv')
print(df)
id country_name location total_deaths
0 1 India New Delhi 354
1 2 India Tamil Nadu 48
2 3 India Karnataka 0
3 4 India Andra Pradesh 32
4 5 India Assam 679
5 6 India Kerala 128
6 7 India Punjab 0
7 8 India Mumbai, Thane 1
8 9 India Uttar Pradesh\r\n 20
9 10 India Orissa 69
print(df.replace({r'\r\n': ''}, regex=True))
id country_name location total_deaths
0 1 India New Delhi 354
1 2 India Tamil Nadu 48
2 3 India Karnataka 0
3 4 India Andra Pradesh 32
4 5 India Assam 679
5 6 India Kerala 128
6 7 India Punjab 0
7 8 India Mumbai, Thane 1
8 9 India Uttar Pradesh 20
9 10 India Orissa 69
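The pattern r'\r\n' only matches a carriage return immediately followed by a newline. If a file might contain either character on its own, a character class is one option; a minimal sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"location": ["Uttar Pradesh\r\n", "Assam\r", "Kerala\n", "Punjab"]})

# Only the exact two-character sequence CR LF is removed
crlf_only = df.replace({r"\r\n": ""}, regex=True)

# Any run of CR and/or LF characters is removed
any_break = df.replace({r"[\r\n]+": ""}, regex=True)

print(crlf_only["location"].tolist())  # ['Uttar Pradesh', 'Assam\r', 'Kerala\n', 'Punjab']
print(any_break["location"].tolist())  # ['Uttar Pradesh', 'Assam', 'Kerala', 'Punjab']
```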
If you need to replace only in the location column:
df['location'] = df['location'].str.replace(r'\r\n', '', regex=True)
print(df)
id country_name location total_deaths
0 1 India New Delhi 354
1 2 India Tamil Nadu 48
2 3 India Karnataka 0
3 4 India Andra Pradesh 32
4 5 India Assam 679
5 6 India Kerala 128
6 7 India Punjab 0
7 8 India Mumbai, Thane 1
8 9 India Uttar Pradesh 20
9 10 India Orissa 69
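One caveat worth checking against your pandas version: Series.str.replace defaults to regex=False since pandas 2.0 (earlier versions treated the pattern as a regex by default), and with regex=False the raw string r'\r\n' is searched for literally as backslash, r, backslash, n. Passing regex explicitly makes the behaviour version-independent; a sketch with a made-up value:

```python
import pandas as pd

s = pd.Series(["Uttar Pradesh\r\n", "Orissa"])

# Literal match: looks for the four characters \ r \ n, which are not in the data
print(s.str.replace(r"\r\n", "", regex=False).tolist())  # ['Uttar Pradesh\r\n', 'Orissa']

# Regex match: \r\n is interpreted as carriage return + newline
print(s.str.replace(r"\r\n", "", regex=True).tolist())   # ['Uttar Pradesh', 'Orissa']

# Without a regex, pass the actual characters instead of a raw string
print(s.str.replace("\r\n", "", regex=False).tolist())   # ['Uttar Pradesh', 'Orissa']
```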
Upvotes: 19
Reputation: 393933
Use str.replace. You need to escape the backslash so the pattern matches the literal two-character sequence \r in the data (with regex=True, r'\\r' matches a backslash followed by r):
In [15]:
df['29'] = df['29'].str.replace(r'\\r', '', regex=True)
df
Out[15]:
id 29
0 location Uttar Pradesh
1 country_name India
2 total_deaths 20
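If the data really contains the two characters backslash and r (as this answer's pattern assumes), a plain literal replacement also works without any regex escaping; a minimal sketch with a made-up value:

```python
import pandas as pd

s = pd.Series(["Uttar Pradesh\\r", "India"])  # '\\r' here is a backslash followed by r

# Regex form, as in the answer: r'\\r' matches a backslash followed by r
print(s.str.replace(r"\\r", "", regex=True).tolist())  # ['Uttar Pradesh', 'India']

# Literal form: '\\r' in a normal string is the same two characters
print(s.str.replace("\\r", "", regex=False).tolist())  # ['Uttar Pradesh', 'India']
```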
Upvotes: 3