ShreyasPara
ShreyasPara

Reputation: 3

Remove last character of specific rows ending with '.'

I have multiple csv files with the same kind of data to be merged on the rows to form a single dataframe but some of the row names have dirty data.

For example '.' at the end of the actual name. I tried the below code -

for file in all_files:
temp = pd.read_csv(file, encoding = "unicode_escape")
temp = temp[['S.No.', 'Item', '2014-15']]
state = lambda x: x.split('-')
temp.rename(columns = {'2014-15':state(file)[1]}, inplace= True)
if file == all_files[0]:
    all_states = temp.copy(deep=True)
else:
    temp["Item"] = temp["Item"].str.replace("*", "")
    all_states = pd.merge(all_states, temp, how = 'outer', on = ['S.No.', 'Item'])
del temp

The output I get is -

S.No. 1 1.1 1.2 1.3 . . .

1.1. --> Need to get rid of these and consider them as 1.1

The contaminated S.No. forms a new row for a single column. I need it to be in the same rows as others.

I just want the last '.' at the end of the string to be removed and not all.

I tried the below to clean S.No. :

temp["S.No."] = temp["S.No."].str.rstrip(".")
temp["S.No."] = temp["S.No."].str.replace(".$", "")
temp["S.No."] = re.sub(r".$", "", str(temp["S.No."]))

But none of them work.

The headers: enter image description here

Dirty Rows: enter image description here

Upvotes: 0

Views: 462

Answers (3)

ShreyasPara
ShreyasPara

Reputation: 3

the issue was that the '.' at the end had another 4 trailing spaces which went unnoticed.

So after removing those spaces with -

temp["S.No."] = temp["S.No."].str.strip()

All the mentioned methods work. I used -

temp["S.No."] = temp["S.No."].str.rstrip(".")

Upvotes: 0

Zac
Zac

Reputation: 159

Assuming you are only removing one . if it exists at the end of the string:

def clean_s_no(text):
    return re.sub('\.$', '', text)

Note the \ (escape character) and $ (end of string).

Then, apply the function to all the rows on this column:

temp["S.No."] = temp["S.No."].apply(lambda x: clean_s_no(x), axis=1)

or

temp["S.No."] = temp["S.No."].apply(clean_s_no, axis=1)

Upvotes: 2

Asad
Asad

Reputation: 68

Assuming that you are column "S.No." is string type then try the following:

temp.loc[temp["S.No."].str.contains('.'), 'S.No.'] = temp["S.No."].str.replace(".","")

You can change the column type to string by

temp["S.No."] = temp["S.No."].astype(str)

Upvotes: 0

Related Questions