Reputation: 3
I have multiple csv files with the same kind of data to be merged on the rows to form a single dataframe but some of the row names have dirty data.
For example '.' at the end of the actual name. I tried the below code -
for file in all_files:
temp = pd.read_csv(file, encoding = "unicode_escape")
temp = temp[['S.No.', 'Item', '2014-15']]
state = lambda x: x.split('-')
temp.rename(columns = {'2014-15':state(file)[1]}, inplace= True)
if file == all_files[0]:
all_states = temp.copy(deep=True)
else:
temp["Item"] = temp["Item"].str.replace("*", "")
all_states = pd.merge(all_states, temp, how = 'outer', on = ['S.No.', 'Item'])
del temp
The output I get is -
S.No. 1 1.1 1.2 1.3 . . .
1.1. --> Need to get rid of these and consider them as 1.1
The contaminated S.No. forms a new row for a single column. I need it to be in the same rows as others.
I just want the last '.' at the end of the string to be removed and not all.
I tried the below to clean S.No. :
temp["S.No."] = temp["S.No."].str.rstrip(".")
temp["S.No."] = temp["S.No."].str.replace(".$", "")
temp["S.No."] = re.sub(r".$", "", str(temp["S.No."]))
But none of them work.
Upvotes: 0
Views: 462
Reputation: 3
the issue was that the '.' at the end had another 4 trailing spaces which went unnoticed.
So after removing those spaces with -
temp["S.No."] = temp["S.No."].str.strip()
All the mentioned methods work. I used -
temp["S.No."] = temp["S.No."].str.rstrip(".")
Upvotes: 0
Reputation: 159
Assuming you are only removing one .
if it exists at the end of the string:
def clean_s_no(text):
return re.sub('\.$', '', text)
Note the \
(escape character) and $
(end of string).
Then, apply the function to all the rows on this column:
temp["S.No."] = temp["S.No."].apply(lambda x: clean_s_no(x), axis=1)
or
temp["S.No."] = temp["S.No."].apply(clean_s_no, axis=1)
Upvotes: 2
Reputation: 68
Assuming that you are column "S.No." is string type then try the following:
temp.loc[temp["S.No."].str.contains('.'), 'S.No.'] = temp["S.No."].str.replace(".","")
You can change the column type to string by
temp["S.No."] = temp["S.No."].astype(str)
Upvotes: 0