Reputation: 319
I have a Dataframe with 3 columns:
id,name,team
101,kevin, marketing
102,scott,admin\n
103,peter,finance\n
I am trying to apply a regex function such that I remove the unnecessary spaces. I have got the code that removes these spaces how ever I am unable loop it through the entire Dataframe.
This is what I have tried thus far:
df['team'] = re.sub(r'[\n\r]*','',df['team'])
But this throws an error AttributeError: 'Series' object has no attribute 're'
Could anyone advice how could I loop this regex through the entire Dataframe df['team']
column
Upvotes: 10
Views: 59778
Reputation: 2948
Here's a powerful technique to replace multiple words in a pandas column in one step without loops. In my code I wanted to eliminate things like 'CORPORATION', 'LLC' etc. (all of them is in the RemoveDB.csv file) from my column without using a loop. In this scenario I'm removing 40 words from the entire column in one step.
RemoveDB = pd.read_csv('RemoveDBcsv')
RemoveDB = RemoveDB['REMOVE'].tolist()
RemoveDB = '|'.join(RemoveDB)
pattern = re.compile(RemoveDB)
df['NAME']= df['NAME'].str.replace(pattern,'', regex = True)
Upvotes: 1
Reputation: 339
As long it's a dataframe check replace https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html
df['team'].replace( { r"[\n\r]+" : '' }, inplace= True, regex = True)
Regarding the regex, '*' means 0 or more, you should need '+' which is 1 or more
Upvotes: 13
Reputation: 95
Another example (but without regex) but maybe still usefull for someone.
id = pd.Series(['101','102','103'])
name = pd.Series(['kevin','scott','peter'])
team = pd.Series([' marketing','admin\n', 'finance\n'])
testsO = pd.DataFrame({'id': id, 'name': name, 'team': team})
print(testsO)
testsO['team'] = testsO['team'].str.strip()
print(testsO)
Upvotes: 0
Reputation: 21759
You are almost there, there are two simple ways of doing this:
# option 1 - faster way
df['team'] = [re.sub(r'[\n\r]*','', str(x)) for x in df['team']]
# option 2
df['team'] = df['team'].apply(lambda x: re.sub(r'[\n\r]*','', str(x)))
Upvotes: 23