Reputation: 865
I'm encountering an error when trying to iterate over a series in a pandas dataframe which contains free text. The text is contained in df[1].
import pandas as pd
corpus = []
for i in range(0, 1000):
    review = df[1][i]
The error is raised on the last line of the code.
except KeyError as e1:
    if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
...
KeyError: 100
Despite searching I can't work out what the error message means.
Edit: I realised that the error was not being caused by the regex, so I have taken all references to regex out of the question. The error remains the same with the code as shown above.
Upvotes: 1
Views: 68
Reputation: 38415
Looping over rows is generally the least efficient option in pandas. Please look into df.replace() instead.
Consider this dataframe,
df = pd.DataFrame({'col': ['sgra834', '%^$asgsg', '23hgfh*', 'sfg343^%adf']})
           col
0      sgra834
1     %^$asgsg
2      23hgfh*
3  sfg343^%adf
You can use replace,
df.replace('[^a-zA-Z]', '', regex = True)
You get
      col
0    sgra
1   asgsg
2    hgfh
3  sfgadf
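As for the KeyError itself: df[1][i] looks up the row whose index label is i, not the i-th row by position. KeyError: 100 therefore suggests that there is no row labelled 100, for example because the frame has fewer than 1000 rows or because rows were dropped and left gaps in the index. That is only a likely reading, since the question does not show the index. The sketch below uses a made-up frame (the review strings and the dropped row are assumptions for illustration) to reproduce the same kind of error and show ways around it.

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame: column label 1 holds free text.
df = pd.DataFrame({1: ["good food!", "bad service 2/10", "ok, I guess", "loved it :)"]})

# Dropping a row leaves a gap in the index labels: 0, 2, 3.
df = df.drop(index=1)

# df[1][i] looks up the *label* i, so the missing label 1 raises KeyError.
try:
    df[1][1]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Positional access with .iloc ignores the labels.
second_review = df[1].iloc[1]

# Iterating over the values avoids assuming a row count or particular labels.
corpus = [review for review in df[1]]

# Vectorised cleaning of a single column, using the same pattern as above.
cleaned = df[1].str.replace('[^a-zA-Z]', '', regex=True)
```

If the loop has to stay, range(len(df)) together with .iloc, or a df.reset_index(drop=True) beforehand, avoids both the hard-coded 1000 and the dependence on index labels.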
Upvotes: 3