joshi123

Reputation: 865

Error iterating over text rows in a pandas DataFrame

I'm encountering an error when trying to iterate over a Series in a pandas DataFrame that contains free text. The text is in df[1].

import pandas as pd
corpus = []
for i in range(0, 1000):
    review = df[1][i]

The error is raised on the last line of the code.

    except KeyError as e1:
        if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
    ...
    KeyError: 100

Despite searching, I can't work out what the error message means.

Edit: I realised that the error was not caused by the regex, so I have removed all references to regex from the question. The error remains the same with the code shown above.

Upvotes: 1

Views: 68

Answers (1)

Vaishali

Reputation: 38415

Looping over rows is generally the least efficient option in pandas. Consider using df.replace() instead.
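As for the KeyError itself, one likely explanation (an assumption, since the question does not show how df was built) is that df[1][i] looks rows up by index label, not by position. KeyError: 100 would then mean that no row is labelled 100, for example because rows were filtered out earlier. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical data: column label 1 holds the free text, as in the question.
df = pd.DataFrame({0: [10, 20, 30, 40], 1: ["good", "bad", "fine", "great"]})

# Dropping a row leaves a gap in the index labels: 0, 1, 3.
filtered = df[df[0] != 30]

missing_label = None
try:
    filtered[1][2]  # label-based lookup; label 2 no longer exists
except KeyError as err:
    missing_label = err.args[0]

# Position-based access does not depend on the index labels.
by_position = [filtered[1].iloc[i] for i in range(len(filtered))]

# Iterating the Series directly avoids indexing altogether.
corpus = list(filtered[1])
```

If that is the cause, switching to .iloc, iterating the Series directly, or calling reset_index(drop=True) first should avoid the error.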

Consider this DataFrame:

df = pd.DataFrame({'col': ['sgra834', '%^$asgsg', '23hgfh*', 'sfg343^%adf']})

    col
0   sgra834
1   %^$asgsg
2   23hgfh*
3   sfg343^%adf

You can use replace with a regex that removes every character that is not an ASCII letter:

df.replace('[^a-zA-Z]', '', regex=True)

This returns a new DataFrame:

    col
0   sgra
1   asgsg
2   hgfh
3   sfgadf
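Since the text in the question lives in a single column, df[1], the same cleanup can be applied to just that Series. The sketch below assumes the column label is the integer 1 and reuses the sample strings above; the names cleaned and corpus are only illustrative.

```python
import pandas as pd

# Same sample strings as above, stored under column label 1 (an assumption
# mirroring the df[1] used in the question).
df = pd.DataFrame({1: ['sgra834', '%^$asgsg', '23hgfh*', 'sfg343^%adf']})

# Series.str.replace applies the regex to every row without an explicit loop.
cleaned = df[1].str.replace('[^a-zA-Z]', '', regex=True)

# Collect the cleaned strings into a plain list, like the corpus in the question.
corpus = cleaned.tolist()
```

This builds the list in one pass and never indexes by row label, so gaps in the index do not matter.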

Upvotes: 3
