programming freak

Reputation: 899

How to check for specific characters inside a CSV file using pandas

I have read a CSV file into a pandas DataFrame and am trying to find all the sentences that contain any of the characters I'm looking for. Whenever one is found, I want to print the sentence with its original index from the main CSV, not a new index. This is the code I'm trying, but it gives me an error:

lookfor = '[' + re.escape(",?!.:;'؛؛؟'-)(؛،؛«/") + ']'


tdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(tdata.count(' ') == 0)
tdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip)

newdata=tdata[tdata['sentences'].str.isin(lookfor)]

print (newdata)


#a sample set
-----------------------------

#hi, how are; you 
#im good thanks
#How ? Is live.
#good, what about ) you/
#my name is alex
#hello, alex how are you !
#im good!
#great news
#thanks!
-----------------------------

It returns this error:


newdata=tdata[tdata['sentences'].str.isin(pat)]
AttributeError: 'StringMethods' object has no attribute 'isin'

Input data: (image not available)

Expected output: (image not available)

Upvotes: 0

Views: 708

Answers (1)

morganics

Reputation: 1249

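You probably want the 'contains' method. 'isin' is a Series method that tests for exact membership in a list of values, so it is not available on the .str accessor; 'str.contains' tests each string against a pattern instead. Something like:

df = tdata[tdata.sentences.str.contains(lookfor, regex=True, na=False)]

The full code should look something like: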

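import re
import pandas as pd

# Character class that matches any one of the punctuation marks of interest
lookfor = '[' + re.escape(",?!.:;'؛؛؟'-)(؛،؛«/") + ']'

# fileinput is the path to the CSV file, as in the question.
# Peek at the first line: if it contains no space, treat it as a header and skip it.
tdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(tdata.count(' ') == 0)
tdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip)

# Store a 1-based row number in a column so it survives filtering and export
tdata['row_index'] = 1
tdata['row_index'] = tdata['row_index'].cumsum()

# Keep only the rows whose sentence contains at least one of the characters
filtered = tdata[tdata.sentences.str.contains(lookfor, regex=True, na=False)]
filtered.to_csv('./my_path.csv', index=False)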
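
If you only need to print the matches with their original index, a minimal sketch of the same idea follows. It assumes the sample sentences from the question are loaded one per row, uses a shortened ASCII-only character set for readability, and the names sentences and mask are only illustrative. Boolean indexing keeps the original index labels, so no extra column is needed for printing.

```python
import re
import pandas as pd

# Sample sentences from the question (assumed: one sentence per row)
sentences = [
    "hi, how are; you",
    "im good thanks",
    "How ? Is live.",
    "good, what about ) you/",
    "my name is alex",
    "hello, alex how are you !",
    "im good!",
    "great news",
    "thanks!",
]
tdata = pd.DataFrame({"sentences": sentences})

# Shortened, ASCII-only character set used here for illustration only
lookfor = "[" + re.escape(",?!.:;'-)(/") + "]"

# Boolean mask: True where the sentence contains any of the listed characters
mask = tdata["sentences"].str.contains(lookfor, regex=True, na=False)

# Boolean indexing keeps the original index labels of the matching rows
filtered = tdata[mask]
print(filtered)
```

With this sample, the rows kept are the ones at index 0, 2, 3, 5, 6 and 8. The sentences without any of the listed characters (index 1, 4 and 7) are dropped, and the remaining rows are not renumbered.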

Upvotes: 1
