Reputation: 899
I have read a CSV file into a pandas DataFrame and am trying to find all the sentences that contain any of the words I'm looking for. Whenever one of them is found, I want to print it with its original index from the main CSV, not a new index. This is the code I'm trying, but it gives me an error for some reason:
lookfor = '[' + re.escape(",?!.:;'؛؛؟'-)(؛،؛«/") + ']'
tdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(tdata.count(' ') == 0)
tdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip)
newdata=tdata[tdata['sentences'].str.isin(lookfor)]
print (newdata)
#a sample set
-----------------------------
#hi, how are; you
#im good thanks
#How ? Is live.
#good, what about ) you/
#my name is alex
#hello, alex how are you !
#im good!
#great news
#thanks!
-----------------------------
It returns this error:
newdata=tdata[tdata['sentences'].str.isin(pat)]
AttributeError: 'StringMethods' object has no attribute 'isin'
input data looks like
output I'm expecting is
Upvotes: 0
Views: 708
Reputation: 1249
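You probably want the 'str.contains' method, since the .str accessor has no 'isin' (that is what the AttributeError is telling you). Something like this, where pat is your regex pattern: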
df = tdata[tdata.sentences.str.contains(pat, regex=True, na=False)]
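To make the difference concrete, here is a minimal sketch. The series `s` and the shortened pattern `pat` are made up for illustration and are not the asker's data. `Series.isin` compares whole values against a list, while `Series.str.contains` runs a regex search inside each string.

```python
import re
import pandas as pd

# Hypothetical toy data, including a missing value
s = pd.Series(["hi, how are; you", "im good thanks", None])

# Shortened, ASCII-only character class (an assumption for this sketch)
pat = "[" + re.escape(",;!?") + "]"

# Series.isin tests whole values against a list; it is not a substring search
whole_match = s.isin(["im good thanks"])

# The .str accessor has no isin, which is what the AttributeError reports
has_str_isin = hasattr(s.str, "isin")

# str.contains searches each element; na=False maps missing values to False
mask = s.str.contains(pat, regex=True, na=False)
```

Passing `na=False` matters when the mask is used for indexing, because a mask that still contains missing values cannot be used to filter rows.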
Full code should look something like:
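import re
import pandas as pd

# Character class that matches any one of the punctuation marks
lookfor = '[' + re.escape(",?!.:;'؛؛؟'-)(؛،؛«/") + ']'
# Skip the first row only if it looks like a header (contains no spaces)
tdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(tdata.count(' ') == 0)
tdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip)
# 1-based row number stored as a column, so it survives filtering and export
tdata['row_index'] = range(1, len(tdata) + 1)
filtered = tdata[tdata.sentences.str.contains(lookfor, regex=True, na=False)]
filtered.to_csv('./my_path.csv', index=False)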
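As a rough check of the "original index" part, the sketch below builds the sample set from the question directly as a DataFrame instead of reading a file. It also uses an ASCII-only subset of the punctuation class. Both are assumptions made to keep the example self-contained. Filtering with a boolean mask keeps the original index labels, so no extra column is strictly needed just to print them.

```python
import re
import pandas as pd

# Sample sentences from the question, built in memory (no CSV involved)
sentences = [
    "hi, how are; you",
    "im good thanks",
    "How ? Is live.",
    "good, what about ) you/",
    "my name is alex",
    "hello, alex how are you !",
    "im good!",
    "great news",
    "thanks!",
]
tdata = pd.DataFrame({"sentences": sentences})

# ASCII-only subset of the punctuation class (assumption for this sketch)
lookfor = "[" + re.escape(",?!.:;'-)(/") + "]"

# A boolean mask selects rows without renumbering them
mask = tdata["sentences"].str.contains(lookfor, regex=True, na=False)
filtered = tdata[mask]

# Index labels of the matching rows, as they were in the unfiltered frame
original_rows = filtered.index.tolist()
print(filtered)
```

If the labels should also end up in the exported file, writing with `index=True` instead of `index=False` is one option. The explicit `row_index` column in the code above is another.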
Upvotes: 1