Reputation: 1617
I'm new to python, so bear with me on this one. I want to make a script that searches within all excel files in a specific folder, to see if they contain an '@' sign. If it finds a match it should write to a log.txt files with the name of the excel file. I don't have the name of the excel files in advance.
So far I'm stuck at just read in one file, and return true if it finds the string. What I got is
import pandas as pd
df = pd.read_excel(open('test.xlsx','rb'), sheetname=0)
print(df)
if '@' in df:
print("true")
else:
print("false")
This returns the content of sheet1 in the excel file correctly, but the search to find a match, doesn't seem to work. Any ideas? Maybe Im doing it all wrong.
Upvotes: 0
Views: 3789
Reputation: 173
This an answer without using pandas
import logging
LOG_FILENAME = r'file_output_location_path\log.txt'
logging.basicConfig(filename=LOG_FILENAME,
level=logging.WARNING,
format='%(asctime)s %(message)s'
)
source_folder = r'excel_file_folder_path'
def containsString():
for somefile in os.listdir(source_folder):
if'@' in somefile:
logging.warning(somefile)
Upvotes: 1
Reputation: 16079
You should read through Pandas - Working with Text Data specifically the section on strings that match or contain a pattern.
Unfortunately you cannot search a DataFrame with the string methods common to core Python. Instead you would use a StringMethod, like so:
out = False
for col in df.columns:
if df[col].str.contains('@').any():
out = 'True'
break
print(out)
Also, I don't believe it is necessary to use open
when using pd.read_excel
. E.g
df = pd.read_excel(open('test.xlsx','rb'), sheetname=0)
Can be rewritten:
df = pd.read_excel('test.xlsx', sheetname=0)
As far as iterating over files in a folder take a look at the builtin module glob.
Upvotes: 3