My data directory is:
C:/Users/Ratul/Downloads/Machine_Learning_Data/reddit_data/reddit_data/
That directory contains many folders. I need to go through those folders and open the files whose names start with 'RC_'.
Here's my code:
import sqlite3
import json
import os
from datetime import datetime

timeframe = '2015-05'
sql_transaction = []

connection = sqlite3.connect('{}.db'.format(timeframe))
c = connection.cursor()


def create_table():
    c.execute("CREATE TABLE IF NOT EXISTS parent_reply(parent_id TEXT PRIMARY KEY, comment_id TEXT UNIQUE, parent TEXT, comment TEXT, subreddit TEXT, unix INT, score INT)")


def format_data(data):
    # Replace newlines with a token and double quotes with single quotes.
    data = data.replace('\n', ' newlinechar ').replace('\r', ' newlinechar ').replace('"', "'")
    return data


def find_parent(pid):
    try:
        # A parameterized query does not break if pid contains a quote.
        sql = "SELECT comment FROM parent_reply WHERE comment_id = ? LIMIT 1"
        c.execute(sql, (pid,))
        result = c.fetchone()
        if result is not None:
            return result[0]
        else:
            return False
    except Exception as e:
        # print(str(e))
        return False


if __name__ == '__main__':
    create_table()
    row_counter = 0
    paired_rows = 0

    with open('C:/Users/Ratul/Downloads/Machine_Learning_Data/reddit_data/reddit_data/{}/RC_{}'.format(timeframe.split('-')[0], timeframe), buffering=1000) as f:
        for row in f:
            row_counter += 1
            row = json.loads(row)
            parent_id = row['parent_id']
            body = format_data(row['body'])
            created_utc = row['created_utc']
            score = row['score']
            comment_id = row['name']
            subreddit = row['subreddit']
            parent_data = find_parent(parent_id)
            # maybe check for a child, if child, is our new score superior? If so, replace. If not...
            if score >= 2:
                # find_existing_score() is not defined in this excerpt.
                existing_comment_score = find_existing_score(parent_id)
But there seems to be a mistake in the path. I get this error:
Traceback (most recent call last):
  File "C:/Users/Ratul/AppData/Local/Programs/Python/Python37/test02.py", line 36, in <module>
    with open('C:/Users/Ratul/Downloads/Machine_Learning_Data/reddit_data/reddit_data/{}/RC_{}'.format(timeframe.split('-')[0],timeframe), buffering=1000) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/Ratul/Downloads/Machine_Learning_Data/reddit_data/reddit_data/2015/RC_2015-05'
I'm not sure what I did wrong. Please help.
Answer:
Use How to debug small programs (#1) and replace the open() call with
print('C:/Users/Ratul/Downloads/Machine_Learning_Data/reddit_data/reddit_data/{}/RC_{}'.format(
    timeframe.split('-')[0], timeframe))
to see the exact path being built. Then check that every folder and file in that path actually exists. For at least some of your values it does not, hence the error.
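To find out which part of the printed path is wrong, a small helper can walk up from the full path to the deepest part that does exist and list what is really in that folder. This is a minimal sketch; report_missing is a hypothetical helper name, not part of any library, and it only reads the file system.

```python
import os


def report_missing(path):
    """Find the deepest part of `path` that exists and list its contents.

    Returns (existing_path, entries), with existing_path in absolute form.
    If not even the root exists (e.g. a missing drive), returns (None, []).
    """
    current = os.path.abspath(path)
    # Strip one trailing component at a time until something exists.
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the root and it does not exist either
            return None, []
        current = parent
    # Listing the folder makes typos or unexpected file extensions visible.
    entries = sorted(os.listdir(current)) if os.path.isdir(current) else []
    return current, entries
```

Calling report_missing(myname) with the path from the question returns the deepest folder that exists and the names it actually contains, which can then be compared against 'RC_2015-05'.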
If most of your files do exist, it is far easier to catch and handle the error:
myname = 'C:/Users/Ratul/Downloads/Machine_Learning_Data/reddit_data/reddit_data/{}/RC_{}'.format(timeframe.split('-')[0], timeframe)

try:
    with open(myname, buffering=1000) as f:
        for row in f:
            row_counter += 1
            row = json.loads(row)
            parent_id = row['parent_id']
            body = format_data(row['body'])
            created_utc = row['created_utc']
            score = row['score']
            comment_id = row['name']
            subreddit = row['subreddit']
            parent_data = find_parent(parent_id)
            # maybe check for a child, if child, is our new score superior? If so, replace. If not...
            if score >= 2:
                existing_comment_score = find_existing_score(parent_id)
except FileNotFoundError as fnfError:
    print(myname)
    print(fnfError)
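The question's actual goal is to open every file whose name starts with RC_ in any folder below the data directory. Instead of building each path by hand (where one wrong guess about a folder or file name raises FileNotFoundError), a sketch can let os.walk discover the files. find_rc_files is a hypothetical helper name; the base directory is the one from the question.

```python
import os


def find_rc_files(base_dir, prefix='RC_'):
    """Yield full paths of files under base_dir whose names start with prefix."""
    for dirpath, dirnames, filenames in os.walk(base_dir):
        dirnames.sort()  # sorting in place makes os.walk visit subfolders in order
        for name in sorted(filenames):
            if name.startswith(prefix):
                yield os.path.join(dirpath, name)


if __name__ == '__main__':
    base = 'C:/Users/Ratul/Downloads/Machine_Learning_Data/reddit_data/reddit_data/'
    for path in find_rc_files(base):
        print(path)
        # Open and process each file here, e.g. with open(path, buffering=1000) as f: ...
```

Every path yielded this way was found on disk, so opening it cannot fail because of a misspelled folder or missing extension. Sorting is optional and only makes the order predictable.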
The open() function does not care whether you use \ or / as the separator. If you use \, you must escape it or use a raw string (e.g. r'C:\some\dir\file.txt'). Your syntax is fine as it is: under Windows, open() handles the directory delimiters correctly even if you give it 'c:/somedir/file.txt'.
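The point about separators can be checked without touching the disk. pathlib's PureWindowsPath parses a path with Windows rules on any operating system, so this short sketch shows that the forward-slash, raw-string and escaped forms name the same file, and why an unescaped backslash is risky.

```python
from pathlib import PureWindowsPath

forward = 'c:/somedir/file.txt'      # forward slashes, as in the question
raw = r'c:\somedir\file.txt'         # raw string: backslashes are literal
escaped = 'c:\\somedir\\file.txt'    # normal string: each backslash escaped

# A raw string and an escaped string produce identical text.
print(raw == escaped)  # True

# Parsed with Windows rules, the forward-slash form is the same path.
print(PureWindowsPath(forward) == PureWindowsPath(raw))  # True
print(str(PureWindowsPath(forward)))  # c:\somedir\file.txt

# Without escaping, '\n' in 'c:\new' silently becomes a newline character.
print('\n' in 'c:\new')   # True
print('\n' in r'c:\new')  # False
```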
Further reading: About error handling