Reputation: 607
In my S3 bucket directory I have multiple file types (.csv, .log, .txt, etc.), but I need to read only the .log files from a single directory and append them using boto3. I tried the code below, but it reads all the files — I can't restrict it to *.log — and the result comes back as one string with '\n' separators, as shown below.
How can I read only the log files, merge them, and get the result line by line?
import boto3
import pandas as pd
import csv
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my_bucket')
lst = []
for object in my_bucket.objects.filter(Prefix="bulk_data/all_files/"):
    print(object.key)
    bdy = object.get()['Body'].read().decode('utf-8')
    lst.append(bdy)
    bdy = ''
print(lst)
The lst output is coming like this, with '\n' as the separator:
'12345,6006,7290,7200,JKHBJ,S,55\n44345,6996,6290,7288,JKHkk,R,57\n..........'
I should get something like below:
12345,6006,7290,7200,JKHBJ,S,55
44345,6996,6290,7288,JKHkk,R,57
...
Upvotes: 1
Views: 1120
Reputation: 238051
The filter method takes only a prefix, not a suffix. Thus you have to filter the keys yourself, for example:
import boto3

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my_bucket')

lst = []
for s3obj in my_bucket.objects.filter(Prefix="attachments/"):
    # skip S3 objects whose key does not end with .log
    if not s3obj.key.endswith('.log'):
        continue
    print(s3obj.key)
    bdy = s3obj.get()['Body'].read().decode('utf-8')
    lst.append(bdy)

for file_str in lst:
    for line in file_str.split('\n'):
        print(line)
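If you also want the merged result as a single list of lines (rather than just printing them), you can join the collected bodies after the download loop. A minimal sketch of just the merge step, independent of S3 — the sample bodies below are hypothetical stand-ins for the decoded file contents:

```python
# Merge decoded file bodies (as collected in `lst` above) into one list of lines.
# splitlines() avoids the trailing empty string that split('\n') produces
# when a file ends with a newline.
def merge_log_bodies(bodies):
    lines = []
    for body in bodies:
        lines.extend(body.splitlines())
    return lines

# Hypothetical file bodies, mimicking what s3obj.get()['Body'].read().decode() returns:
bodies = [
    '12345,6006,7290,7200,JKHBJ,S,55\n44345,6996,6290,7288,JKHkk,R,57\n',
    '99999,1111,2222,3333,ABCDE,T,10\n',
]
merged = merge_log_bodies(bodies)
print('\n'.join(merged))
```

Joining with '\n' then gives you the line-by-line output shown in the question.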
Upvotes: 3