Reputation: 1307
This post might be related to this one. I would like to encrypt a .csv file with a password or token. I would then like to write a script that decrypts the file using the password, reads in the .csv file as a data frame, and continues doing data analysis with the content. How would one achieve this?
Example:
import pandas as pd
import csv
# 1.) Create the .csv file
super_secret_dict = {'super_secret_information':'foobar'}
with open('super_secret_csv.csv', 'w', newline='') as f:
    w = csv.DictWriter(f, super_secret_dict.keys())
    w.writeheader()
    w.writerow(super_secret_dict)
# 2.) Now encrypt the .csv file with a very safe encryption method and generate
# a password/token that can be shared with people that should have access to the
# encrypted .csv file
# ...
# ...
# 3.) Every time a user wants to read in the .csv file (e.g. using pd.read_csv())
# the script should ask the user to type in the password, then read in
# the .csv file and then continue running the rest of the script
super_secret_df = pd.read_csv('./super_secret_csv.csv')
Upvotes: 0
Views: 4975
Reputation: 946
I am not fully sure about Python, as I don't work with it, but I am fairly certain the process is similar to R. There is a sodium library in R, and Python has bindings to the same library (PyNaCl, for example), which does both symmetric and asymmetric cryptography. You would start by setting up the key(s), then apply an encryption function to the data frame column(s), then save the result as a CSV. On the other end, the user reads in the data frame and applies the decryption function (column by column) using the key you set up.
The encryption thus applies to values only, so your ability to selectively encrypt/decrypt values depends on addressing cells, columns, and rows programmatically. I mention this because functions in R are vectorised, i.e. it is easy to apply one to a whole column, like mydecryptfunc(mydataframe$columnIwant). The nice thing about this is that you get granularity, as opposed to encrypting the whole file. If you wanted to, you could use different keys to encrypt different columns (or values) and share those decryption keys with different people as you see fit. That is overkill the vast majority of the time, of course.
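For the Python side, the column-by-column idea could look roughly like the sketch below. Note that it swaps in Fernet from the cryptography package instead of sodium, and that the example data, the column names, and the helpers encrypt_column and decrypt_column are made up for illustration.

```python
import pandas as pd
from cryptography.fernet import Fernet


def encrypt_column(series, fernet):
    """Encrypt every value of a column; each value becomes a text token."""
    return series.map(
        lambda value: fernet.encrypt(str(value).encode("utf-8")).decode("ascii")
    )


def decrypt_column(series, fernet):
    """Reverse encrypt_column; the decrypted values come back as strings."""
    return series.map(
        lambda token: fernet.decrypt(token.encode("ascii")).decode("utf-8")
    )


# Hypothetical example data: one public column, one secret column.
df = pd.DataFrame({"name": ["alice", "bob"], "salary": [1000, 2000]})

# One key per protected column; share it only with people allowed to read it.
salary_key = Fernet.generate_key()
salary_fernet = Fernet(salary_key)

# Only "salary" is encrypted; "name" stays readable in the saved CSV.
df["salary"] = encrypt_column(df["salary"], salary_fernet)

# A reader holding salary_key gets the original numbers back.
restored = decrypt_column(df["salary"], salary_fernet).astype(int)
```

Because every value is converted to a string before encryption, the reader has to restore the dtype after decrypting, which is what the astype(int) call does here.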
The above naturally means that you need a secure channel between you and your colleague for transmitting the decryption key (it should not simply travel attached to the CSV file 8-)). An easy way is to send an e-mail encrypted with your colleague's public key.
Alternatively you could call your colleague and read the key out to them.
Sorry for stating the rather obvious things, but maybe they might help someone. An easy read on symmetric and asymmetric crypto here.
Upvotes: 1
Reputation: 11
You can use the cryptography library to create an encryption scheme.
Create a Key:
from cryptography.fernet import Fernet
key = Fernet.generate_key()
f = Fernet(key)
Save that key somewhere!
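One possible way to save it is sketched here. The helper names save_key and generate_and_save_key are hypothetical, and the owner-only permission bits are an assumption that only really applies on POSIX systems (Windows largely ignores them).

```python
import os

from cryptography.fernet import Fernet


def save_key(key, key_path):
    """Write the key bytes to key_path, creating the file as owner read/write only."""
    # os.open lets us set the permission bits when the file is created.
    fd = os.open(key_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as key_file:
        key_file.write(key)


def generate_and_save_key(key_path):
    """Generate a fresh Fernet key, store it at key_path, and return it."""
    key = Fernet.generate_key()
    save_key(key, key_path)
    return key
```

Calling generate_and_save_key with a path of your choice once is enough; anyone who later needs to encrypt or decrypt reads the same file back in binary mode.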
Load your key when you want to encrypt!
def load_key():
    # Replace the placeholder string with the path to your key file.
    with open("PATH TO SECRET KEY", "rb") as key_file:
        return key_file.read()
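If you would rather have users type a password, as the question asks, than pass a key file around, one option is to derive the Fernet key from the password. The sketch below does that with PBKDF2 from the standard library; the helper key_from_password, the iteration count, and the example password are assumptions for illustration, not part of the answer above.

```python
import base64
import hashlib
import os

from cryptography.fernet import Fernet

PBKDF2_ITERATIONS = 600_000  # assumed work factor; tune it for your hardware


def key_from_password(password, salt):
    """Derive a Fernet key (32 bytes, url-safe base64) from a password and a salt."""
    raw = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS, dklen=32
    )
    return base64.urlsafe_b64encode(raw)


# The salt is not secret, but the same salt is needed again for decryption,
# so store it next to the encrypted file.
salt = os.urandom(16)

# In a real script the password would come from a prompt, e.g.
#   password = getpass.getpass("Password: ")
password = "correct horse battery staple"  # hypothetical example password

f = Fernet(key_from_password(password, salt))
token = f.encrypt(b"super_secret_information\nfoobar\n")
```

With this approach only the password has to be shared; a wrong password derives a different key, and decryption then fails with cryptography.fernet.InvalidToken.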
Encrypt your file
def encrypt_it(path_csv):
    """Encrypt the file at path_csv and write the result to encrypted_file.csv."""
    key = load_key()
    # Create a Fernet instance from the secret key
    f = Fernet(key)
    with open(path_csv, 'rb') as unencrypted:
        _file = unencrypted.read()
    encrypted = f.encrypt(_file)
    with open('encrypted_file.csv', 'wb') as encrypted_file:
        encrypted_file.write(encrypted)
Read it back later:
def decrypt_it(path_encrypted):
    """Return the decrypted contents of path_encrypted as bytes."""
    key = load_key()
    f = Fernet(key)
    with open(path_encrypted, 'rb') as encrypted_file:
        decrypted = f.decrypt(encrypted_file.read())
    return decrypted
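To get from the decrypted bytes to a data frame, as the question wants, the bytes can be handed to pandas through an in-memory buffer, so the plaintext never has to be written back to disk. A minimal sketch follows; read_encrypted_csv is a hypothetical helper that takes the key as an argument rather than loading it itself.

```python
import io

import pandas as pd
from cryptography.fernet import Fernet


def read_encrypted_csv(path_encrypted, key):
    """Decrypt a Fernet-encrypted CSV file and parse it into a DataFrame in memory."""
    with open(path_encrypted, "rb") as encrypted_file:
        decrypted = Fernet(key).decrypt(encrypted_file.read())
    # pd.read_csv accepts file-like objects, so wrap the bytes in a buffer.
    return pd.read_csv(io.BytesIO(decrypted))
```

For the file from the question this would be called as read_encrypted_csv('encrypted_file.csv', key), with key being the same bytes that were used for encryption.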
Upvotes: 1