Reputation: 2012
So I have a small table from excel, which I'd like to read in Pandas. Actually, I have several of the likes, and I'd like to just embed them directly in my script rather than keeping track of separate files.
My file could be a table like this
And now I want to make it embeddable:
import pandas as pd
pd.set_option("display.width", 1000)
df = pd.read_excel("/your/excel/here/TEST.xlsx")
my_json = df.to_json()
# print of the above json that I want to keep in the script
read_this = {"970":{"0.0":0.0,"0.975301809":0.153,"1.950603618":0.711,"2.925905427":1.269,"3.901207236":1.7775,"4.876509045":1.3125,"5.851810854":0.8475,"6.827112663":0.3825,"7.802414472":0.0,"8.777716281":0.0,"9.75301809":0.0},"1250":{"0.0":0.72,"0.975301809":0.6608,"1.950603618":0.5616,"2.925905427":0.4624,"3.901207236":0.3632,"4.876509045":0.36,"5.851810854":0.36,"6.827112663":0.36,"7.802414472":0.36,"8.777716281":0.36,"9.75301809":0.36},"2000":{"0.0":0.36,"0.975301809":1.18368,"1.950603618":3.50496,"2.925905427":5.383636362,"3.901207236":6.398181817,"4.876509045":9.031304347,"5.851810854":12.91304348,"6.827112663":14.7792,"7.802414472":15.8208,"8.777716281":16.56,"9.75301809":16.56},"3000":{"0.0":2.16,"0.975301809":5.03712,"1.950603618":9.85824,"2.925905427":13.33152,"3.901207236":15.83136,"4.876509045":18.57375,"5.851810854":21.50325,"6.827112663":24.43275,"7.802414472":818.440258,"8.777716281":1625.416258,"9.75301809":2041.92},"4000":{"0.0":8.64,"0.975301809":10.95428571,"1.950603618":16.26857143,"2.925905427":24.38666667,"3.901207236":33.48,"4.876509045":36.50666666,"5.851810854":34.85333333,"6.827112663":387.6812305,"7.802414472":1301.771077,"8.777716281":2215.860923,"9.75301809":2908.8},"5000":{"0.0":7.2,"0.975301809":134.1889811,"1.950603618":492.0670188,"2.925905427":849.9450564,"3.901207236":1207.823094,"4.876509045":1632.171428,"5.851810854":2814.281143,"6.827112663":3996.390856,"7.802414472":5178.500572,"8.777716281":6360.610284,"9.75301809":7542.72},"5500":{"0.0":285.48,"0.975301809":548.6879999,"1.950603618":1290.456,"2.925905427":2032.224,"3.901207236":2773.992,"4.876509045":3515.76,"5.851810854":5088.96,"6.827112663":6662.16,"7.802414472":8235.36,"8.777716281":9808.56,"9.75301809":11381.76},"6000":{"0.0":563.76,"0.975301809":963.1870186,"1.950603618":2088.844981,"2.925905427":3214.502943,"3.901207236":4340.160906,"4.876509045":5399.348572,"5.851810854":7363.638857,"6.827112663":9327.929144,"7.802414472":11292.21943,"8.777716281":13256.50972,"9.75301809":15220.8}}
new_df = pd.DataFrame.from_dict(read_this)
print("original\n", df, "\n")
print("from json\n", new_df)
And I get the following
original
970 1250 2000 3000 4000 5000 5500 6000
0.000000 0.0000 0.7200 0.360000 2.160000 8.640000 7.200000 285.480 563.760000
0.975302 0.1530 0.6608 1.183680 5.037120 10.954286 134.188981 548.688 963.187019
1.950604 0.7110 0.5616 3.504960 9.858240 16.268571 492.067019 1290.456 2088.844981
2.925905 1.2690 0.4624 5.383636 13.331520 24.386667 849.945056 2032.224 3214.502943
3.901207 1.7775 0.3632 6.398182 15.831360 33.480000 1207.823094 2773.992 4340.160906
4.876509 1.3125 0.3600 9.031304 18.573750 36.506667 1632.171428 3515.760 5399.348572
5.851811 0.8475 0.3600 12.913043 21.503250 34.853333 2814.281143 5088.960 7363.638857
6.827113 0.3825 0.3600 14.779200 24.432750 387.681231 3996.390856 6662.160 9327.929144
7.802414 0.0000 0.3600 15.820800 818.440258 1301.771077 5178.500572 8235.360 11292.219430
8.777716 0.0000 0.3600 16.560000 1625.416258 2215.860923 6360.610284 9808.560 13256.509720
9.753018 0.0000 0.3600 16.560000 2041.920000 2908.800000 7542.720000 11381.760 15220.800000
from json
1250 2000 3000 4000 5000 5500 6000 970
0.0 0.7200 0.360000 2.160000 8.640000 7.200000 285.480 563.760000 0.0000
0.975301809 0.6608 1.183680 5.037120 10.954286 134.188981 548.688 963.187019 0.1530
1.950603618 0.5616 3.504960 9.858240 16.268571 492.067019 1290.456 2088.844981 0.7110
2.925905427 0.4624 5.383636 13.331520 24.386667 849.945056 2032.224 3214.502943 1.2690
3.901207236 0.3632 6.398182 15.831360 33.480000 1207.823094 2773.992 4340.160906 1.7775
4.876509045 0.3600 9.031304 18.573750 36.506667 1632.171428 3515.760 5399.348572 1.3125
5.851810854 0.3600 12.913043 21.503250 34.853333 2814.281143 5088.960 7363.638857 0.8475
6.827112663 0.3600 14.779200 24.432750 387.681231 3996.390856 6662.160 9327.929144 0.3825
7.802414472 0.3600 15.820800 818.440258 1301.771077 5178.500572 8235.360 11292.219430 0.0000
8.777716281 0.3600 16.560000 1625.416258 2215.860923 6360.610284 9808.560 13256.509720 0.0000
9.75301809 0.3600 16.560000 2041.920000 2908.800000 7542.720000 11381.760 15220.800000 0.0000
So close, but not really the same. How can I preserve the original structure as an embeddable line of text?
Pastebin of excel file available here
Upvotes: 0
Views: 264
Reputation: 2427
OK, finally I understand what you want: include the content of your Excel file (i.e. a 2D matrix) directly as a variable in the source code of your script, so that you don't have to read the file anymore. Am I right ?
The native data structure able to store 2D matrices is a list of lists. This can be obtained from your Excel file by the following code:
import pandas as pd
df = pd.read_excel("/your/excel/here/TEST.xlsx")
print("mat =", df.values.tolist())
which should print something like:
mat = [['', 970, 1250....], [0.00, 0, 0.72...], ...]
Then you simply copy the printed lines with your mouse and paste them at the beginning of your code, to create a matrix mat
that stores your data.
If you need panda DataFrame, simply change the print
line to:
print("df = pd.DataFrame(%s)" % df.values.tolist())
and apply the same copy/paste process
Upvotes: 1
Reputation: 2012
Okay, so I played around a bit, and I found the exact way to be like this:
import pandas as pd
from collections import OrderedDict
pd.set_option("display.width", 1000)
# Load in the original file (this needs to happen just once)
df = pd.read_excel("/your/excel/here/TEST.xlsx")
# Write this whole table to a single line of dict-formatted JSON for portability
embeddable_json = df.to_json()
# Print it to see the actual output
print(embeddable_json)
# Paste it in here, and wrap it in an OrderedDict to maintain original order
read_this = OrderedDict({"970":{"0.0":0.0,"0.975301809":0.153,"1.950603618":0.711,"2.925905427":1.269,"3.901207236":1.7775,"4.876509045":1.3125,"5.851810854":0.8475,"6.827112663":0.3825,"7.802414472":0.0,"8.777716281":0.0,"9.75301809":0.0},"1250":{"0.0":0.72,"0.975301809":0.6608,"1.950603618":0.5616,"2.925905427":0.4624,"3.901207236":0.3632,"4.876509045":0.36,"5.851810854":0.36,"6.827112663":0.36,"7.802414472":0.36,"8.777716281":0.36,"9.75301809":0.36},"2000":{"0.0":0.36,"0.975301809":1.18368,"1.950603618":3.50496,"2.925905427":5.383636362,"3.901207236":6.398181817,"4.876509045":9.031304347,"5.851810854":12.91304348,"6.827112663":14.7792,"7.802414472":15.8208,"8.777716281":16.56,"9.75301809":16.56},"3000":{"0.0":2.16,"0.975301809":5.03712,"1.950603618":9.85824,"2.925905427":13.33152,"3.901207236":15.83136,"4.876509045":18.57375,"5.851810854":21.50325,"6.827112663":24.43275,"7.802414472":818.440258,"8.777716281":1625.416258,"9.75301809":2041.92},"4000":{"0.0":8.64,"0.975301809":10.95428571,"1.950603618":16.26857143,"2.925905427":24.38666667,"3.901207236":33.48,"4.876509045":36.50666666,"5.851810854":34.85333333,"6.827112663":387.6812305,"7.802414472":1301.771077,"8.777716281":2215.860923,"9.75301809":2908.8},"5000":{"0.0":7.2,"0.975301809":134.1889811,"1.950603618":492.0670188,"2.925905427":849.9450564,"3.901207236":1207.823094,"4.876509045":1632.171428,"5.851810854":2814.281143,"6.827112663":3996.390856,"7.802414472":5178.500572,"8.777716281":6360.610284,"9.75301809":7542.72},"5500":{"0.0":285.48,"0.975301809":548.6879999,"1.950603618":1290.456,"2.925905427":2032.224,"3.901207236":2773.992,"4.876509045":3515.76,"5.851810854":5088.96,"6.827112663":6662.16,"7.802414472":8235.36,"8.777716281":9808.56,"9.75301809":11381.76},"6000":{"0.0":563.76,"0.975301809":963.1870186,"1.950603618":2088.844981,"2.925905427":3214.502943,"3.901207236":4340.160906,"4.876509045":5399.348572,"5.851810854":7363.638857,"6.827112663":9327.929144,"7.802414472":11292.21943,"8.777716281":13256.50972,"9.75301809":15220.8}})
# Read in the above to see if it matches the original file
new_df = pd.DataFrame.from_dict(read_this)
# It does!
print("original\n", df, "\n")
print("from json\n", new_df)
So now I can keep small tables around in python script itself, without any external files, applications, pastes, etc.
So next time I can just run this, and the table is already there. No need to keep track of anything else:
import pandas as pd
from collections import OrderedDict
read_this = OrderedDict({"970":{"0.0":0.0,"0.975301809":0.153,"1.950603618":0.711,"2.925905427":1.269,"3.901207236":1.7775,"4.876509045":1.3125,"5.851810854":0.8475,"6.827112663":0.3825,"7.802414472":0.0,"8.777716281":0.0,"9.75301809":0.0},"1250":{"0.0":0.72,"0.975301809":0.6608,"1.950603618":0.5616,"2.925905427":0.4624,"3.901207236":0.3632,"4.876509045":0.36,"5.851810854":0.36,"6.827112663":0.36,"7.802414472":0.36,"8.777716281":0.36,"9.75301809":0.36},"2000":{"0.0":0.36,"0.975301809":1.18368,"1.950603618":3.50496,"2.925905427":5.383636362,"3.901207236":6.398181817,"4.876509045":9.031304347,"5.851810854":12.91304348,"6.827112663":14.7792,"7.802414472":15.8208,"8.777716281":16.56,"9.75301809":16.56},"3000":{"0.0":2.16,"0.975301809":5.03712,"1.950603618":9.85824,"2.925905427":13.33152,"3.901207236":15.83136,"4.876509045":18.57375,"5.851810854":21.50325,"6.827112663":24.43275,"7.802414472":818.440258,"8.777716281":1625.416258,"9.75301809":2041.92},"4000":{"0.0":8.64,"0.975301809":10.95428571,"1.950603618":16.26857143,"2.925905427":24.38666667,"3.901207236":33.48,"4.876509045":36.50666666,"5.851810854":34.85333333,"6.827112663":387.6812305,"7.802414472":1301.771077,"8.777716281":2215.860923,"9.75301809":2908.8},"5000":{"0.0":7.2,"0.975301809":134.1889811,"1.950603618":492.0670188,"2.925905427":849.9450564,"3.901207236":1207.823094,"4.876509045":1632.171428,"5.851810854":2814.281143,"6.827112663":3996.390856,"7.802414472":5178.500572,"8.777716281":6360.610284,"9.75301809":7542.72},"5500":{"0.0":285.48,"0.975301809":548.6879999,"1.950603618":1290.456,"2.925905427":2032.224,"3.901207236":2773.992,"4.876509045":3515.76,"5.851810854":5088.96,"6.827112663":6662.16,"7.802414472":8235.36,"8.777716281":9808.56,"9.75301809":11381.76},"6000":{"0.0":563.76,"0.975301809":963.1870186,"1.950603618":2088.844981,"2.925905427":3214.502943,"3.901207236":4340.160906,"4.876509045":5399.348572,"5.851810854":7363.638857,"6.827112663":9327.929144,"7.802414472":11292.21943,"8.777716281":13256.50972,"9.75301809":15220.8}})
df = pd.DataFrame.from_dict(read_this)
I've only tested for this specific case, so I don't know if there are badly-behaving dataframes.
Hopefully I didn't waste too much of everybody's time.
Upvotes: 0
Reputation: 164843
In my opinion, HDF5 is a better option for numeric data. It has type sensitivity, native compression and out-of-memory functionality.
In addition, this format is accessible across languages.
Some example code below, but more information available in h5py documentation.
import pandas as pd
import h5py
# read data
df = pd.read_excel('file.xlsx')
# convert to numpy array
arr = df.values
# extract index and column names
idx = df.index
cols = df.columns
# write to HDF5 file
with h5py.File('out.h5', 'w') as hf:
hf['/'].attrs['index'] = idx
hf['/'].attrs['columns'] = cols
hf.create_dataset('arr', shape=arr.shape, data=arr,
compression='gzip', compression_opts=9)
Upvotes: 0