komodovaran_
komodovaran_

Reputation: 2012

pandas reading excel tables from pandas-exported json

So I have a small table from excel, which I'd like to read in Pandas. Actually, I have several of the likes, and I'd like to just embed them directly in my script rather than keeping track of separate files.

My file could be a table like this enter image description here

And now I want to make it embeddable:

import pandas as pd
pd.set_option("display.width", 1000)

df = pd.read_excel("/your/excel/here/TEST.xlsx")
my_json = df.to_json()

# print of the above json that I want to keep in the script
read_this = {"970":{"0.0":0.0,"0.975301809":0.153,"1.950603618":0.711,"2.925905427":1.269,"3.901207236":1.7775,"4.876509045":1.3125,"5.851810854":0.8475,"6.827112663":0.3825,"7.802414472":0.0,"8.777716281":0.0,"9.75301809":0.0},"1250":{"0.0":0.72,"0.975301809":0.6608,"1.950603618":0.5616,"2.925905427":0.4624,"3.901207236":0.3632,"4.876509045":0.36,"5.851810854":0.36,"6.827112663":0.36,"7.802414472":0.36,"8.777716281":0.36,"9.75301809":0.36},"2000":{"0.0":0.36,"0.975301809":1.18368,"1.950603618":3.50496,"2.925905427":5.383636362,"3.901207236":6.398181817,"4.876509045":9.031304347,"5.851810854":12.91304348,"6.827112663":14.7792,"7.802414472":15.8208,"8.777716281":16.56,"9.75301809":16.56},"3000":{"0.0":2.16,"0.975301809":5.03712,"1.950603618":9.85824,"2.925905427":13.33152,"3.901207236":15.83136,"4.876509045":18.57375,"5.851810854":21.50325,"6.827112663":24.43275,"7.802414472":818.440258,"8.777716281":1625.416258,"9.75301809":2041.92},"4000":{"0.0":8.64,"0.975301809":10.95428571,"1.950603618":16.26857143,"2.925905427":24.38666667,"3.901207236":33.48,"4.876509045":36.50666666,"5.851810854":34.85333333,"6.827112663":387.6812305,"7.802414472":1301.771077,"8.777716281":2215.860923,"9.75301809":2908.8},"5000":{"0.0":7.2,"0.975301809":134.1889811,"1.950603618":492.0670188,"2.925905427":849.9450564,"3.901207236":1207.823094,"4.876509045":1632.171428,"5.851810854":2814.281143,"6.827112663":3996.390856,"7.802414472":5178.500572,"8.777716281":6360.610284,"9.75301809":7542.72},"5500":{"0.0":285.48,"0.975301809":548.6879999,"1.950603618":1290.456,"2.925905427":2032.224,"3.901207236":2773.992,"4.876509045":3515.76,"5.851810854":5088.96,"6.827112663":6662.16,"7.802414472":8235.36,"8.777716281":9808.56,"9.75301809":11381.76},"6000":{"0.0":563.76,"0.975301809":963.1870186,"1.950603618":2088.844981,"2.925905427":3214.502943,"3.901207236":4340.160906,"4.876509045":5399.348572,"5.851810854":7363.638857,"6.827112663":9327.929144,"7.802414472":11292.21943,"8.777716281":13256.50972,"9.75301809":15220.8}}

new_df = pd.DataFrame.from_dict(read_this)

print("original\n", df, "\n")

print("from json\n", new_df)

And I get the following

original
             970     1250       2000         3000         4000         5000       5500          6000
0.000000  0.0000  0.7200   0.360000     2.160000     8.640000     7.200000    285.480    563.760000
0.975302  0.1530  0.6608   1.183680     5.037120    10.954286   134.188981    548.688    963.187019
1.950604  0.7110  0.5616   3.504960     9.858240    16.268571   492.067019   1290.456   2088.844981
2.925905  1.2690  0.4624   5.383636    13.331520    24.386667   849.945056   2032.224   3214.502943
3.901207  1.7775  0.3632   6.398182    15.831360    33.480000  1207.823094   2773.992   4340.160906
4.876509  1.3125  0.3600   9.031304    18.573750    36.506667  1632.171428   3515.760   5399.348572
5.851811  0.8475  0.3600  12.913043    21.503250    34.853333  2814.281143   5088.960   7363.638857
6.827113  0.3825  0.3600  14.779200    24.432750   387.681231  3996.390856   6662.160   9327.929144
7.802414  0.0000  0.3600  15.820800   818.440258  1301.771077  5178.500572   8235.360  11292.219430
8.777716  0.0000  0.3600  16.560000  1625.416258  2215.860923  6360.610284   9808.560  13256.509720
9.753018  0.0000  0.3600  16.560000  2041.920000  2908.800000  7542.720000  11381.760  15220.800000 

from json
                1250       2000         3000         4000         5000       5500          6000     970
0.0          0.7200   0.360000     2.160000     8.640000     7.200000    285.480    563.760000  0.0000
0.975301809  0.6608   1.183680     5.037120    10.954286   134.188981    548.688    963.187019  0.1530
1.950603618  0.5616   3.504960     9.858240    16.268571   492.067019   1290.456   2088.844981  0.7110
2.925905427  0.4624   5.383636    13.331520    24.386667   849.945056   2032.224   3214.502943  1.2690
3.901207236  0.3632   6.398182    15.831360    33.480000  1207.823094   2773.992   4340.160906  1.7775
4.876509045  0.3600   9.031304    18.573750    36.506667  1632.171428   3515.760   5399.348572  1.3125
5.851810854  0.3600  12.913043    21.503250    34.853333  2814.281143   5088.960   7363.638857  0.8475
6.827112663  0.3600  14.779200    24.432750   387.681231  3996.390856   6662.160   9327.929144  0.3825
7.802414472  0.3600  15.820800   818.440258  1301.771077  5178.500572   8235.360  11292.219430  0.0000
8.777716281  0.3600  16.560000  1625.416258  2215.860923  6360.610284   9808.560  13256.509720  0.0000
9.75301809   0.3600  16.560000  2041.920000  2908.800000  7542.720000  11381.760  15220.800000  0.0000

So close, but not really the same. How can I preserve the original structure as an embeddable line of text?

Pastebin of excel file available here

Upvotes: 0

Views: 264

Answers (3)

sciroccorics
sciroccorics

Reputation: 2427

OK, finally I understand what you want: include the content of your Excel file (i.e. a 2D matrix) directly as a variable in the source code of your script, so that you don't have to read the file anymore. Am I right ?

The native data structure able to store 2D matrices is a list of lists. This can be obtained from your Excel file by the following code:

import pandas as pd
df = pd.read_excel("/your/excel/here/TEST.xlsx")
print("mat =", df.values.tolist())

which should print something like:

mat = [['', 970, 1250....], [0.00, 0, 0.72...], ...]

Then you simply copy the printed lines with your mouse and paste them at the beginning of your code, to create a matrix mat that stores your data.

If you need panda DataFrame, simply change the print line to:

print("df = pd.DataFrame(%s)" % df.values.tolist())

and apply the same copy/paste process

Upvotes: 1

komodovaran_
komodovaran_

Reputation: 2012

Okay, so I played around a bit, and I found the exact way to be like this:

import pandas as pd
from collections import OrderedDict

pd.set_option("display.width", 1000)

# Load in the original file (this needs to happen just once)
df = pd.read_excel("/your/excel/here/TEST.xlsx")

# Write this whole table to a single line of dict-formatted JSON for portability
embeddable_json = df.to_json()

# Print it to see the actual output
print(embeddable_json)

# Paste it in here, and wrap it in an OrderedDict to maintain original order
read_this = OrderedDict({"970":{"0.0":0.0,"0.975301809":0.153,"1.950603618":0.711,"2.925905427":1.269,"3.901207236":1.7775,"4.876509045":1.3125,"5.851810854":0.8475,"6.827112663":0.3825,"7.802414472":0.0,"8.777716281":0.0,"9.75301809":0.0},"1250":{"0.0":0.72,"0.975301809":0.6608,"1.950603618":0.5616,"2.925905427":0.4624,"3.901207236":0.3632,"4.876509045":0.36,"5.851810854":0.36,"6.827112663":0.36,"7.802414472":0.36,"8.777716281":0.36,"9.75301809":0.36},"2000":{"0.0":0.36,"0.975301809":1.18368,"1.950603618":3.50496,"2.925905427":5.383636362,"3.901207236":6.398181817,"4.876509045":9.031304347,"5.851810854":12.91304348,"6.827112663":14.7792,"7.802414472":15.8208,"8.777716281":16.56,"9.75301809":16.56},"3000":{"0.0":2.16,"0.975301809":5.03712,"1.950603618":9.85824,"2.925905427":13.33152,"3.901207236":15.83136,"4.876509045":18.57375,"5.851810854":21.50325,"6.827112663":24.43275,"7.802414472":818.440258,"8.777716281":1625.416258,"9.75301809":2041.92},"4000":{"0.0":8.64,"0.975301809":10.95428571,"1.950603618":16.26857143,"2.925905427":24.38666667,"3.901207236":33.48,"4.876509045":36.50666666,"5.851810854":34.85333333,"6.827112663":387.6812305,"7.802414472":1301.771077,"8.777716281":2215.860923,"9.75301809":2908.8},"5000":{"0.0":7.2,"0.975301809":134.1889811,"1.950603618":492.0670188,"2.925905427":849.9450564,"3.901207236":1207.823094,"4.876509045":1632.171428,"5.851810854":2814.281143,"6.827112663":3996.390856,"7.802414472":5178.500572,"8.777716281":6360.610284,"9.75301809":7542.72},"5500":{"0.0":285.48,"0.975301809":548.6879999,"1.950603618":1290.456,"2.925905427":2032.224,"3.901207236":2773.992,"4.876509045":3515.76,"5.851810854":5088.96,"6.827112663":6662.16,"7.802414472":8235.36,"8.777716281":9808.56,"9.75301809":11381.76},"6000":{"0.0":563.76,"0.975301809":963.1870186,"1.950603618":2088.844981,"2.925905427":3214.502943,"3.901207236":4340.160906,"4.876509045":5399.348572,"5.851810854":7363.638857,"6.827112663":9327.929144,"7.802414472":11292.21943,"8.777716281":13256.50972,"9.75301809":15220.8}})

# Read in the above to see if it matches the original file
new_df = pd.DataFrame.from_dict(read_this)

# It does!
print("original\n", df, "\n")
print("from json\n", new_df)

So now I can keep small tables around in python script itself, without any external files, applications, pastes, etc.

So next time I can just run this, and the table is already there. No need to keep track of anything else:

import pandas as pd
from collections import OrderedDict

read_this = OrderedDict({"970":{"0.0":0.0,"0.975301809":0.153,"1.950603618":0.711,"2.925905427":1.269,"3.901207236":1.7775,"4.876509045":1.3125,"5.851810854":0.8475,"6.827112663":0.3825,"7.802414472":0.0,"8.777716281":0.0,"9.75301809":0.0},"1250":{"0.0":0.72,"0.975301809":0.6608,"1.950603618":0.5616,"2.925905427":0.4624,"3.901207236":0.3632,"4.876509045":0.36,"5.851810854":0.36,"6.827112663":0.36,"7.802414472":0.36,"8.777716281":0.36,"9.75301809":0.36},"2000":{"0.0":0.36,"0.975301809":1.18368,"1.950603618":3.50496,"2.925905427":5.383636362,"3.901207236":6.398181817,"4.876509045":9.031304347,"5.851810854":12.91304348,"6.827112663":14.7792,"7.802414472":15.8208,"8.777716281":16.56,"9.75301809":16.56},"3000":{"0.0":2.16,"0.975301809":5.03712,"1.950603618":9.85824,"2.925905427":13.33152,"3.901207236":15.83136,"4.876509045":18.57375,"5.851810854":21.50325,"6.827112663":24.43275,"7.802414472":818.440258,"8.777716281":1625.416258,"9.75301809":2041.92},"4000":{"0.0":8.64,"0.975301809":10.95428571,"1.950603618":16.26857143,"2.925905427":24.38666667,"3.901207236":33.48,"4.876509045":36.50666666,"5.851810854":34.85333333,"6.827112663":387.6812305,"7.802414472":1301.771077,"8.777716281":2215.860923,"9.75301809":2908.8},"5000":{"0.0":7.2,"0.975301809":134.1889811,"1.950603618":492.0670188,"2.925905427":849.9450564,"3.901207236":1207.823094,"4.876509045":1632.171428,"5.851810854":2814.281143,"6.827112663":3996.390856,"7.802414472":5178.500572,"8.777716281":6360.610284,"9.75301809":7542.72},"5500":{"0.0":285.48,"0.975301809":548.6879999,"1.950603618":1290.456,"2.925905427":2032.224,"3.901207236":2773.992,"4.876509045":3515.76,"5.851810854":5088.96,"6.827112663":6662.16,"7.802414472":8235.36,"8.777716281":9808.56,"9.75301809":11381.76},"6000":{"0.0":563.76,"0.975301809":963.1870186,"1.950603618":2088.844981,"2.925905427":3214.502943,"3.901207236":4340.160906,"4.876509045":5399.348572,"5.851810854":7363.638857,"6.827112663":9327.929144,"7.802414472":11292.21943,"8.777716281":13256.50972,"9.75301809":15220.8}})
df = pd.DataFrame.from_dict(read_this)

I've only tested for this specific case, so I don't know if there are badly-behaving dataframes.

Hopefully I didn't waste too much of everybody's time.

Upvotes: 0

jpp
jpp

Reputation: 164843

In my opinion, HDF5 is a better option for numeric data. It has type sensitivity, native compression and out-of-memory functionality.

In addition, this format is accessible across languages.

Some example code below, but more information available in h5py documentation.

import pandas as pd
import h5py

# read data
df = pd.read_excel('file.xlsx')

# convert to numpy array
arr = df.values

# extract index and column names
idx = df.index
cols = df.columns

# write to HDF5 file
with h5py.File('out.h5', 'w') as hf:

    hf['/'].attrs['index'] = idx
    hf['/'].attrs['columns'] = cols

    hf.create_dataset('arr', shape=arr.shape, data=arr,
                      compression='gzip', compression_opts=9)

Upvotes: 0

Related Questions