user7188934

Reputation: 1091

How to convert bytes data into a python pandas dataframe?

I would like to convert 'bytes' data into a Pandas dataframe.

The data looks like this (first few lines):

    (b'#Settlement Date,Settlement Period,CCGT,OIL,COAL,NUCLEAR,WIND,PS,NPSHYD,OCGT'
 b',OTHER,INTFR,INTIRL,INTNED,INTEW,BIOMASS\n2017-01-01,1,7727,0,3815,7404,3'
 b'923,0,944,0,2123,948,296,856,238,\n2017-01-01,2,8338,0,3815,7403,3658,16,'
 b'909,0,2124,998,298,874,288,\n2017-01-01,3,7927,0,3801,7408,3925,0,864,0,2'
 b'122,998,298,816,286,\n2017-01-01,4,6996,0,3803,7407,4393,0,863,0,2122,998'

The column headers appear at the top. Each subsequent line is a timestamp followed by numbers.
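The printed value looks like the way `pprint` wraps a long `bytes` object: the adjacent `b'...'` literals inside the parentheses are concatenated by Python into one `bytes` value, so `bytes_data` is a single object, not a tuple. If you only have the printed text (for example, pasted into a file), `ast.literal_eval` can safely turn it back into bytes. A minimal sketch, assuming the text is a complete, well-formed literal; `printed` is a shortened, hypothetical copy of the sample.

```python
import ast

# Hypothetical, shortened copy of the printed sample: a parenthesized
# sequence of adjacent bytes literals (the \n are literal backslash-n text).
printed = r"""(b'#Settlement Date,Settlement Period,CCGT\n'
 b'2017-01-01,1,7727\n2017-01-01,2,8338\n')"""

# literal_eval only evaluates literals, so it is safe on untrusted text;
# the adjacent literals are joined into one bytes object.
bytes_data = ast.literal_eval(printed)

print(type(bytes_data))  # <class 'bytes'>
print(bytes_data.count(b"\n"))  # 3
```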

Is there a straightforward way to do this?

Thank you very much

@Paula Livingstone:

This seems to work:

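    import pandas as pd

    s = str(bytes_data, 'utf-8')

    # Use a context manager so the file is closed (and flushed) before pandas reads it.
    with open("data.txt", "w") as file:
        file.write(s)

    df = pd.read_csv('data.txt')

Maybe this can be done without using an intermediate file.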

Upvotes: 54

Views: 94488

Answers (3)

KenHBS

Reputation: 7164

You can also use BytesIO directly:

    from io import BytesIO

    import pandas as pd

    df = pd.read_csv(BytesIO(bytes_data))

This saves you the step of decoding bytes_data into a string.
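A self-contained version using the first two rows of the question's sample. The `parse_dates` argument and the rename of the `#`-prefixed header are optional extras, not part of the answer above.

```python
from io import BytesIO

import pandas as pd

# Header plus the first two data rows from the question, as one bytes object.
bytes_data = (
    b"#Settlement Date,Settlement Period,CCGT,OIL,COAL,NUCLEAR,WIND,PS,NPSHYD,OCGT"
    b",OTHER,INTFR,INTIRL,INTNED,INTEW,BIOMASS\n"
    b"2017-01-01,1,7727,0,3815,7404,3923,0,944,0,2123,948,296,856,238,\n"
    b"2017-01-01,2,8338,0,3815,7403,3658,16,909,0,2124,998,298,874,288,\n"
)

# read_csv accepts a binary file-like object and decodes it (UTF-8 by default).
df = pd.read_csv(BytesIO(bytes_data), parse_dates=["#Settlement Date"])

# Optional: drop the leading '#' from the first header.
df = df.rename(columns={"#Settlement Date": "Settlement Date"})

print(df.shape)  # (2, 16)
```

The trailing comma on each data row is an empty BIOMASS value, so that column comes back as all NaN.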

Upvotes: 62

Tim

Reputation: 736

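I had the same issue and found the StringIO module (https://docs.python.org/2/library/stringio.html) through the answer to "How to create a Pandas DataFrame from a string". That link is the Python 2 documentation; in Python 3 the class lives in the io module as io.StringIO.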

Try something like:

    from io import StringIO

    import pandas as pd

    s = str(bytes_data, 'utf-8')

    data = StringIO(s)

    df = pd.read_csv(data)
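A short sketch comparing this route with the `BytesIO` one on a shortened, made-up sample. `bytes_data.decode("utf-8")` is the method form of `str(bytes_data, 'utf-8')`; decoding yourself mainly matters when the data is not UTF-8, in which case you would pass the right codec (`read_csv` also has an `encoding` parameter for the `BytesIO` route).

```python
from io import BytesIO, StringIO

import pandas as pd

# Made-up, shortened sample in the same shape as the question's data.
bytes_data = b"#Settlement Date,Settlement Period,CCGT\n2017-01-01,1,7727\n2017-01-01,2,8338\n"

# str(b, 'utf-8') and b.decode('utf-8') produce the same text.
text = bytes_data.decode("utf-8")
assert text == str(bytes_data, "utf-8")

df_from_text = pd.read_csv(StringIO(text))      # decode first, then parse text
df_from_bytes = pd.read_csv(BytesIO(bytes_data))  # let pandas decode the bytes

# Both routes should give identical frames.
pd.testing.assert_frame_equal(df_from_text, df_from_bytes)
print(df_from_text)
```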

Upvotes: 62

Paula Livingstone

Reputation: 1215

OK, your input formatting is quite awkward, but the following works on the printed text saved to a file:

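    import pandas as pd

    with open('file.txt', 'r') as myfile:
        data = myfile.read().replace('\n', '')  # read in the file as one string

    # Strip the outer b'...' quoting, then glue the adjacent literals back together
    # (they are split at arbitrary points, so join them with no separator).
    chunks = data.strip(" b'").split("' b'")
    rows = "".join(chunks).split('\\n')  # rows are separated by a literal backslash-n
    df = pd.Series(rows).str.split(',', expand=True)
    print(df)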

This produces the following:

                 0                  1     2    3     4        5      6   7   \
0  #Settlement Date  Settlement Period  CCGT  OIL  COAL  NUCLEAR   WIND  PS   
1        2017-01-01                  1  7727    0  3815     7404   3923   0   
2        2017-01-01                  2  8338    0  3815     7403   3658  16   
3        2017-01-01                  3  7927    0  3801     7408   3925   0   

       8      9      10     11      12      13     14       15  
0  NPSHYD  OCGT   OTHER  INTFR  INTIRL  INTNED  INTEW  BIOMASS  
1     944      0   2123    948     296     856    238           
2     909      0   2124    998     298     874    288           
3     864      0   2122    998     298     816    286     None 

For this to work, your input file must contain only complete rows, so I removed the trailing partial row for the purposes of the test.
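If the bytes themselves may end in a cut-off row, one way to apply the same precaution without editing anything by hand is to cut the data at the last newline before parsing. A minimal sketch; `drop_partial_last_row` is a hypothetical helper, and it assumes rows end with `\n` and that a complete final row is also newline-terminated (otherwise that row would be dropped too).

```python
from io import BytesIO

import pandas as pd


def drop_partial_last_row(data: bytes) -> bytes:
    """Keep everything up to and including the last newline."""
    end = data.rfind(b"\n")
    # No newline at all: there is no complete row to keep.
    return data[: end + 1] if end != -1 else b""


# Hypothetical sample whose last row was cut off mid-line.
bytes_data = (
    b"#Settlement Date,Settlement Period,CCGT,OIL\n"
    b"2017-01-01,1,7727,0\n"
    b"2017-01-01,2,8338,0\n"
    b"2017-01-01,3,79"
)

df = pd.read_csv(BytesIO(drop_partial_last_row(bytes_data)))
print(len(df))  # 2
```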

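As you have said that the data source is an HTTP GET request, the initial read can happen in pandas itself: pandas.read_csv accepts a URL or any file-like object, so the data never has to be saved to disk. (pandas.read_html, whose io parameter is documented as "str or file-like", parses only HTML tables, so it applies only if the response is an HTML page rather than CSV text.)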
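A minimal sketch of reading a GET response straight into a DataFrame, assuming the response body is CSV text like the sample (not an HTML page). `frame_from_url` and the example URL are hypothetical; the standard-library `urllib.request.urlopen` is used here, but any HTTP client that returns the body as bytes works the same way with `BytesIO`.

```python
from io import BytesIO
from urllib.request import urlopen

import pandas as pd


def frame_from_url(url: str) -> pd.DataFrame:
    """Download a CSV payload and parse it without touching the disk."""
    with urlopen(url) as response:
        payload = response.read()  # raw bytes of the response body
    return pd.read_csv(BytesIO(payload))


# Usage (hypothetical URL):
# df = frame_from_url("https://example.com/generation.csv")
```

For a plain CSV endpoint, `pd.read_csv(url)` can also fetch the URL itself; going through bytes is useful when you already have the response from another client.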

Upvotes: 1
