How to replace commas with tabs in TSV File

Question

In my dataframe below I am trying to replace the commas in the column curv_typ,maturity,bonds,geo ime with tabs and in the strings below it too, so that I can then create new columns from this.

 curv_typ,maturity,bonds,geo	ime  2015M06D16   2015M06D15   2015M06D11   \
0                 PYC_RT,Y1,GBAAA,EA        -0.24        -0.24        -0.24   
1               PYC_RT,Y1,GBA_AAA,EA        -0.02        -0.03        -0.10   
2                PYC_RT,Y10,GBAAA,EA         0.94         0.92         0.99   
3              PYC_RT,Y10,GBA_AAA,EA         1.67         1.70         1.60   
4                PYC_RT,Y11,GBAAA,EA         1.03         1.01         1.09

The code is as follows, but it is not getting rid of the commas and this is where I am struggling.

import os
import urllib2
import gzip
import StringIO
import pandas as pd

baseURL = "http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file="
filename = "data/irt_euryld_d.tsv.gz"
outFilePath = filename.split('/')[1][:-3]

response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())

compressedFile.seek(0)

decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb') 

with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())

#Now have to deal with tsv file
import csv

outFilePath = filename.split('/')[1][:-3] #As in the code above, just put here for reference
csvout = 'C:\Users\Sidney\ECB.tsv'
outfile = open(csvout, "w")
with open(outFilePath, "rb") as f:
    for line in f.read():
        line.replace(",", "	")
        outfile.write(line)
outfile.close()

df = pd.DataFrame.from_csv("ECB.tsv", sep="	", index_col=False)

Thank You

EdChum · Accepted Answer

Split the column name to produce your new column names and then call the vectorised str split method with param expand=True:

In [26]:
cols = 'curv_typ,maturity,bonds,geo\time'.split(',')
df[cols] = df['curv_typ,maturity,bonds,geo\time'].str.split(',', expand=True)
df

Out[26]:
  curv_typ,maturity,bonds,geo	ime  2015M06D16  2015M06D15  2015M06D11  \
0               PYC_RT,Y1,GBAAA,EA       -0.24       -0.24       -0.24   
1             PYC_RT,Y1,GBA_AAA,EA       -0.02       -0.03       -0.10   
2              PYC_RT,Y10,GBAAA,EA        0.94        0.92        0.99   
3            PYC_RT,Y10,GBA_AAA,EA        1.67        1.70        1.60   
4              PYC_RT,Y11,GBAAA,EA        1.03        1.01        1.09   

  curv_typ maturity    bonds geo	ime  
0   PYC_RT       Y1    GBAAA       EA  
1   PYC_RT       Y1  GBA_AAA       EA  
2   PYC_RT      Y10    GBAAA       EA  
3   PYC_RT      Y10  GBA_AAA       EA  
4   PYC_RT      Y11    GBAAA       EA

EDIT

For pandas versions 0.16.0 and older then you'll need to use the following line instead:

df[cols] = df['curv_typ,maturity,bonds,geo\time'].str.split(',').apply(pd.Series)

How to replace commas with tabs in TSV File

Answers (2)

Related Questions