rprasad

Reputation: 376

csvkit: when converting a CSV to a Table, how do you preserve quoted strings as text?

When using csvkit, I'm having trouble keeping character data from being converted to numeric data. In the example below, my first column gets converted to an 'int'.

Data: (test.csv)

"BG_ID_10","DisSens_2010","PrivateNeglect_2010"
"250250001001",0.506632168908,0.363523524561
"250250001004",0.346632168908,0.352456136352

Code snippet:

from csvkit import sql as csvkit_sql  # used later to generate the Postgres DDL
from csvkit import table
from csv import QUOTE_NONNUMERIC  # not actually used below

# Python 2: open the file in binary mode for the csv module
fh = open('test.csv', 'rb')

# Build a csvkit Table; by default from_csv infers each column's type
csv_table = table.Table.from_csv(
    f=fh,
    name='tname',
    delimiter=',',
    quotechar='"',
    snifflimit=0,
)

for col in csv_table:
    print col.name, col.type

Output:

BG_ID_10 <type 'int'>
DisSens_2010 <type 'float'>
PrivateNeglect_2010 <type 'float'>

I have a working hack (shown below), but I would appreciate help with better parameters for "from_csv", or alternative suggestions. (Note: after this step, the csvkit sql helpers are used to generate Postgres CREATE TABLE statements; a sketch of that downstream step follows the hack.)

Working Hack:

char_col = csv_table[0] # get first column
char_col.type = unicode # change type
for idx, val in enumerate(char_col):  # force to unicode
    char_col[idx] = u'%s' % val
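
For context, the downstream step mentioned above looks roughly like this. It's only a sketch: make_table and make_create_table_statement are the sql helpers from the older csvkit releases that still ship csvkit.table, and the 'postgresql' dialect string is my assumption, not something taken from the question.

# Sketch of the downstream DDL step (assumes older csvkit's sql helpers)
sql_table = csvkit_sql.make_table(csv_table, name='tname')
create_stmt = csvkit_sql.make_create_table_statement(sql_table, dialect='postgresql')
print create_stmt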

Upvotes: 2

Views: 475

Answers (1)

Quentin Pradet

Reputation: 4771

You can add infer_types=False to your from_csv call. All types will become unicode:

BG_ID_10 <type 'unicode'>
DisSens_2010 <type 'unicode'>
PrivateNeglect_2010 <type 'unicode'>

But there's currently no way to specify the type without building Columns yourself.
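
For reference, the adjusted call would look something like this; it is just the question's snippet with infer_types=False added and nothing else changed:

from csvkit import table

fh = open('test.csv', 'rb')

csv_table = table.Table.from_csv(
    f=fh,
    name='tname',
    delimiter=',',
    quotechar='"',
    snifflimit=0,
    infer_types=False,  # disable type inference; every column is read as unicode
)

for col in csv_table:
    print col.name, col.type  # each column now reports <type 'unicode'>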

Upvotes: 1
