laneherby
laneherby

Reputation: 485

Splitting CSV data from URL not working with Python csv.reader

When I print a row from my data I get this output:

"SL","2018-09-30","89.7","-2.1644","5.9884","Max Kepler",596146,518858,"field_out","hit_into_play",,,,,"9","Max Kepler grounds out, shortstop Leury Garcia to first baseman Matt Davidson.  ","R","L","R","MIN","CWS","X",6,"ground_ball",1,2,2018,"0.0456","0.4072","0.7193","1.8204","null","null","null",2,8,"Bot","118.48","141.29",,,456078,,"180930_215102","6.8928","-130.3241","-5.0488","-0.9006","26.5098","-26.9291","3.5258","1.6280","60","63.4","6.3970","88.0450","2186","5.2660",531825,518858,456078,571602,660162,570560,544725,547170,641477,594953,"55.2332","0.176","0.145","0.00",1,0,0,2,67,"5","Slider",5,4,5,4,4,5,5,4,"Strategic","Standard"

Then when I use this line of code to split it with csv reader:

csvRow = list(csv.reader(data.splitlines()[1]))
print(csvRow)

These are my results:

[['SL'], ['', ''], ['2018-09-30'], ['', ''], ['89.7'], ['', ''], ['-2.1644'], ['', ''], ['5.9884'], ['', ''], ['Max Kepler'], ['', ''], ['5'], ['9'], ['6'], ['1'], ['4'], ['6'], ['', ''], ['5'], ['1'], ['8'], ['8'], ['5'], ['8'], ['', ''], ['field_out'], ['', ''], ['hit_into_play'], ['', ''], ['', ''], ['', ''], ['', ''], ['', ''], ['9'], ['', ''], ['Max Kepler grounds out, shortstop Leury Garcia to first baseman Matt Davidson.  '], ['', ''], ['R'], ['', ''], ['L'], ['', ''], ['R'], ['', ''], ['MIN'], ['', ''], ['CWS'], ['', ''], ['X'], ['', ''], ['6'], ['', ''], ['ground_ball'], ['', ''], ['1'], ['', ''], ['2'], ['', ''], ['2'], ['0'], ['1'], ['8'], ['', ''], ['0.0456'], ['', ''], ['0.4072'], ['', ''], ['0.7193'], ['', ''], ['1.8204'], ['', ''], ['null'], ['', ''], ['null'], ['', ''], ['null'], ['', ''], ['2'], ['', ''], ['8'], ['', ''], ['Bot'], ['', ''], ['118.48'], ['', ''], ['141.29'], ['', ''], ['', ''], ['', ''], ['4'], ['5'], ['6'], ['0'], ['7'], ['8'], ['', ''], ['', ''], ['180930_215102'], ['', ''], ['6.8928'], ['', ''], ['-130.3241'], ['', ''], ['-5.0488'], ['', ''], ['-0.9006'], ['', ''], ['26.5098'], ['', ''], ['-26.9291'], ['', ''], ['3.5258'], ['', ''], ['1.6280'], ['', ''], ['60'], ['', ''], ['63.4'], ['', ''], ['6.3970'], ['', ''], ['88.0450'], ['', ''], ['2186'], ['', ''], ['5.2660'], ['', ''], ['5'], ['3'], ['1'], ['8'], ['2'], ['5'], ['', ''], ['5'], ['1'], ['8'], ['8'], ['5'], ['8'], ['', ''], ['4'], ['5'], ['6'], ['0'], ['7'], ['8'], ['', ''], ['5'], ['7'], ['1'], ['6'], ['0'], ['2'], ['', ''], ['6'], ['6'], ['0'], ['1'], ['6'], ['2'], ['', ''], ['5'], ['7'], ['0'], ['5'], ['6'], ['0'], ['', ''], ['5'], ['4'], ['4'], ['7'], ['2'], ['5'], ['', ''], ['5'], ['4'], ['7'], ['1'], ['7'], ['0'], ['', ''], ['6'], ['4'], ['1'], ['4'], ['7'], ['7'], ['', ''], ['5'], ['9'], ['4'], ['9'], ['5'], ['3'], ['', ''], ['55.2332'], ['', ''], ['0.176'], ['', ''], ['0.145'], ['', ''], ['0.00'], ['', ''], ['1'], ['', ''], ['0'], ['', ''], ['0'], ['', ''], ['2'], ['', ''], ['6'], ['7'], ['', ''], ['5'], ['', ''], ['Slider'], ['', ''], ['5'], ['', ''], ['4'], ['', ''], ['5'], ['', ''], ['4'], ['', ''], ['4'], ['', ''], ['5'], ['', ''], ['5'], ['', ''], ['4'], ['', ''], ['Strategic'], ['', ''], ['Standard']]

As you can see the commas are being put into their own indexes in the list. Obviously this isn't what I want. I can't use just a split function because some of the columns have commas within their actaul data.

Upvotes: 0

Views: 146

Answers (3)

Mark Tolonen
Mark Tolonen

Reputation: 177971

csv.reader takes an interable that returns lines. Passing a string (an iterable that returns a single character), makes it behave as if each character is on a line by itself, except when grouped by the default quote character. When it gets to a "line" that is a single quote character, it looks like two blank columns: ['', ''].

You haven't shown all your code, but just do something like:

with open('data.csv',newline='') as f: # newline per csv documentation.
    for line in csv.reader(f):
        print(line)

Or if you have your data in one continuous string, use io.StringIO to return a line-at-a-time:

import csv
import io

data = '''\
"SL","2018-09-30","89.7","-2.1644","5.9884","Max Kepler",596146,518858,"field_out","hit_into_play",,,,,"9","Max Kepler grounds out, shortstop Leury Garcia to first baseman Matt Davidson.  ","R","L","R","MIN","CWS","X",6,"ground_ball",1,2,2018,"0.0456","0.4072","0.7193","1.8204","null","null","null",2,8,"Bot","118.48","141.29",,,456078,,"180930_215102","6.8928","-130.3241","-5.0488","-0.9006","26.5098","-26.9291","3.5258","1.6280","60","63.4","6.3970","88.0450","2186","5.2660",531825,518858,456078,571602,660162,570560,544725,547170,641477,594953,"55.2332","0.176","0.145","0.00",1,0,0,2,67,"5","Slider",5,4,5,4,4,5,5,4,"Strategic","Standard"
'''

for line in csv.reader(io.StringIO(data)):
    print(line)

Output:

['SL', '2018-09-30', '89.7', '-2.1644', '5.9884', 'Max Kepler', '596146', '518858', 'field_out', 'hit_into_play', '', '', '', '', '9', 'Max Kepler grounds out, shortstop Leury Garcia to first baseman Matt Davidson.  ', 'R', 'L', 'R', 'MIN', 'CWS', 'X', '6', 'ground_ball', '1', '2', '2018', '0.0456', '0.4072', '0.7193', '1.8204', 'null', 'null', 'null', '2', '8', 'Bot', '118.48', '141.29', '', '', '456078', '', '180930_215102', '6.8928', '-130.3241', '-5.0488', '-0.9006', '26.5098', '-26.9291', '3.5258', '1.6280', '60', '63.4', '6.3970', '88.0450', '2186', '5.2660', '531825', '518858', '456078', '571602', '660162', '570560', '544725', '547170', '641477', '594953', '55.2332', '0.176', '0.145', '0.00', '1', '0', '0', '2', '67', '5', 'Slider', '5', '4', '5', '4', '4', '5', '5', '4', 'Strategic', 'Standard']

Upvotes: 1

Bill Nace
Bill Nace

Reputation: 111

The reader needs the delimiter changed for CSV files. Specify the delimiter as a string with a single space in your code.

csvRow = list(csv.reader(data.splitlines()[1], delimiter=' '))
print(csvRow)

Upvotes: 0

Wael Almadhoun
Wael Almadhoun

Reputation: 419

Try using data frame:

replace test.csv with your csv source

import pandas as pd
data = pd.read_csv('test.csv', sep=',', delimiter=None, skip_blank_lines=True)
data = data.transpose()

Upvotes: 0

Related Questions