I am trying to import a CSV file so that I can use it with the k-means clustering algorithm. The file contains 6 columns and over 400 rows. It started out as an Excel spreadsheet, which I exported to CSV. In essence, I want to be able to refer to the columns by their header names in my code, both when plotting the data and when clustering it.
I looked through some other documentation and came up with the following, but nothing was printed when I entered it in the command window:
[Player, BA, OPS, RBI, OBP] = csvimport('MLBdata.csv', 'columns', {'Player', 'BA', 'OPS', 'RBI', 'OBP'})
The only thing that has worked for me so far is the dlmread function, but it returns 0 wherever a field contains text (such as the player names and the header row):
N = dlmread('MLBdata.csv')
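One way around the zeros is to have dlmread skip the header row and the text column, and to read the header line separately. This is a minimal sketch, assuming MLBdata.csv has its header in the first row and the player names in the first column. A small temporary file built from the sample rows used further down stands in for it, and getcol is a hypothetical helper, not a built-in:

```octave
% Write a small stand-in for MLBdata.csv (rows taken from the sample data).
fname = [tempname() '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Player,Year,BA,OPS,RBI,OBP\n');
fprintf(fid, 'SandyAlcantara,2019,0.086,0.22,4,0.117\n');
fprintf(fid, 'PeteAlonso,2019,0.26,0.941,120,0.358\n');
fclose(fid);

% Read only the header line and split it into column names.
fid = fopen(fname, 'r');
header = fgetl(fid);
fclose(fid);
names = strsplit(header, ',');        % {'Player','Year','BA','OPS','RBI','OBP'}

% Start reading at row 1, column 1 (zero-based): skips the header row and
% the text column, so no zeros are produced for the player names.
X = dlmread(fname, ',', 1, 1);        % 2x5 numeric matrix
numNames = names(2:end);              % names of the columns that ended up in X

% Hypothetical helper: fetch a column of X by its header name.
getcol = @(name) X(:, strcmp(numNames, name));
rbi = getcol('RBI')

delete(fname);                        % clean up the temporary file
```

The offsets make dlmread ignore the non-numeric parts entirely, while the header names are kept in a separate cell array for labels and lookups.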
Given a file data.csv with the following contents:
Player,Year,BA,OPS,RBI,OBP
SandyAlcantara,2019,0.086,0.22,4,0.117
PeteAlonso,2019,0.26,0.941,120,0.358
BrandonLowe,2019,0.27,0.85,51,0.336
MikeSoroka,2019,0.077,0.22,3,0.143
Open an Octave terminal and type (this needs the io package from Octave Forge, which can be installed with pkg install -forge io):
pkg load io
C = csv2cell( 'data.csv' )
resulting in the following cell array:
C =
{
[1,1] = Player
[2,1] = SandyAlcantara
[3,1] = PeteAlonso
[4,1] = BrandonLowe
[5,1] = MikeSoroka
[1,2] = Year
[2,2] = 2019
[3,2] = 2019
[4,2] = 2019
[5,2] = 2019
[1,3] = BA
[2,3] = 0.086000
[3,3] = 0.2600
[4,3] = 0.2700
[5,3] = 0.077000
[1,4] = OPS
[2,4] = 0.2200
[3,4] = 0.9410
[4,4] = 0.8500
[5,4] = 0.2200
[1,5] = RBI
[2,5] = 4
[3,5] = 120
[4,5] = 51
[5,5] = 3
[1,6] = OBP
[2,6] = 0.1170
[3,6] = 0.3580
[4,6] = 0.3360
[5,6] = 0.1430
}
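Because csv2cell stores the numeric fields as numbers (as the output above shows), the header names, the player names and the numeric block can be separated with plain indexing. This is a minimal sketch: C is retyped by hand from the output above so the snippet runs without the io package, and picking OPS and RBI is only an example:

```octave
% Same contents as the csv2cell output above, typed in by hand.
C = {'Player',         'Year', 'BA',  'OPS', 'RBI', 'OBP';
     'SandyAlcantara', 2019,   0.086, 0.22,  4,     0.117;
     'PeteAlonso',     2019,   0.26,  0.941, 120,   0.358;
     'BrandonLowe',    2019,   0.27,  0.85,  51,    0.336;
     'MikeSoroka',     2019,   0.077, 0.22,  3,     0.143};

names   = C(1, 2:end);                % header names of the numeric columns
players = C(2:end, 1);                % player names as a cell array of strings
X       = cell2mat(C(2:end, 2:end));  % 4x5 double matrix

% Select columns by header name instead of by position.
[~, idx] = ismember({'OPS', 'RBI'}, names);
features = X(:, idx);                 % 4x2 matrix: OPS and RBI
```

Looking columns up with ismember keeps the code working if the column order in the CSV changes.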
From there, you can collect the data into arrays or structs as you like and continue working. One convenient option is Andrew Janke's 'tablicious' package, which provides a table type for Octave:
octave:13> pkg load tablicious
octave:14> T = cell2table( C(2:end,:), 'VariableNames', C(1,:) );
octave:15> prettyprint(T)
-------------------------------------------------------
| Player | Year | BA | OPS | RBI | OBP |
-------------------------------------------------------
| SandyAlcantara | 2019 | 0.086 | 0.22 | 4 | 0.117 |
| PeteAlonso | 2019 | 0.26 | 0.941 | 120 | 0.358 |
| BrandonLowe | 2019 | 0.27 | 0.85 | 51 | 0.336 |
| MikeSoroka | 2019 | 0.077 | 0.22 | 3 | 0.143 |
-------------------------------------------------------
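For the clustering step itself, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Octave that picks its input columns by header name. A hand-written implementation is swapped in here only so the example runs without extra packages; Octave's statistics package also provides a kmeans function. The matrix X and the header list are typed in from the sample data, and the choice of BA, OPS and OBP, the z-score scaling, k = 2 and the first-k-rows initialization are all assumptions:

```octave
% Numeric part of the sample data and its header names.
names = {'Year', 'BA', 'OPS', 'RBI', 'OBP'};
X = [2019 0.086 0.22  4   0.117;
     2019 0.26  0.941 120 0.358;
     2019 0.27  0.85  51  0.336;
     2019 0.077 0.22  3   0.143];

sel = {'BA', 'OPS', 'OBP'};             % columns to cluster on, chosen by name
[~, idx] = ismember(sel, names);
F = X(:, idx);

% Standardize each column so no single statistic dominates the distance
% (a constant column such as Year would give std = 0, so it is left out).
F = (F - mean(F)) ./ std(F);

k = 2;
centers = F(1:k, :);                    % deterministic init: first k rows
for iter = 1:100
  % Squared Euclidean distance from every row to every center (n x k).
  D = zeros(rows(F), k);
  for c = 1:k
    D(:, c) = sum((F - centers(c, :)).^2, 2);
  end
  [~, labels] = min(D, [], 2);          % assign each row to its nearest center
  % Move each center to the mean of the rows assigned to it.
  newCenters = centers;
  for c = 1:k
    if any(labels == c)
      newCenters(c, :) = mean(F(labels == c, :), 1);
    end
  end
  if isequal(newCenters, centers)      % stop once the centers no longer move
    break;
  end
  centers = newCenters;
end
labels
```

On the four sample rows this puts rows 1 and 4 (Alcantara, Soroka) in one cluster and rows 2 and 3 (Alonso, Lowe) in the other. The names in sel can be reused for plot labels, e.g. xlabel(sel{1}).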