I'm learning how to use sklearn (scikit-learn) and related tools to do some machine learning.
I was wondering how to import this dataset as data.
It comes from the Million Song genre dataset.
How can I make my data.target[0] equal to 0, which stands for "classic pop and rock", data.target[1] also equal to 0 ("classic pop and rock"), and data.target[640] equal to 1, which stands for "folk"?
And how can I make data.data[0,:] equal to -8.697, 155.007, 1, 9, and so forth (all the numerical values after the title column)?
As others have mentioned, it is a little unclear what shape you are looking for, but as a general starting point, and to get the data into a very flexible format, you could read the text file into Python and convert it to a pandas DataFrame. I am certain there are other, more compact ways of doing this, but to provide clear steps we could start with:
import re
import pandas as pd
file = 'filepath'  # this is the file path to the saved text file
# read all lines; the with-block closes the file afterwards
with open(file, 'r') as music:
    lines = music.readlines()
# split the lines by comma
lines = [line.split(',') for line in lines]
# capture the column line (the 10th line of the file)
columns = lines[9]
# capture the actual content of the data, dismissing the header info
content = lines[10:]
musicdf = pd.DataFrame(content)
# assign the column names to our dataframe
musicdf.columns = columns
# preview the dataframe (displayed when run in Jupyter/IPython)
musicdf.head(10)
# the final column had formatting issues, so this removes the "\n" from both the column title and the column values
def cleaner(txt):
    txt = re.sub(r'[\n]+', '', txt)
    return txt
# rename the column of issue
musicdf = musicdf.rename(columns={'var_timbre12\n': 'var_timbre12'})
# apply the cleaning function above to the column of interest
musicdf['var_timbre12'] = musicdf['var_timbre12'].apply(cleaner)
# check the top and bottom of the dataframe for column var_timbre12
musicdf['var_timbre12'].head(10)
musicdf['var_timbre12'].tail(10)
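One of the more compact ways is to let `pandas.read_csv` do the splitting. This is a sketch assuming, as the code above does, that the column names sit on the 10th line (index 9); `read_msd_genre` and its `header_line` parameter are hypothetical names. The CSV parser drops the trailing newline itself, so the `var_timbre12` cleanup is not needed, and numeric columns are parsed as numbers rather than strings.

```python
import pandas as pd


def read_msd_genre(filepath_or_buffer, header_line=9):
    """Read the genre file straight into a DataFrame with pandas.read_csv.

    header_line is the 0-based index of the column-name line, matching
    lines[9] in the split-based code (an assumption about the file layout).
    """
    # skiprows drops everything before the column-name line, which then
    # becomes the header; read_csv strips the line endings itself
    musicdf = pd.read_csv(filepath_or_buffer, skiprows=header_line)
    return musicdf
```

Usage: `musicdf = read_msd_genre('filepath')`. Like `line.split(',')`, this still assumes no field (for example a song title) contains a comma; such rows would cause parsing problems either way.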
The result of this would be the following (only the first three columns are shown here):
                 %genre            track_id       artist_name
0  classic pop and rock  TRFCOOU128F427AEC0  Blue Oyster Cult
1  classic pop and rock  TRNJTPB128F427AE9F  Blue Oyster Cult
With the data in this format, you can now do many grouping tasks, such as finding certain genres and their corresponding attributes, using the pandas groupby function.
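A minimal sketch of such a grouping, assuming `musicdf` was built with the split-based code above. Because that code works on raw text, every cell is still a string, so the chosen columns are converted with `pd.to_numeric` before averaging. `genre_summary` is a hypothetical helper name; which numeric columns to pass is up to you.

```python
import pandas as pd


def genre_summary(musicdf, numeric_cols, genre_col="%genre"):
    """Count tracks per genre and average the given numeric columns."""
    df = musicdf.copy()
    # the split-based loader leaves every value as a string; convert first
    df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)
    counts = df.groupby(genre_col).size().rename("n_tracks")
    means = df.groupby(genre_col)[numeric_cols].mean()
    # one row per genre: mean of each numeric column plus the track count
    return means.join(counts)
```

For example, `genre_summary(musicdf, ['var_timbre12'])` gives one row per genre with the average `var_timbre12` and the number of tracks in that genre.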
Hope this helps!