Reputation: 3675

How to load and store data from a file using NumPy

I have a file like this:

2 qid:1 1:0.32 2:0.50 3:0.78 4:0.02 10:0.90
5 qid:2 2:0.22 5:0.34 6:0.87 10:0.56 12:0.32 19:0.24 20:0.55
...

The structure I want is as follows:

output = []
rel = 2
qid = 1
features = {}  # the feature list "1:0.32 2:0.50 3:0.78 4:0.02 10:0.90"
output.append([rel, qid, features])
...

How can I write Python code to load the data? Thanks.

Upvotes: 0

Answers (3)

lmjohns3

Reputation: 7592

It looks like your input files are in svmlight format. If this is true, then there's a parser included as part of scikit-learn that might be handy to use -- see the source at:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/svmlight_format.py#L32
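
As a rough sketch of what using that parser could look like: the public entry point is `sklearn.datasets.load_svmlight_file`, and passing `query_id=True` asks it to return the `qid` column as well. The in-memory `sample` below is only a stand-in for the real file (an assumption for illustration); a file path should work in its place.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# Stand-in for the real file: the two sample lines from the question.
sample = (
    b"2 qid:1 1:0.32 2:0.50 3:0.78 4:0.02 10:0.90\n"
    b"5 qid:2 2:0.22 5:0.34 6:0.87 10:0.56 12:0.32 19:0.24 20:0.55\n"
)

# X is a SciPy sparse matrix of features, rel holds the leading labels,
# and qid holds the query ids (returned only because query_id=True).
X, rel, qid = load_svmlight_file(BytesIO(sample), query_id=True)

print(rel)
print(qid)
print(X.shape)
```

Note that the features come back as a sparse matrix rather than a dict per row, so this fits best if the data is going into scikit-learn or NumPy/SciPy code afterwards.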

Upvotes: 0

CTKlein

Reputation: 309

The following should work nicely and leaves your data in a handy format:

import numpy as np

# file_name is the path to the data file
regexp = r"(\d+)\s+qid:(\d+)\s+(.+)"
data = np.fromregex(file_name, regexp,
                    dtype=[('rel', int), ('qid', int), ('features', object)])

From here you can select rel, qid or features by calling:

>>> data['rel']
array([2, 5])
>>> data['qid']
array([1, 2])
>>> data['features']
array(['1:0.32 2:0.50 3:0.78 4:0.02 10:0.90',
       '2:0.22 5:0.34 6:0.87 10:0.56 12:0.32 19:0.24 20:0.55'], dtype=object)
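
If you need the features as numbers rather than strings, one possible follow-up is to expand each string into a row of a dense array. This is only a sketch: the helper name `to_dense` is made up for illustration, and it assumes the feature indices in the file are 1-based and that a missing index means 0.

```python
import numpy as np

# Feature strings in the same form as data['features'] above.
feature_strings = [
    "1:0.32 2:0.50 3:0.78 4:0.02 10:0.90",
    "2:0.22 5:0.34 6:0.87 10:0.56 12:0.32 19:0.24 20:0.55",
]


def to_dense(strings):
    """Expand 'index:value' strings into a 2-D array (hypothetical helper)."""
    rows = [[pair.split(":") for pair in s.split()] for s in strings]
    # The widest row decides the number of columns.
    n_cols = max(int(idx) for row in rows for idx, _ in row)
    dense = np.zeros((len(rows), n_cols))
    for i, row in enumerate(rows):
        for idx, value in row:
            dense[i, int(idx) - 1] = float(value)  # assumes 1-based indices
    return dense


X = to_dense(feature_strings)
print(X.shape)
```

For large or very sparse data a dense array wastes memory, so a sparse format may be the better choice there.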

Upvotes: 0

osdf

Reputation: 818

For reading use something like this (data is in file 'fname'):

# fname is the path to the data file
with open(fname) as f:
    for line in f:
        # split on any whitespace; this also drops the trailing newline
        elements = line.split()
        rel = int(elements[0])
        qid = int(elements[1].split(':')[1])
        featurelist = elements[2:]
        # get the various features again with splitting at ':'
        # you get the idea ...
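
To spell out the remaining step, here is a minimal sketch that finishes the split at ':' and builds the `[rel, qid, features]` entry from the question. The function name `parse_line` is just illustrative, and storing features as a dict of int index to float value is an assumption about what the asker wants.

```python
def parse_line(line):
    """Parse one 'rel qid:N idx:val ...' line into [rel, qid, features]."""
    elements = line.split()
    rel = int(elements[0])
    qid = int(elements[1].split(":")[1])
    features = {}
    for item in elements[2:]:
        index, value = item.split(":")
        features[int(index)] = float(value)
    return [rel, qid, features]


sample = "2 qid:1 1:0.32 2:0.50 3:0.78 4:0.02 10:0.90"
print(parse_line(sample))
# [2, 1, {1: 0.32, 2: 0.5, 3: 0.78, 4: 0.02, 10: 0.9}]
```

Inside the loop above you would then do `output.append(parse_line(line))`, with `output` starting as an empty list.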

Upvotes: 1
