Palindrom

Reputation: 443

Matlab processing data from text file

I am trying to read data from a text file. Importing it works fine. The imported data has the columns UserID|SportID|Rating.

There are many users, and each of them can rate any sport with any rating, for example:

User|SportID|Rating
1      2       10
1      3        5
2      1       10
2      3        2

I want to create a new matrix like the one below:

UserID  Sport1  Sport2  Sport3
 1      (null)    10      5
 2        10    (null)    2

I tried to do this with a "for" loop; however, there are almost 2000 users and 1000 sports, and almost 100000 ratings. How can I do this?

Upvotes: 3

Views: 134

Answers (3)

p8me

Reputation: 1860

I suppose you have already defined null as a number for simplification.

Null = -1; % or any other value which could not be a rating.

Considering:

nSports = 1000; % Number of sports
nUsers = 2000; % Number of users

Pre-allocate the result:

Rating_Mat = ones(nUsers, nSports) * Null; % Pre-allocation

Then use sub2ind (similar to this answer):

Rating_Mat(sub2ind([nUsers nSports], User, SportID)) = Rating;

Or accumarray:

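Rating_Mat = accumarray([User, SportID], Rating, [nUsers nSports], [], Null);

assuming that User, SportID and Rating are Nx1 column vectors. The last argument fills unrated entries with Null (accumarray would otherwise use 0); ratings for a duplicated (User, SportID) pair are summed.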
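As a small check, here is a sketch of the sub2ind approach on the four sample rows from the question. Taking nUsers and nSports from the maximum IDs is my assumption for this toy data; with the real file you would use the actual counts.

```matlab
% Sample data from the question, one column vector per field
User    = [1; 1; 2; 2];
SportID = [2; 3; 1; 3];
Rating  = [10; 5; 10; 2];

% Assumption for this toy example: IDs run from 1 up to the maximum seen
nUsers  = max(User);
nSports = max(SportID);

Null = -1;                                   % placeholder for "not rated"
Rating_Mat = ones(nUsers, nSports) * Null;   % pre-allocate with Null

% Convert (row, column) pairs to linear indices and assign all ratings at once
Rating_Mat(sub2ind([nUsers nSports], User, SportID)) = Rating;
```

Unlike accumarray, this assignment does not sum duplicates: if a (User, SportID) pair appears more than once, one of its ratings simply overwrites the others.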

Hope it helps.

Upvotes: 1

Mauro

Reputation: 1241

To do this fast, you can use a sparse matrix with one dimension for UserID and the other for SportID. For most purposes, a sparse matrix behaves like a normal matrix. Construct it like so:

out = sparse(User, SportID, Rating)

where User, SportID and Rating are the vectors corresponding to the columns of your text file.

Note 1: for duplicate (User, SportID) pairs, the ratings will be summed.

Note 2: empty entries, which were written as (null) in the question, are not stored in sparse matrices; only the non-zero ones are (that is the main point of sparse matrices).
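A minimal sketch of this on the sample rows from the question. The conversion to a dense matrix with NaN for unrated pairs is an optional extra of mine, and it assumes that no real rating is exactly 0 (a zero rating would be indistinguishable from a missing one).

```matlab
% Sample data from the question, one column vector per field
User    = [1; 1; 2; 2];
SportID = [2; 3; 1; 3];
Rating  = [10; 5; 10; 2];

% 2x3 sparse matrix; only the four given ratings are stored
S = sparse(User, SportID, Rating);

% Optional dense copy with NaN marking unrated pairs
% (assumption: 0 is never a valid rating)
F = full(S);
F(F == 0) = NaN;
```

For the full data set (about 2000 x 1000 with roughly 100000 ratings) only around 5% of the entries are non-zero, so keeping S sparse should save memory compared with the dense copy.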

Upvotes: 2

James Mertz

Reputation: 8759

You can do the following:

% Test Input
inputVar = [1 2 10; 1 3 5; 2 1 10; 2 3 2]; 

% Determine number of users, and sports to create the new table
numSports = max(inputVar(:,2));
numUsers = max(inputVar(:,1));
newTable = NaN(numUsers, numSports);

% Iterate for each row of the new table (# of users)
for ii = 1:numUsers
    % Find the rows belonging to this user, then which sports he/she rated and the ratings given
    userRating = find(inputVar(:,1) == ii);
    sportIndex = inputVar(userRating, 2)';
    sportRating = inputVar(userRating, 3)';
    newTable(ii, sportIndex) = sportRating; % Create the new table based on the ratings.
end

newTable

This produces the following:

newTable =

   NaN    10     5
    10   NaN     2

The loop only has to run once per user in your input table, not once per rating.

Upvotes: 1
