Reputation: 47
I am working with two datasets in csv
form (movielens latest-small dataset). Given below are the fields of both.
rating.csv
user_id movie_id rating
movie.csv
movie_id movie_name
what I want is to combine them into a single .csv with following fields
user_id movie_id movie_name rating
So that the common column movie_id
maps with corresponding movie_name
.
Could that be done using Excel? If not, how can I do it?
I just need it as a dataset for my recommender engine, so any simple solution is welcome as end result is all that matters. But since I've some experience in java so that would be easy for my easy understand and implement.
If there is some way using Excel then that would be the best. I have tried searching online and found some VLOOKUP
method but couldn't clearly get it.
Also I tried some online merging tools but they just attached the sheets one after the another not mapping the column. So I have no problem using online tools too.
Upvotes: 0
Views: 7882
Reputation: 5529
Check out this tool - https://github.com/DataFoxCo/gocsv - it's based off of csvkit but has a ton of additional features. One of our engineers custom built it - and open sourced it to help solve some of these data issues we deal with every day :)
It will do a vlookup essentially of any sized csv in merely seconds using the join command:
gocsv join --columns 'movie_id','movie_id' --left rating.csv movie.csv > combineddata.csv
then if you still want to reorder the columns, you can do that too:
gocsv select --columns 'user_id','movie_id','movie_name','rating' combineddata.csv > combineddata-final.csv
I split the commands out up top to help explain its use - the documentation has all the examples on it also but ultimately I would really recommend pipelining it and doing it in one command like this:
cat rating.csv \
| gocsv join --left --columns 'movie_id','movie_id' movie.csv \
| gocsv select --columns 'user_id','movie_id','movie_name','rating' > combineddata.csv
Upvotes: 0
Reputation: 19289
This is the method with a VLOOKUP
formula within Excel:
The formula takes 4 arguments:
See here for documentation on the function.
Upvotes: 1