import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users = pd.read_csv('ml-100k/u.user', sep='|', names=u_cols, encoding='latin-1')
r_cols = ['user_id','movie_id','rating', 'unix_timestamp']
ratings = pd.read_csv('ml-100k/u.data', sep="\t", names=r_cols, encoding='latin-1')
You do not need to change your pandas version. Reading the file without column names and assigning them afterwards resolves the problem:
For the training set:
# `features` is assumed to be a list of column names defined earlier
# sep=r'\s+' replaces delim_whitespace=True, which is deprecated since pandas 2.2
X_train = pd.read_csv('../UCI_HAR_Dataset/train/X_train.txt',
                      sep=r'\s+', header=None, encoding='latin-1')
X_train.columns = features
For the test set:
X_test = pd.read_csv('UCI-HAR-Dataset/test/X_test.txt',
                     sep=r'\s+', header=None, encoding='latin-1')
X_test.columns = features
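Why this works: pandas refuses duplicate entries in the `names=` argument of `read_csv`, but it does allow a DataFrame to carry duplicate labels once they are assigned to `.columns`. The self-contained sketch below reproduces both behaviours on a small in-memory sample; the data and the repeated feature name are hypothetical stand-ins for `X_train.txt` and `features`.

```python
import io

import pandas as pd

# Hypothetical whitespace-separated data standing in for X_train.txt
raw = "1.0 2.0\n3.0 4.0\n"
# Hypothetical feature list that repeats one name
features = ["tBodyAcc-mean", "tBodyAcc-mean"]

# Passing duplicate names straight to read_csv raises a ValueError
try:
    pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, names=features)
    names_rejected = False
except ValueError:
    names_rejected = True

# Reading without names and assigning the labels afterwards is accepted
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)
df.columns = features

print(names_rejected)    # True
print(list(df.columns))  # ['tBodyAcc-mean', 'tBodyAcc-mean']
```

Keep in mind that selecting a duplicated label (for example `df["tBodyAcc-mean"]`) returns all matching columns as a DataFrame rather than a single Series.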
We can resolve the issue like this; there is no need to change the pandas version.
# sep=r'\s+' replaces delim_whitespace=True, which is deprecated since pandas 2.2
X_train = pd.read_csv('../UCI_HAR_Dataset/train/X_train.txt', sep=r'\s+', header=None, encoding='latin-1')
X_train.columns = features
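If you would rather keep the column names unique (so that each label selects exactly one column), a different approach is to rename the repeats before handing them to `names=`. The sketch below is an alternative to the answer above, not part of it; `make_unique` is a hypothetical helper and the sample data and names are made up.

```python
import io

import pandas as pd


def make_unique(names):
    """Hypothetical helper: append _1, _2, ... to repeated names until each is unique."""
    seen = set()
    counts = {}
    unique = []
    for name in names:
        candidate = name
        # Keep bumping the suffix in case e.g. "a_1" already exists in the list
        while candidate in seen:
            counts[name] = counts.get(name, 0) + 1
            candidate = f"{name}_{counts[name]}"
        seen.add(candidate)
        unique.append(candidate)
    return unique


# Hypothetical data and feature names with a repeat
raw = "1.0 2.0 3.0\n4.0 5.0 6.0\n"
features = ["a", "a", "b"]

df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, names=make_unique(features))
print(list(df.columns))  # ['a', 'a_1', 'b']
```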
I have the MovieLens dataset to hand, but I don't get any error when loading it with your code:
u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users = pd.read_csv('ml-100k/u.user', sep='|', names=u_cols, encoding='latin-1')
r_cols = ['user_id','movie_id','rating', 'unix_timestamp']
ratings = pd.read_csv('ml-100k/u.data', sep="\t", names=r_cols, encoding='latin-1')
users.head()
Out[36]:
   user_id  age  sex  occupation  zip_code
0        1   24    M  technician     85711
1        2   53    F       other     94043
2        3   23    M      writer     32067
3        4   24    M  technician     43537
4        5   33    F       other     15213
ratings.head()
Out[37]:
   user_id  movie_id  rating  unix_timestamp
0      196       242       3       881250949
1      186       302       3       891717742
2       22       377       1       878887116
3      244        51       2       880606923
4      166       346       1       886397596
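To check the parsing logic without the `ml-100k` files on disk, the same `read_csv` calls can be pointed at a couple of rows held in memory. This sketch rebuilds two rows from each output above in the pipe- and tab-separated layouts that the question's `sep` arguments imply; it is only a sanity check, not a substitute for loading the real files.

```python
import io

import pandas as pd

u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']

# Two rows from users.head() above, written pipe-separated as sep='|' expects
user_sample = "1|24|M|technician|85711\n2|53|F|other|94043\n"
# Two rows from ratings.head() above, written tab-separated as sep="\t" expects
rating_sample = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"

users = pd.read_csv(io.StringIO(user_sample), sep='|', names=u_cols)
ratings = pd.read_csv(io.StringIO(rating_sample), sep="\t", names=r_cols)

print(users)
print(ratings)
```

In this sample `zip_code` is parsed as an integer; if any codes in your file are not purely numeric, passing `dtype={'zip_code': str}` keeps the whole column as text.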
It may be that the CSV file itself contains duplicate column names in its header row.
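One way to check this is to read only the header row with the standard `csv` module and count the names, because pandas quietly renames repeated header columns (`score`, `score.1`, ...) and can hide the duplication. A minimal sketch, using a hypothetical in-memory CSV; for a real file you would pass an open text file (for example `open(path, newline="", encoding="latin-1")`, where `path` is hypothetical) instead of the `StringIO`.

```python
import csv
import io
from collections import Counter

import pandas as pd


def duplicate_header_names(text_stream, delimiter=","):
    """Return the names that occur more than once in the first (header) row."""
    header = next(csv.reader(text_stream, delimiter=delimiter))
    counts = Counter(header)
    return [name for name, n in counts.items() if n > 1]


# Hypothetical CSV content whose header repeats "score"
content = "id,score,score,label\n1,0.5,0.7,x\n"

dupes = duplicate_header_names(io.StringIO(content))
print(dupes)  # ['score']

# pandas renames the repeat on read, so the duplicate is easy to miss
mangled = pd.read_csv(io.StringIO(content))
print(list(mangled.columns))  # ['id', 'score', 'score.1', 'label']
```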