Shubham

Reputation: 15

Why am I getting the error "Duplicate names are not allowed"?

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users = pd.read_csv('ml-100k/u.user', sep='|', names=u_cols, encoding='latin-1')

r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings = pd.read_csv('ml-100k/u.data', sep="\t", names=r_cols, encoding='latin-1')
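For reference, pandas raises this exact `ValueError` when the `names` list passed to `read_csv` contains a repeated entry. The sketch below reproduces it with a small in-memory file; the data and the column list `["a", "a"]` are hypothetical and not taken from the question.

```python
import io

import pandas as pd

# Hypothetical two-column data; the column list below deliberately repeats "a".
data = io.StringIO("1|2\n3|4\n")

message = None
try:
    pd.read_csv(data, sep="|", names=["a", "a"])
except ValueError as err:
    # pandas validates `names` before parsing and rejects repeated entries.
    message = str(err)

print(message)
```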

Upvotes: 1

Views: 15534

Answers (5)

Habib Khan

Reputation: 21

You don't need to change the pandas version. Read the file without column names (header=None) and assign the feature names afterwards:

For the training set:

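# `features` is the list of column names, loaded beforehand
# sep=r'\s+' is equivalent to delim_whitespace=True (deprecated in newer pandas)
X_train = pd.read_csv('../UCI_HAR_Dataset/train/X_train.txt',
                      sep=r'\s+', header=None, encoding='latin-1')
X_train.columns = features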

For the test set:

# `features` is the same list of column names used for X_train
X_test = pd.read_csv('UCI-HAR-Dataset/test/X_test.txt',
                     sep=r'\s+', header=None, encoding='latin-1')
X_test.columns = features
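A minimal, self-contained sketch of why this works: `read_csv` rejects duplicate entries in `names`, but assigning `DataFrame.columns` afterwards accepts repeated labels. The three-column data and the `features` list below are hypothetical stand-ins for the dataset files.

```python
import io

import pandas as pd

# Hypothetical feature names containing a duplicate ("x" appears twice).
features = ["x", "y", "x"]

# Whitespace-separated data with three fields per row and no header line.
data = io.StringIO("1 2 3\n4 5 6\n")

# Read without names, then assign the (possibly duplicated) labels afterwards.
df = pd.read_csv(data, sep=r"\s+", header=None)
df.columns = features

print(df.columns.tolist())
```

Keep in mind that with duplicate labels, `df["x"]` returns a two-column DataFrame rather than a Series, so downstream code may still be easier to write with unique names.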

Upvotes: 2

praveen kumar bommali

Reputation: 145

We can resolve this issue without changing the pandas version:

# Read without column names, then assign the feature names (`features`, loaded beforehand)
X_train = pd.read_csv('../UCI_HAR_Dataset/train/X_train.txt', sep=r'\s+', header=None, encoding='latin-1')
X_train.columns = features
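If you would rather keep passing `names=` directly, one alternative (not part of this answer) is to make the names unique first. The helper `make_unique` below is hypothetical; it appends `_1`, `_2` and so on to repeats, and keeps checking so that a generated name never collides with a name already used.

```python
import io

import pandas as pd


def make_unique(names):
    """Return a copy of `names` where repeats get a numeric suffix."""
    used = set()
    unique = []
    for name in names:
        candidate = name
        suffix = 1
        # Keep trying suffixes until the candidate is not already taken.
        while candidate in used:
            candidate = f"{name}_{suffix}"
            suffix += 1
        used.add(candidate)
        unique.append(candidate)
    return unique


# Hypothetical feature names and data, as in a whitespace-separated file.
features = ["x", "y", "x"]
data = io.StringIO("1 2 3\n4 5 6\n")

df = pd.read_csv(data, sep=r"\s+", header=None, names=make_unique(features))
print(df.columns.tolist())
```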

Upvotes: 1

shivesh kumar

Reputation: 85

Try installing this older version of pandas:

pip install pandas==0.20.0

Upvotes: 1

Gareth Jones

Reputation: 21

I have the MovieLens dataset to hand, but I don't get any error using your code to load it:

u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users = pd.read_csv('ml-100k/u.user', sep='|', names=u_cols, encoding='latin-1')

r_cols = ['user_id','movie_id','rating', 'unix_timestamp']
ratings = pd.read_csv('ml-100k/u.data', sep="\t", names=r_cols, encoding='latin-1')

users.head()
Out[36]: 
   user_id  age sex  occupation zip_code
0        1   24   M  technician    85711
1        2   53   F       other    94043
2        3   23   M      writer    32067
3        4   24   M  technician    43537
4        5   33   F       other    15213

ratings.head()
Out[37]: 
   user_id  movie_id  rating  unix_timestamp
0      196       242       3       881250949
1      186       302       3       891717742
2       22       377       1       878887116
3      244        51       2       880606923
4      166       346       1       886397596

Upvotes: 0

Aryerez

Reputation: 3495

It may be that the CSV file itself has duplicate column names.
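One way to check this before calling `read_csv` is to look for repeats in the column list you pass via `names=`, since that is the argument pandas validates. The helper `duplicated_names` is a hypothetical name; `u_cols` is the list from the question.

```python
from collections import Counter


def duplicated_names(names):
    """Return each name that occurs more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name in counts if counts[name] > 1]


u_cols = ["user_id", "age", "sex", "occupation", "zip_code"]
print(duplicated_names(u_cols))
print(duplicated_names(["user_id", "rating", "user_id"]))
```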

Upvotes: 0
