Reputation: 1554
I need to read in a .csv file that contains a distance matrix, so it has identical row names and column names, and it is important to keep both. However, the code below only gives me a dataframe where the row names end up in an extra "Unnamed: 0" column and the index becomes integers again, which is very inconvenient for indexing later.
DATA = pd.read_csv("https://raw.githubusercontent.com/PawinData/UC/master/DistanceMatrix_shortestnetworks.csv")
I did check the documentation of pandas.read_csv and played with index_col, header, names, etc., but none seemed to work. Can anybody help me out here?
Upvotes: 1
Views: 4075
Reputation: 389
This issue most likely occurs because the CSV was saved along with its index, which usually doesn't have a name, so the first header cell is empty and pandas labels that column "Unnamed: 0" when reading it back. If that index is only a default RangeIndex that you don't need, the fix is best made when saving the DataFrame: data.to_csv('file.csv', index=False)
To read the unnamed column as the index, pass index_col=0 to pd.read_csv; this reads the first column in as the index.
data = pd.read_csv("https://raw.githubusercontent.com/PawinData/UC/master/DistanceMatrix_shortestnetworks.csv",index_col = 0)
And if you would rather drop the unnamed column (which here would discard the row names), use data.drop(columns=data.filter(regex="^Unnamed").columns, inplace=True)
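To see where the "Unnamed: 0" column comes from without downloading anything, here is a minimal sketch that round-trips a small frame through CSV text in memory. The 3x3 frame is a hypothetical excerpt (its values are copied from the output shown elsewhere in this thread), and the variable names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical 3x3 excerpt of the distance matrix (illustration only).
labels = ["Imperial", "Kern", "Los Angeles"]
matrix = pd.DataFrame([[0, 3, 3], [3, 0, 1], [3, 1, 0]], index=labels, columns=labels)

# With no path argument, to_csv returns the CSV as a string.
# The index is written as the first column, under an empty header cell.
csv_text = matrix.to_csv()

# Default read: the first column becomes ordinary data named "Unnamed: 0".
default_read = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column becomes the row labels again.
labeled_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_read.columns.tolist())
print(labeled_read.index.tolist())
```

The same index_col=0 argument should apply when reading from the URL in the question, assuming that file has the same layout (empty first header cell, row names in the first column).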
Upvotes: 1
Reputation: 862671
Use the index_col=0 parameter to read the first column as the index:
import pandas as pd

url = "https://raw.githubusercontent.com/PawinData/UC/master/DistanceMatrix_shortestnetworks.csv"
DATA = pd.read_csv(url, index_col=0)
print(DATA.head())
             Imperial  Kern  Los Angeles  Orange  Riverside  San Bernardino  \
Imperial            0     3            3       2          1               2
Kern                3     0            1       2          2               1
Los Angeles         3     1            0       1          2               1
Orange              2     2            1       0          1               1
Riverside           1     2            2       1          0               1

             San Diego  San Luis Obispo  Santa Barbara  Ventura
Imperial             1                4              4        4
Kern                 3                1              1        1
Los Angeles          2                2              2        1
Orange               1                3              3        2
Riverside            1                3              3        3
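Once the row names are the index, the matrix can be queried by label on both axes, which is what the question needs for later indexing. A small sketch, using a hypothetical inline CSV excerpt (values copied from the output above) in place of the real URL so that it runs offline:

```python
import io

import pandas as pd

# Hypothetical excerpt in the same layout as the real file:
# empty first header cell, row names in the first column.
csv_text = """,Imperial,Kern,Orange
Imperial,0,3,2
Kern,3,0,2
Orange,2,2,0
"""
dist = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Sanity checks for a distance matrix: same labels on both axes, symmetric values.
labels_match = dist.index.equals(dist.columns)
values = dist.to_numpy()
is_symmetric = bool((values == values.T).all())

# Label-based lookups: a single cell, and a whole row as a Series.
kern_to_orange = dist.loc["Kern", "Orange"]
from_imperial = dist.loc["Imperial"]

print(labels_match, is_symmetric)
print(kern_to_orange)
print(from_imperial)
```

With the real data, the same .loc calls should work on DATA directly, e.g. DATA.loc["Kern", "Orange"].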
Upvotes: 1