Reputation: 323
I am reading in some data about distances of cities with Pandas and I only need the distances as numbers for my calculation as a matrix. Panda does import everything fine but I still have the city names as Headers. This will be used for classic Multi-Dimensional Scaling.
My CSV (in a short version) looks like this:
"","Athens","Barcelona","Brussels"
"Athens",0,3313,2963
"Barcelona",3313,0,1318
"Brussels",2963,1318,0
Which is fine but in my function I only need the values like this:
0,3313,2963
3313,0,1318
2963,1318,0
I can't get only this matrix out of the CSV above. What can I do?
Upvotes: 0
Views: 211
Reputation: 75080
You can use:
data=""""","Athens","Barcelona","Brussels"
"Athens",0,3313,2963
"Barcelona",3313,0,1318
"Brussels",2963,1318,0"""
df = pd.read_csv(pd.compat.StringIO(data),index_col=0) #replace pd.compat..() with filename
df.to_numpy() #df.values
array([[ 0, 3313, 2963],
[3313, 0, 1318],
[2963, 1318, 0]], dtype=int64)
Upvotes: 1
Reputation: 42886
First we read your data in as csv, convert it to an array and slice of the first column:
df = pd.read_csv(a).to_numpy()[:, 1:]
array([[0, 3313, 2963],
[3313, 0, 1318],
[2963, 1318, 0]], dtype=object)
Note I read your csv in as a
like following:
a = StringIO('''
"","Athens","Barcelona","Brussels"
"Athens",0,3313,2963
"Barcelona",3313,0,1318
"Brussels",2963,1318,0
''')
Upvotes: 1
Reputation: 3770
ok this is what we have
a = StringIO(""""","Athens","Barcelona","Brussels"
"Athens",0,3313,2963
"Barcelona",3313,0,1318
"Brussels",2963,1318,0""")
df = pd.read_csv(a,sep=',',engine='python')
print(df)
Unnamed: 0 Athens Barcelona Brussels
0 Athens 0 3313 2963
1 Barcelona 3313 0 1318
2 Brussels 2963 1318 0
df.loc[:,'Athens':].values
Output
array([[ 0, 3313, 2963],
[3313, 0, 1318],
[2963, 1318, 0]])
pd.to_csv
[i for i in df.loc[:,'Athens':].to_csv(header=None).split('\n') if i ]
['0,0,3313,2963', '1,3313,0,1318', '2,2963,1318,0']
Upvotes: 1