MagikarpSama
MagikarpSama

Reputation: 323

How can I cut off the first row and first columns of a dataframe (headers, respectively)?

I am reading in some data about distances of cities with Pandas and I only need the distances as numbers for my calculation as a matrix. Panda does import everything fine but I still have the city names as Headers. This will be used for classic Multi-Dimensional Scaling.

My CSV (in a short version) looks like this:

"","Athens","Barcelona","Brussels"
"Athens",0,3313,2963
"Barcelona",3313,0,1318
"Brussels",2963,1318,0

Which is fine but in my function I only need the values like this:

0,3313,2963
3313,0,1318
2963,1318,0

I can't get only this matrix out of the CSV above. What can I do?

Upvotes: 0

Views: 211

Answers (3)

anky
anky

Reputation: 75080

You can use:

data=""""","Athens","Barcelona","Brussels"
"Athens",0,3313,2963
"Barcelona",3313,0,1318
"Brussels",2963,1318,0"""

df = pd.read_csv(pd.compat.StringIO(data),index_col=0) #replace pd.compat..() with filename
df.to_numpy() #df.values

array([[   0, 3313, 2963],
   [3313,    0, 1318],
   [2963, 1318,    0]], dtype=int64)

Upvotes: 1

Erfan
Erfan

Reputation: 42886

First we read your data in as csv, convert it to an array and slice of the first column:

df = pd.read_csv(a).to_numpy()[:, 1:]

array([[0, 3313, 2963],
       [3313, 0, 1318],
       [2963, 1318, 0]], dtype=object)

Note I read your csv in as a like following:

a = StringIO('''
"","Athens","Barcelona","Brussels"
"Athens",0,3313,2963
"Barcelona",3313,0,1318
"Brussels",2963,1318,0
''')

Upvotes: 1

iamklaus
iamklaus

Reputation: 3770

ok this is what we have

a = StringIO(""""","Athens","Barcelona","Brussels"
"Athens",0,3313,2963
"Barcelona",3313,0,1318
"Brussels",2963,1318,0""")

df = pd.read_csv(a,sep=',',engine='python')

print(df)

  Unnamed: 0  Athens  Barcelona  Brussels
0     Athens       0       3313      2963
1  Barcelona    3313          0      1318
2   Brussels    2963       1318         0

df.loc[:,'Athens':].values

Output

array([[   0, 3313, 2963],
       [3313,    0, 1318],
       [2963, 1318,    0]])

pd.to_csv

[i for i in df.loc[:,'Athens':].to_csv(header=None).split('\n') if i ]


['0,0,3313,2963', '1,3313,0,1318', '2,2963,1318,0']

Upvotes: 1

Related Questions