Kelouis

Reputation: 55

Python. TypeError: '<' not supported between instances of 'str' and 'int'

from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/My Drive/data

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = pd.read_csv("d.csv", dtype=str, sep='\t')

print(data)

column1 = data["Num_c"]
column2 = data["Num_d"]

x = np.array(column1)
y = np.array(column2)

np.where(y < 0)

I have two arrays and want to use np.where() on them, but it raises:

TypeError: '<' not supported between instances of 'str' and 'int'.

How do I convert it to int?

Also, how do I delete the first 5 values from both arrays with np.where()?

update:

print(x) 
# [1] ['48' '65' '124' '201' '294' '443' '574' '833' '1290' '1978' '2747' '4518' '5977' '7714' '9695' '11794' '14383' '17072' '20050' '23282' '26767' '30506']

print(y)
# [2] ['0' '0' '0' '4' '7' '10' '18' '26' '42' '57' '81' '107' '133' '171' '214' '260' '305' '355' '407' '461' '518' '577']

update2: I had to use dtype=str because the first column in the file is a string

Upvotes: 1

Views: 6922

Answers (3)

Todd

Reputation: 5395

I just wanted to share a tip: a quick way to turn a table posted in a question into a dataframe. As you can see, specifying the data types wasn't necessary. I think the trick was passing sep=r'\s+' to read_csv(), which lets pandas infer the numeric columns on its own.

>>> import io
>>> import pandas as pd
>>> csvfile = io.StringIO("""
...           Date     Num_c      Num_d
... 0   2020-01-16        48          0
... 1   2020-01-17        65          0
... 2   2020-01-18       124          0
... 3   2020-01-19       201          4
... 4   2020-01-20       294          7
... 5   2020-01-21       443         10
... 6   2020-01-22       574         18
... 7   2020-01-23       833         26
... 8   2020-01-24      1290         42
... 9   2020-01-25      1978         57
... 10  2020-01-26      2747         81
... 11  2020-01-27      4518        107
... 12  2020-01-28      5977        133
... 13  2020-01-29      7714        171
... 14  2020-01-30      9695        214
... 15  2020-01-31     11794        260
... 16  2020-02-01     14383        305
... 17  2020-02-02     17072        355
... 18  2020-02-03     20050        407
... 19  2020-02-04     23282        461
... 20  2020-02-05     26767        518
... 21  2020-02-06     30506        577
... """)
>>> data = pd.read_csv(csvfile, sep=r'\s+')

The contents of the dataframe:

>>> data
          Date  Num_c  Num_d
0   2020-01-16     48      0
1   2020-01-17     65      0
2   2020-01-18    124      0
3   2020-01-19    201      4
4   2020-01-20    294      7
5   2020-01-21    443     10
6   2020-01-22    574     18
7   2020-01-23    833     26
8   2020-01-24   1290     42
9   2020-01-25   1978     57
10  2020-01-26   2747     81
11  2020-01-27   4518    107
12  2020-01-28   5977    133
13  2020-01-29   7714    171
14  2020-01-30   9695    214
15  2020-01-31  11794    260
16  2020-02-01  14383    305
17  2020-02-02  17072    355
18  2020-02-03  20050    407
19  2020-02-04  23282    461
20  2020-02-05  26767    518
21  2020-02-06  30506    577

The types of the data:

>>> data.dtypes
Date     object
Num_c     int64
Num_d     int64
dtype: object
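Once the columns are parsed as integers, the comparison from the question works directly. Here is a minimal sketch on a shortened copy of the table; the `y > 0` condition and the `iloc[5:]` slice are only illustrations of what the question seems to be after, not something taken from the original file:

```python
import io

import numpy as np
import pandas as pd

# Shortened copy of the table from the question (first six rows only).
csvfile = io.StringIO("""
          Date  Num_c  Num_d
0   2020-01-16     48      0
1   2020-01-17     65      0
2   2020-01-18    124      0
3   2020-01-19    201      4
4   2020-01-20    294      7
5   2020-01-21    443     10
""")
data = pd.read_csv(csvfile, sep=r'\s+')

# Num_d was inferred as an integer column, so comparing with an int is fine.
y = data["Num_d"].to_numpy()
positive_idx = np.where(y > 0)[0]   # positions where Num_d is greater than 0

# Dropping the first 5 rows needs no np.where(); positional slicing is enough.
trimmed = data.iloc[5:]
```

Here `positive_idx` holds the row positions with a non-zero Num_d, and `trimmed` is the dataframe without its first five rows.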

Upvotes: 1

syfluqs

Reputation: 678

When initializing y, use the int dtype.

y = np.array(column2, dtype=int)

As for deleting elements from an np.array, you can slice it just like a normal list, so y[5:] gives you an array without the first 5 elements. You can do this at initialization as well:

x = np.array(column1, dtype=int)[5:]
y = np.array(column2, dtype=int)[5:]
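Putting the two steps together, a small sketch might look like this. The list below is just the first few Num_d values from the question's printed output, standing in for the real column, and the `> 0` condition is an assumed example:

```python
import numpy as np

# First few Num_d values from the question, as strings (what dtype=str produces).
column2 = ['0', '0', '0', '4', '7', '10', '18', '26']

# Parse the strings as integers while building the array.
y = np.array(column2, dtype=int)

# The comparison now works; np.where returns a tuple of index arrays.
nonzero_idx = np.where(y > 0)[0]

# Slicing drops the first 5 elements; np.where() is not needed for that.
y_trimmed = y[5:]
```

Note that np.where() only tells you where a condition holds. To drop elements by position, slicing is the simpler tool.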

Upvotes: 2

Renaud

Reputation: 2819

Have you tried using astype on y? It returns a new array rather than converting in place, so assign the result:

y = y.astype(int)
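A quick sketch of why the result has to be kept: astype leaves the original array untouched. The sample values are a few entries from the question's output, and the name `converted` is just for illustration:

```python
import numpy as np

# A string array like the one in the question.
y = np.array(['0', '4', '7'])

# astype returns a converted copy; y itself still holds strings.
converted = y.astype(int)

# Only the converted array supports the numeric comparison.
mask = converted > 0
```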

Upvotes: 0
