Reputation: 55
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/My Drive/data
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data = pd.read_csv("d.csv", dtype=str, sep='\t')
print(data)
column1 = data["Num_c"]
column2 = data["Num_d"]
x = np.array(column1)
y = np.array(column2)
np.where(y < 0)
I have two arrays and want to use np.where() on them, but it raises
TypeError: '<' not supported between instances of 'str' and 'int'.
How do I convert the arrays to int?
Also, how do I delete the first 5 elements from both arrays with np.where()?
update:
print(x)
# [1] ['48' '65' '124' '201' '294' '443' '574' '833' '1290' '1978' '2747' '4518' '5977' '7714' '9695' '11794' '14383' '17072' '20050' '23282' '26767' '30506']
print(y)
# [2] ['0' '0' '0' '4' '7' '10' '18' '26' '42' '57' '81' '107' '133' '171' '214' '260' '305' '355' '407' '461' '518' '577']
update2:
I had to use dtype=str because the first column in the file is a string.
Upvotes: 1
Views: 6922
Reputation: 5395
I just wanted to share this tip: a quick way to convert the tables posted in questions to a dataframe. As you can see, specifying the data types wasn't necessary, because read_csv() infers each column's type when dtype is not forced. I think the magic was adding the sep=r'\s+' parameter to read_csv(), which splits the columns on any run of whitespace.
>>> import io
>>> import pandas as pd
>>> csvfile = io.StringIO("""
... Date Num_c Num_d
... 0 2020-01-16 48 0
... 1 2020-01-17 65 0
... 2 2020-01-18 124 0
... 3 2020-01-19 201 4
... 4 2020-01-20 294 7
... 5 2020-01-21 443 10
... 6 2020-01-22 574 18
... 7 2020-01-23 833 26
... 8 2020-01-24 1290 42
... 9 2020-01-25 1978 57
... 10 2020-01-26 2747 81
... 11 2020-01-27 4518 107
... 12 2020-01-28 5977 133
... 13 2020-01-29 7714 171
... 14 2020-01-30 9695 214
... 15 2020-01-31 11794 260
... 16 2020-02-01 14383 305
... 17 2020-02-02 17072 355
... 18 2020-02-03 20050 407
... 19 2020-02-04 23282 461
... 20 2020-02-05 26767 518
... 21 2020-02-06 30506 577
... """)
>>> data = pd.read_csv(csvfile, sep=r'\s+')
The contents of the dataframe:
>>> data
Date Num_c Num_d
0 2020-01-16 48 0
1 2020-01-17 65 0
2 2020-01-18 124 0
3 2020-01-19 201 4
4 2020-01-20 294 7
5 2020-01-21 443 10
6 2020-01-22 574 18
7 2020-01-23 833 26
8 2020-01-24 1290 42
9 2020-01-25 1978 57
10 2020-01-26 2747 81
11 2020-01-27 4518 107
12 2020-01-28 5977 133
13 2020-01-29 7714 171
14 2020-01-30 9695 214
15 2020-01-31 11794 260
16 2020-02-01 14383 305
17 2020-02-02 17072 355
18 2020-02-03 20050 407
19 2020-02-04 23282 461
20 2020-02-05 26767 518
21 2020-02-06 30506 577
The types of the data:
>>> data.dtypes
Date object
Num_c int64
Num_d int64
dtype: object
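To connect this back to the question, here is a minimal sketch using only the first six rows of the table above. It assumes the same whitespace-separated layout; the variable names y and positive are mine, for illustration. Because Num_d is read as int64, the comparison inside np.where() no longer raises the TypeError.

```python
import io

import numpy as np
import pandas as pd

# Shortened copy of the table above (first six rows only).
csvfile = io.StringIO("""
Date Num_c Num_d
0 2020-01-16 48 0
1 2020-01-17 65 0
2 2020-01-18 124 0
3 2020-01-19 201 4
4 2020-01-20 294 7
5 2020-01-21 443 10
""")

# No dtype argument: pandas infers int64 for the numeric columns.
data = pd.read_csv(csvfile, sep=r"\s+")

# Integer array, so comparisons with an int work.
y = data["Num_d"].to_numpy()

# Positions where Num_d is greater than zero.
positive = np.where(y > 0)[0]
print(positive)
```

If the real file is tab-separated, as in the question, dropping dtype=str and keeping sep='\t' should give the same inferred types, although I have not seen the file itself.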
Upvotes: 1
Reputation: 678
When initializing y, use the int dtype:
y = np.array(column2, dtype=int)
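As a small sketch of that conversion, the snippet below uses the first six Num_d values from the question as strings, which is roughly what reading with dtype=str produces. The names y_str and y_int are illustrative only. It also shows astype(int), an alternative for an array that has already been created.

```python
import numpy as np

# First six Num_d values from the question, stored as Python strings.
y_str = np.array(["0", "0", "0", "4", "7", "10"], dtype=object)

# Option 1: convert while building the array.
y_int = np.array(y_str, dtype=int)

# Option 2: convert an existing string array.
y_int_again = y_str.astype(int)

# Comparisons now work; there are no negative values, so this is empty.
negatives = np.where(y_int < 0)[0]

# Positions of the zero entries.
zeros = np.where(y_int == 0)[0]
print(negatives, zeros)
```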
Regarding deleting the first 5 elements from an np.array, you don't need np.where(): you can slice arrays just like normal lists, so y[5:] gives you the array without the first 5 elements (as a view on the original data rather than a copy). You can do this at initialization as well:
x = np.array(column1, dtype=int)[5:]
y = np.array(column2, dtype=int)[5:]
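A short sketch of the options for dropping the first five elements, using the first eight Num_d values from the question as sample data (the variable names are mine). Slicing is the simplest; np.delete() returns a copy instead of a view; and if np.where() really must be used, it can select the positions to keep.

```python
import numpy as np

# First eight Num_d values from the question, already as integers.
y = np.array([0, 0, 0, 4, 7, 10, 18, 26])

# Slicing: a view that skips the first five elements.
trimmed = y[5:]

# np.delete: an independent copy with positions 0..4 removed.
deleted = np.delete(y, np.arange(5))

# np.where: positions whose index is at least 5, then fancy indexing.
keep = np.where(np.arange(y.size) >= 5)[0]
selected = y[keep]

print(trimmed, deleted, selected)
```

Since x and y have the same length, the same slice or the same keep positions can be applied to both arrays to keep them aligned.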
Upvotes: 2