Reputation: 105
I am having a weird problem. I have a NumPy array containing data that corresponds to different dates (listed in date). I also have a separate list with a truncate date for each row. I need to replace a value in the array with NaN if its date is earlier than the truncate date for that row. Example below.
import numpy as np
date = ['01-05-2020', '02-05-2020', '03-05-2020', '04-05-2020', '05-05-2020', '06-05-2020', '07-05-2020', '08-05-2020', '09-05-2020', '10-05-2020']
a = np.random.rand(4,10)
truncate_date = ['01-05-2020', '04-05-2020', '06-05-2020', '06-05-2020']
My desired output a would look like:
([[0.954637 0.403668 0.63196 0.143053 0.86481 0.119429 0.266624 0.672866 0.902944 0.241125]
[np.NaN np.NaN np.NaN 0.0207699 0.165715 0.0354149 0.944116 0.759993 0.942923 0.56149]
[np.NaN np.NaN np.NaN np.NaN np.NaN 0.65055 0.948541 0.256155 0.207642 0.600534]
[np.NaN np.NaN np.NaN np.NaN np.NaN 0.431788 0.387213 0.285412 0.770842 0.657336]])
Unfortunately, I have no idea how to approach this, and I am not sure whether it can be done.
Upvotes: 2
Views: 81
Reputation: 2729
Pure NumPy solution:
import numpy as np
import datetime
date = [
    "01-05-2020",
    "02-05-2020",
    "03-05-2020",
    "04-05-2020",
    "05-05-2020",
    "06-05-2020",
    "07-05-2020",
    "08-05-2020",
    "09-05-2020",
    "10-05-2020",
]
a = np.random.rand(4, 10)
truncate_date = ["01-05-2020", "04-05-2020", "06-05-2020", "06-05-2020"]
date_in_datetime_format = np.array(
    [datetime.datetime.strptime(s, "%d-%m-%Y") for s in date]
)
truncate_date_in_datetime_format = np.array(
    [datetime.datetime.strptime(s, "%d-%m-%Y") for s in truncate_date]
)
nan_indices = np.greater.outer(
    truncate_date_in_datetime_format, date_in_datetime_format
)
a[nan_indices] = np.nan
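For what it's worth, the same mask can probably also be built with datetime64 arrays and plain broadcasting instead of np.greater.outer on datetime objects. This is only a minimal sketch: the helper names to_days and mask_before are made up for illustration, and it assumes the strings are day-first (%d-%m-%Y), as in the code above. Unlike the in-place assignment, it returns a new array and leaves a untouched.

```python
import datetime

import numpy as np


def to_days(strings, fmt="%d-%m-%Y"):
    """Parse date strings (assumed day-first) into a datetime64[D] array."""
    return np.array(
        [datetime.datetime.strptime(s, fmt).date() for s in strings],
        dtype="datetime64[D]",
    )


def mask_before(values, dates, cutoffs):
    """Return a copy of values with NaN wherever the column date is earlier
    than the cutoff for that row."""
    # (1, n_dates) < (n_rows, 1) broadcasts to an (n_rows, n_dates) boolean mask
    mask = dates[np.newaxis, :] < cutoffs[:, np.newaxis]
    return np.where(mask, np.nan, values)


date = [f"{day:02d}-05-2020" for day in range(1, 11)]
truncate_date = ["01-05-2020", "04-05-2020", "06-05-2020", "06-05-2020"]
a = np.random.rand(4, 10)

masked = mask_before(a, to_days(date), to_days(truncate_date))
print(masked)
```

Whether this is nicer than np.greater.outer is a matter of taste; the main difference is that the comparison runs on a native datetime64 dtype rather than on Python objects.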
Upvotes: 2
Reputation: 945
Using your syntax:
import numpy as np
import pandas as pd
date_list = ['01-05-2020', '02-05-2020', '03-05-2020', '04-05-2020', '05-05-2020', '06-05-2020', '07-05-2020', '08-05-2020', '09-05-2020', '10-05-2020']
# Pass the format explicitly (assuming day-first strings); otherwise pandas reads them as month-first
date_list = pd.to_datetime(date_list, format="%d-%m-%Y")
truncate_date_list = ['01-05-2020', '04-05-2020', '06-05-2020', '06-05-2020']
truncate_date_list = pd.to_datetime(truncate_date_list, format="%d-%m-%Y")
value_matrix = np.random.rand(4,10)
def vals_if_date_not_truncated(date_list, truncate_date_list,
                               value_matrix):
    results = []
    # Pair each row of values with its own truncate date
    for value_row, truncate_date in zip(value_matrix, truncate_date_list):
        row = []
        # Pair each value in the row with the date of its column
        for value, date in zip(value_row, date_list):
            if truncate_date <= date:
                row.append(value)
            else:
                row.append(np.nan)
        results.append(row)
    return np.array(results)
results = vals_if_date_not_truncated(date_list, truncate_date_list, value_matrix)
print(results)
[[0.6085591 0.29623597 0.48222885 0.03307028 0.87412752 0.28812138
0.10314832 0.63060118 0.58139836 0.47499239]
[ nan nan nan 0.53583195 0.06113442 0.15332923
0.24596896 0.97465439 0.64973568 0.83442661]
[ nan nan nan nan nan 0.64793026
0.77396558 0.58411891 0.31994605 0.50118944]
[ nan nan nan nan nan 0.2483622
0.06314673 0.12511539 0.02691487 0.57909995]]
pandas is great for converting strings to dates and for comparing two dates.
zip is used to iterate through two or more items at once in a for loop.
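If the loops turn out to be slow on a larger matrix, the same comparison can likely be done in one step with DataFrame.where. This is just a sketch under the same assumptions as above (day-first date strings); the names keep and masked are mine, not from the question.

```python
import numpy as np
import pandas as pd

date_list = pd.to_datetime(
    [f"{day:02d}-05-2020" for day in range(1, 11)], format="%d-%m-%Y"
)
truncate_date_list = pd.to_datetime(
    ["01-05-2020", "04-05-2020", "06-05-2020", "06-05-2020"], format="%d-%m-%Y"
)
value_matrix = np.random.rand(4, 10)

# True where the column date is on or after the truncate date of the row
keep = date_list.to_numpy()[np.newaxis, :] >= truncate_date_list.to_numpy()[:, np.newaxis]

# where() keeps values where the condition is True and puts NaN elsewhere
masked = pd.DataFrame(value_matrix, columns=date_list).where(keep)
print(masked.to_numpy())
```

The result is a DataFrame with the dates as column labels; calling to_numpy() on it gives back a plain array like the one returned by the function above.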
Hope this helps.
Upvotes: 1