Rocky

Reputation: 105

Tweaking numpy array based on conditions

I am having a weird problem. I have a numpy array that contains data for different dates (given in the date list), and a separate list holding a truncate date for each row. I need to replace a value in the numpy array with NaN if its date is earlier than the truncate date for that row. Example below.

import numpy as np    
date = ['01-05-2020', '02-05-2020', '03-05-2020', '04-05-2020', '05-05-2020', '06-05-2020', '07-05-2020', '08-05-2020', '09-05-2020', '10-05-2020']
a = np.random.rand(4,10)
truncate_date = ['01-05-2020', '04-05-2020', '06-05-2020', '06-05-2020']

My output a would look like this:

[[0.954637, 0.403668, 0.63196,  0.143053,  0.86481,  0.119429,  0.266624, 0.672866, 0.902944, 0.241125],
 [nan,      nan,      nan,      0.0207699, 0.165715, 0.0354149, 0.944116, 0.759993, 0.942923, 0.56149 ],
 [nan,      nan,      nan,      nan,       nan,      0.65055,   0.948541, 0.256155, 0.207642, 0.600534],
 [nan,      nan,      nan,      nan,       nan,      0.431788,  0.387213, 0.285412, 0.770842, 0.657336]]

Unfortunately, I have no idea how to approach this, and I'm not sure whether it can be done at all.

Upvotes: 2

Views: 81

Answers (2)

V. Ayrat

Reputation: 2729

Pure numpy solution

import numpy as np
import datetime

date = [
    "01-05-2020",
    "02-05-2020",
    "03-05-2020",
    "04-05-2020",
    "05-05-2020",
    "06-05-2020",
    "07-05-2020",
    "08-05-2020",
    "09-05-2020",
    "10-05-2020",
]
a = np.random.rand(4, 10)
truncate_date = ["01-05-2020", "04-05-2020", "06-05-2020", "06-05-2020"]


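# Parse the DD-MM-YYYY strings into datetime objects
date_in_datetime_format = np.array(
    [datetime.datetime.strptime(s, "%d-%m-%Y") for s in date]
)
truncate_date_in_datetime_format = np.array(
    [datetime.datetime.strptime(s, "%d-%m-%Y") for s in truncate_date]
)
# nan_indices[i, j] is True when truncate_date[i] > date[j],
# i.e. when column j's date falls before row i's truncate date
nan_indices = np.greater.outer(
    truncate_date_in_datetime_format, date_in_datetime_format
)
a[nan_indices] = np.nan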
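np.greater.outer(x, y) compares every element of x against every element of y and returns a len(x) by len(y) boolean array. The same mask can also be built with broadcasting on datetime64 values, which avoids object arrays of datetime objects. A minimal sketch, assuming every string is in DD-MM-YYYY form; the helper to_datetime64 is a hypothetical name, not a NumPy function:

```python
import datetime

import numpy as np

date = ["01-05-2020", "02-05-2020", "03-05-2020", "04-05-2020", "05-05-2020",
        "06-05-2020", "07-05-2020", "08-05-2020", "09-05-2020", "10-05-2020"]
truncate_date = ["01-05-2020", "04-05-2020", "06-05-2020", "06-05-2020"]


def to_datetime64(strings):
    """Hypothetical helper: parse DD-MM-YYYY strings into a datetime64[D] array."""
    return np.array(
        [datetime.datetime.strptime(s, "%d-%m-%Y").date() for s in strings],
        dtype="datetime64[D]",
    )


dates = to_datetime64(date)            # shape (10,)
truncs = to_datetime64(truncate_date)  # shape (4,)

# (4, 1) > (1, 10) broadcasts to a (4, 10) boolean mask:
# mask[i, j] is True when column j's date is earlier than row i's truncate date.
mask = truncs[:, None] > dates[None, :]

a = np.random.rand(4, 10)
a[mask] = np.nan
print(mask.sum(axis=1))  # number of NaNs per row: [0 3 5 5]
```

Indexing with [:, None] and [None, :] reshapes the two 1-D arrays to (4, 1) and (1, 10), so the comparison broadcasts to (4, 10), the same shape np.greater.outer returns.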

Upvotes: 2

gust

Reputation: 945

Using your syntax:

import numpy as np    
import pandas as pd
date_list = ['01-05-2020', '02-05-2020', '03-05-2020', '04-05-2020', '05-05-2020', '06-05-2020', '07-05-2020', '08-05-2020', '09-05-2020', '10-05-2020']
date_list = pd.to_datetime(date_list, format="%d-%m-%Y")  # day-month-year
truncate_date_list = ['01-05-2020', '04-05-2020', '06-05-2020', '06-05-2020']
truncate_date_list = pd.to_datetime(truncate_date_list, format="%d-%m-%Y")
value_matrix = np.random.rand(4,10)

def vals_if_date_not_truncated(date_list, truncate_date_list,
                               value_matrix):
    results = []
    for value_row, truncate_date in zip(value_matrix, truncate_date_list):
        row = []
        for value, date in zip(value_row, date_list):
            if truncate_date <= date:
                row.append(value)
            else:
                row.append(np.nan)  # np.NaN was removed in NumPy 2.0
        results.append(row)
    return np.array(results)

results = vals_if_date_not_truncated(date_list, truncate_date_list, value_matrix)

print(results)
[[0.6085591  0.29623597 0.48222885 0.03307028 0.87412752 0.28812138
  0.10314832 0.63060118 0.58139836 0.47499239]
 [       nan        nan        nan 0.53583195 0.06113442 0.15332923
  0.24596896 0.97465439 0.64973568 0.83442661]
 [       nan        nan        nan        nan        nan 0.64793026
  0.77396558 0.58411891 0.31994605 0.50118944]
 [       nan        nan        nan        nan        nan 0.2483622
  0.06314673 0.12511539 0.02691487 0.57909995]]

pandas makes it easy to convert strings to dates and to compare dates with each other.
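pd.to_datetime has to decide whether 01-05-2020 means 1 May or 5 January, and by default (dayfirst=False) it reads this layout month-first. A small sketch of the two documented ways to request day-first parsing, format= and dayfirst=, assuming the strings are always DD-MM-YYYY:

```python
import pandas as pd

dates = ["01-05-2020", "10-05-2020"]

# With an explicit format, "01-05-2020" is read as day 1, month 5.
explicit = pd.to_datetime(dates, format="%d-%m-%Y")
# dayfirst=True expresses the same intent without spelling out the format.
dayfirst = pd.to_datetime(dates, dayfirst=True)
# With neither, pandas treats this layout as month-first.
default = pd.to_datetime(dates)

print(explicit[0].date(), dayfirst[0].date(), default[0].date())
```

An explicit format is the stricter choice: a string that does not match it raises an error instead of being silently reinterpreted.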

zip lets a for loop iterate over two or more sequences in parallel, one element from each at a time.
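A minimal illustration of zip pairing elements by position; the values here are made up for the example:

```python
values = [0.61, 0.30, 0.48]
dates = ["01-05-2020", "02-05-2020", "03-05-2020"]

# zip pairs the i-th value with the i-th date and stops at the shorter input.
pairs = list(zip(values, dates))
for value, date in pairs:
    print(date, value)
```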

Hope this helps.

Upvotes: 1
