alextc

Reputation: 3515

xarray - select/index DataArray from the time labels from another DataArray

I have two DataArray objects, called "A" and "B".

Besides latitude and longitude, both of them have a time dimension holding daily data. A has fewer time coordinates than B.

A's time dimension:

<xarray.DataArray 'time' (time: 1422)>
array(['2015-03-30T00:00:00.000000000', '2015-06-14T00:00:00.000000000',
       '2015-06-16T00:00:00.000000000', ..., '2019-08-31T00:00:00.000000000',
       '2019-09-01T00:00:00.000000000', '2019-09-02T00:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2015-03-30 2015-06-14 ... 2019-09-02

B's time dimension:

<xarray.DataArray 'time' (time: 16802)>
array(['1972-01-01T00:00:00.000000000', '1972-01-02T00:00:00.000000000',
       '1972-01-03T00:00:00.000000000', ..., '2017-12-29T00:00:00.000000000',
       '2017-12-30T00:00:00.000000000', '2017-12-31T00:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 1972-01-01 1972-01-02 ... 2017-12-31

Obviously, A's time dimension is a subset of B's time dimension. I would like to select data from B using all the time labels from A. As the time in A is not continuous, I don't think slice is suitable, so I tried using sel.

B_sel = B.sel(time=A.time)

I received an error: KeyError: "not all values found in index 'time'"

Upvotes: 2

Views: 4989

Answers (2)

春涯行雨

Reputation: 41

A_new = A.where(A.time.isin(B.time), drop=True)

http://xarray.pydata.org/en/stable/user-guide/indexing.html
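The line above trims A to the time labels it shares with B. To pull the matching data out of B instead, the same isin mask can feed sel. The following is a minimal sketch on synthetic data; the small A and B arrays here are made-up stand-ins for the real ones, not the asker's data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for B: daily values for December 2017
b_time = pd.date_range("2017-12-01", "2017-12-31", freq="D")
B = xr.DataArray(np.arange(b_time.size, dtype=float),
                 coords={"time": b_time}, dims="time")

# Synthetic stand-in for A: sparse dates, one of them outside B's range
a_time = pd.to_datetime(["2017-12-05", "2017-12-20", "2018-01-03"])
A = xr.DataArray(np.ones(a_time.size), coords={"time": a_time}, dims="time")

# Boolean mask over A's labels: True where the label also exists in B
mask = A.time.isin(B.time)

# Keep only the shared labels, then select them from B
common = A.time.values[mask.values]
B_sel = B.sel(time=common)

print(B_sel.time.values)
```

Because every label passed to sel is known to exist in B, the KeyError from the question does not occur.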

Upvotes: 4

Light_B

Reputation: 1800

Obviously, the A's time dimension is a subset of B's time dimension.

I received an error: KeyError: "not all values found in index 'time'"

The error message itself suggests that the assumption in the first quoted statement is wrong: A's time labels are not a subset of B's. If you look at the time values carefully, A runs until 2019-09-02, whereas B ends on 2017-12-31.
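One way to confirm this is to list the labels that A has and B lacks. Here is a small sketch with np.setdiff1d; a_times and b_times are hypothetical stand-ins for A.time.values and B.time.values, shortened to a handful of dates.

```python
import numpy as np

# Hypothetical stand-in for B.time.values: 25 to 31 December 2017
b_times = np.arange("2017-12-25", "2018-01-01", dtype="datetime64[D]")

# Hypothetical stand-in for A.time.values: two dates inside B, two after it
a_times = np.array(
    ["2017-12-30", "2017-12-31", "2018-01-01", "2019-09-02"],
    dtype="datetime64[D]",
)

# Labels present in A but absent from B; a non-empty result means
# B.sel(time=A.time) would raise the KeyError from the question
missing = np.setdiff1d(a_times, b_times)
print(missing)

# Quick range check: does A extend past the end of B?
print(a_times.max() > b_times.max())
```

On the real arrays, `np.setdiff1d(A.time.values, B.time.values)` would show exactly which labels sel cannot find.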

So, there are 2 ways to solve this:

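  1. If you're sure that every time label in A up to the end of 2017 is also present in B, then

    # B ends on 2017-12-31, so keep A's labels through 2017 (inclusive)
    sel_dates = A.time.values[(A.time.dt.year <= 2017).values]
    B_sel = B.sel(time=sel_dates)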
    
  2. If you're not sure that every label in A within that range exists in B (for example, because of unexpected values somewhere), you can perform an element-wise membership check using np.isin(), a vectorised NumPy function

    import numpy as np

    # keep only the labels of A that also exist in B, then select them from B
    sel_dates = A.time.values[np.isin(A.time.values, B.time.values)]
    B_sel = B.sel(time=sel_dates)

    ## example ##
    # dates1 is an array of daily dates covering one month
    dates1 = np.arange('2005-02', '2005-03', dtype='datetime64[D]')
    dates2 = np.array(['2005-02-03', '2002-02-05', '2000-01-05'], dtype='datetime64[D]')
    # check which entries of dates2 are also in dates1
    print(np.isin(dates2, dates1))
    # prints: [ True False False]
    

Upvotes: 0
