andreas-h

Reputation: 11100

Multidimensional data in Holoviews

I have a 4-D dataset (as xr.DataArray) with dimensions temperature, datasource, time, and altitude.

How can I create a scatter plot of temperature(src0, z) vs. temperature(src1, z), so that I can select the altitude z via a slider?

My current problem is that when I convert the data to an hv.Table, I get, among others, one column datasource and one column temperature, and I cannot figure out how to plot temperature(datasource=='src0') vs. temperature(datasource=='src1').


EDIT:

To clarify: I have a 4-D dataset DATA (an xr.DataArray) with dimensions data_variable, datasource, time, and altitude.

data_variable has 2 entries, temperature and humidity.

datasource has 2 entries, model and measurement.

There are 6 altitudes and ~2000 times.

How can I create a scatter plot of one datasource against the other (model vs. measurement), such that altitude and data_variable can be selected with a slider?

Upvotes: 0

Views: 1025

Answers (1)

philippjfr

Reputation: 4080

If I'm understanding your question correctly, you want to plot temperature over time as a scatter plot, comparing the two datasources and indexed by altitude?

import holoviews as hv

# Load the data into a holoviews Dataset
# (data_array is assumed to be your xr.DataArray)
ds = hv.Dataset(data_array)

# Create Scatter objects plotting time vs. temperature
# and group by altitude and datasource
scatter = ds.to(hv.Scatter, 'time', 'temperature',
                groupby=['altitude', 'datasource'], dynamic=True)

# Now overlay the datasource dimension and display
scatter.overlay('datasource')

Hopefully I understood your question correctly. Either way, based on this basic pattern you should be able to plot the data in whatever arrangement you want.

Edit: Based on your edit, the main problem is that HoloViews expects each data_variable to be in a separate array. In pandas terms, you need a reshape along the lines of pd.melt in reverse (a pivot), so that each datasource/data_variable combination becomes its own variable.

import numpy as np
import xarray as xr
import holoviews as hv

# Define data array like yours
dataarray = xr.DataArray(np.random.rand(10, 10, 2, 2), name='variable',
                   coords=[('time', range(10)), ('altitude', range(10)),
                           ('datasource', ['model', 'measurement']),
                           ('data_variable', ['humidity', 'temperature'])])

# Groupby datasource and data_variable, combining the resultant array into a Dataset with 4 data variables
group_dims = ['datasource', 'data_variable']
grouped = hv.Dataset(dataarray, datatype=['xarray']).groupby(group_dims)
dataset = xr.merge([da.data.rename({'variable': ' '.join(key)}).drop(group_dims)
                    for key, da in grouped.items()])

ds = hv.Dataset(dataset)
scatter = ds.to(hv.Scatter, 'model temperature', 'measurement temperature', 'altitude')

Note, however, that while testing this I ran into a bug, for which I've now opened a PR (see here).
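
For comparison, here is a minimal sketch of a similar reshaping step done with xarray alone, using DataArray.to_dataset(dim=...). This is not the approach from the answer above, only an alternative illustration. The array is synthetic, and its sizes and the name wide are assumptions made for the example. It splits only the datasource dimension into separate variables, so data_variable stays a dimension that can still be selected on.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the 4-D array described in the question
# (sizes and values are made up for illustration).
rng = np.random.default_rng(0)
dataarray = xr.DataArray(
    rng.random((10, 6, 2, 2)),
    name="variable",
    coords=[
        ("time", np.arange(10)),
        ("altitude", np.arange(6)),
        ("datasource", ["model", "measurement"]),
        ("data_variable", ["humidity", "temperature"]),
    ],
)

# Turn each entry of the 'datasource' dimension into its own variable.
# The remaining dimensions (time, altitude, data_variable) are kept.
wide = dataarray.to_dataset(dim="datasource")

print(list(wide.data_vars))
print(wide["model"].dims)
```

Assuming the resulting Dataset loads into HoloViews the same way as the merged one above, something like hv.Dataset(wide).to(hv.Scatter, 'model', 'measurement', groupby=['altitude', 'data_variable']) should then give sliders or selectors for both remaining dimensions. I have not tested that last step here.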

Upvotes: 1
