YAXArraysToolbox
Documentation for YAXArraysToolbox.
YAXArraysToolbox.aggregate_time
YAXArraysToolbox.altitude_mask_results_proc
YAXArraysToolbox.altitude_masking_proc
YAXArraysToolbox.masking_proc
YAXArraysToolbox.masking_space
YAXArraysToolbox.masking_time
YAXArraysToolbox.plot_space
YAXArraysToolbox.plot_time
YAXArraysToolbox.s4time
YAXArraysToolbox.space4time_proc
YAXArraysToolbox.spacetime_folds
YAXArraysToolbox.aggregate_time
— Method
Aggregate by time
Arguments:
- cube_in: YAXArray Cube.
- time_axis: String. Name of the time axis.
- new_resolution: String. New temporal resolution. It can be "day", "month", or "year".
- new_time_step: Int64. Time step to be computed in the new time series. E.g., new_resolution="day", new_time_step=8 will compute the function every 8 days. The new time dimension will only contain the days corresponding to the 8th day.
- fun: String. Function to be applied to aggregate the time. It can be "median", "mean", "std", "var", "sum", "quant", "min", or "max".
- p: Float64 in the interval [0,1]. If fun="quant", p is the value of the quantile.
- skipMissing: Boolean. Skip missing values when aggregating the data. If all values are missing, NaN is returned.
- skipnan: Boolean. Skip NaN values when aggregating the data. If all values are NaN, NaN is returned.
- showprog: Boolean. Show progress bar.
- max_cache: String. Maximum cache to read the data. It must be in MB, e.g. "100MB", or in GB, e.g. "10GB".
Examples
using YAXArrays, Zarr, YAXArraysToolbox
esds = open_dataset("https://s3.bgc-jena.mpg.de:9000/esdl-esdc-v2.1.1/esdc-8d-0.25deg-184x90x90-2.1.1.zarr")
esdc = Cube(esds)
# Estimating the monthly LAI
lai_month = aggregate_time(esdc[Variable = "leaf_area_index"]; time_axis = "time", new_resolution = "month", new_time_step=1, fun="mean", p=nothing, skipMissing=true, skipnan=true, showprog=true, max_cache="1GB")
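To make the aggregation semantics concrete, here is a minimal sketch in plain Julia of a monthly mean that skips missing and NaN values. The helper `monthly_mean` is hypothetical and works on an ordinary vector; it is not the package's implementation, which operates on YAXArray cubes.

```julia
using Dates, Statistics

# Hypothetical helper (not part of YAXArraysToolbox): average a series per
# calendar month, skipping missing and NaN values. Returns NaN for a month
# in which no valid value remains.
function monthly_mean(dates::AbstractVector{Date}, series::AbstractVector)
    keys_sorted = sort(unique(map(d -> (year(d), month(d)), dates)))
    out = Float64[]
    for key in keys_sorted
        idx = findall(d -> (year(d), month(d)) == key, dates)
        valid = [v for v in series[idx] if !ismissing(v) && !isnan(v)]
        push!(out, isempty(valid) ? NaN : mean(valid))
    end
    return [Date(y, m, 1) for (y, m) in keys_sorted], out
end

# Synthetic 8-day series covering January to March 2010 (12 time steps).
dates = collect(Date(2010, 1, 1):Day(8):Date(2010, 3, 31))
series = Union{Missing,Float64}[1.0, 2.0, missing, NaN, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
months, means = monthly_mean(dates, series)
```

In this sketch the January mean uses only the two valid values, which mirrors what skipMissing=true and skipnan=true are described to do.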
YAXArraysToolbox.altitude_mask_results_proc
— Method
Topographical variability processor
Arguments:
- cube_in_altitude: Altitude YAXArray with two variables, mean and sd.
- lon_axis_name: String. Name of the longitude axis on the input cubes. By default lon_axis_name = "lon".
- lat_axis_name: String. Name of the latitude axis on the input cubes. By default lat_axis_name = "lat".
- variable_name: String. Name of the Variable containing the variables "mean" and "sd".
- winsize: Edge size of the moving window in pixels. By default winsize = 5. E.g., winsize = 5 will produce a moving window with 5^2 pixels.
- showprog: Show progress bar. By default showprog = true.
Output:
The topographical variability processor produces a YAXArray cube with three indicators:
- v1: $v_{1}=\dfrac{1}{n}\sum_{i = 1}^{n}\sigma_{h,i}$. High values of $v_{1}$ indicate hilly terrain over the considered scale, which should be discarded from the analysis.
- v2: $v_{2}=\vert \mu_{h} - \dfrac{1}{n} \sum_{i = 1}^{n}\mu_{h,i} \vert$. $v_{2}$ indicates how different the mean elevation within the central pixel is from the average elevation in the local window.
- v3: $v_{3}= \vert \sigma_{h} - \dfrac{1}{n}\sum_{i = 1}^{n}\sigma_{h,i} \vert$. $v_{3}$ compares the central pixel's standard deviation of elevation with the standard deviation across the moving window.
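The three indicators can be sketched for a single moving window as follows. The helper `topo_indicators` is hypothetical, and it assumes that the sums run over all n pixels of the window, including the central one; the formulas above do not state this explicitly.

```julia
using Statistics

# Hypothetical helper: compute v1, v2 and v3 for the central pixel of one
# square window. `mu` and `sd` hold the per-pixel mean elevation and its
# standard deviation (both winsize x winsize, winsize odd).
function topo_indicators(mu::AbstractMatrix, sd::AbstractMatrix)
    @assert size(mu) == size(sd)
    @assert size(mu, 1) == size(mu, 2) && isodd(size(mu, 1))
    c = (size(mu, 1) + 1) ÷ 2        # index of the central pixel
    v1 = mean(sd)                     # average sub-pixel variability in the window
    v2 = abs(mu[c, c] - mean(mu))     # central mean elevation vs. window average
    v3 = abs(sd[c, c] - mean(sd))     # central sd vs. window-average sd
    return (v1 = v1, v2 = v2, v3 = v3)
end

# A flat 3 x 3 window with one elevated central pixel.
mu = [100.0 100.0 100.0; 100.0 130.0 100.0; 100.0 100.0 100.0]
sd = fill(10.0, 3, 3)
ind = topo_indicators(mu, sd)
```

Here v3 is zero because the sub-pixel variability is uniform, while v2 is positive because the central pixel sits above its neighbours.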
See also: altitude_masking_proc
Bibliography:
- Duveiller, G., Hooker, J., & Cescatti, A. (2018). A dataset mapping the potential biophysical effects of vegetation cover change. Scientific Data, 5(1), Article 1. https://doi.org/10.1038/sdata.2018.14
YAXArraysToolbox.altitude_masking_proc
— Method
Topographical masking processor
Arguments:
- cube_in_to_mask: YAXArray Cube to be masked.
- cube_in_altitude: Altitude YAXArray with two variables, mean and sd.
- lon_axis_name: String. Name of the longitude axis on the input cubes. By default lon_axis_name = "lon".
- lat_axis_name: String. Name of the latitude axis on the input cubes. By default lat_axis_name = "lat".
- variable_name: String. Name of the Variable containing the variables "mean" and "sd".
- time_axis_name: String or nothing. It is strongly recommended to pass this parameter if the cube to be masked contains a time dimension; otherwise set it to nothing.
- winsize: Edge size of the moving window in pixels. By default winsize = 5. E.g., winsize = 5 will produce a moving window with 5^2 pixels.
- v1_thr: Float. Threshold to mask values using the $v_{1}$ indicator. All values higher than or equal to v1_thr are set to NaN. By default v1_thr = 50.
- v2_thr: Float. Threshold to mask values using the $v_{2}$ indicator. All values higher than or equal to v2_thr are set to NaN. By default v2_thr = 100.
- v3_thr: Float. Threshold to mask values using the $v_{3}$ indicator. All values higher than or equal to v3_thr are set to NaN. By default v3_thr = 100.
- showprog: Boolean. Show progress bar. By default showprog = true.
Output:
- YAXArray Dataset with two variables:
  - cube_masked: YAXArray Cube with the same dimensions as cube_in_to_mask.
  - masked_pixels: YAXArray Cube with the same lat and lon dimensions as cube_masked, but with a single boolean variable indicating whether the pixel was masked.
Topographical variability indicators
- v1: $v_{1}=\dfrac{1}{n}\sum_{i = 1}^{n}\sigma_{h,i}$. High values of $v_{1}$ indicate hilly terrain over the considered scale, which should be discarded from the analysis.
- v2: $v_{2}=\vert \mu_{h} - \dfrac{1}{n} \sum_{i = 1}^{n}\mu_{h,i} \vert$. $v_{2}$ indicates how different the mean elevation within the central pixel is from the average elevation in the local window.
- v3: $v_{3}= \vert \sigma_{h} - \dfrac{1}{n}\sum_{i = 1}^{n}\sigma_{h,i} \vert$. $v_{3}$ compares the central pixel's standard deviation of elevation with the standard deviation across the moving window.
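As an illustration of how the thresholds could act on these indicators, the sketch below masks a pixel as soon as any indicator reaches its threshold. Combining the three tests with a logical OR is an assumption, and `mask_by_topography` is a hypothetical helper working on plain matrices rather than cubes.

```julia
# Hypothetical helper: set a pixel to NaN when any indicator is higher than
# or equal to its threshold (OR combination is an assumption of this sketch).
function mask_by_topography(data, v1, v2, v3; v1_thr = 50, v2_thr = 100, v3_thr = 100)
    masked_pixels = (v1 .>= v1_thr) .| (v2 .>= v2_thr) .| (v3 .>= v3_thr)
    cube_masked = ifelse.(masked_pixels, NaN, data)
    return cube_masked, masked_pixels
end

data = [1.0 2.0; 3.0 4.0]
v1 = [10.0 60.0; 20.0 30.0]    # pixel (1, 2) exceeds v1_thr
v2 = [5.0 5.0; 150.0 5.0]      # pixel (2, 1) exceeds v2_thr
v3 = [0.0 0.0; 0.0 99.0]       # no pixel reaches v3_thr
cube_masked, masked_pixels = mask_by_topography(data, v1, v2, v3)
```

The second return value plays the role of the masked_pixels output described above: a boolean field with the same spatial shape as the data.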
See also: altitude_mask_results_proc
Bibliography
- Duveiller, G., Hooker, J., & Cescatti, A. (2018). A dataset mapping the potential biophysical effects of vegetation cover change. Scientific Data, 5(1), Article 1. https://doi.org/10.1038/sdata.2018.14
YAXArraysToolbox.masking_proc
— Method
Masking processor
Arguments:
- cube_in_to_mask: YAXArray cube to be masked.
- cube_rsquare: Nothing, or YAXArray cube with the $R^{2}$ variable. If set to nothing, no mask is applied.
- rsquare_thr: Float64. $R^{2}$ threshold. All values lower than rsquare_thr are set to NaN.
- cube_co_occurrence: Nothing, or YAXArray cube with the co-occurrence variable. If set to nothing, no mask is applied.
- co_occurence_thr: Float64. Co-occurrence threshold. All values lower than co_occurence_thr are set to NaN.
- cube_delta: Nothing, or YAXArray cube with the delta variable. If set to nothing, no mask is applied.
- minmax_delta: Tuple. Minimum and maximum thresholds of the delta variable. Values lower than the minimum or higher than the maximum are set to NaN. Either threshold can also be set to nothing, e.g. (-1, nothing) or (nothing, 1); in these cases only one threshold is applied.
- time_dim: Nothing, or String. Name of the time dimension. This dimension needs to be present in all the cubes. If set to nothing, no time dimension is considered (this can result in slower computation). By default time_dim = time.
- showprog: Boolean. Show progress bar. By default showprog = true.
Output:
- YAXArray cube masked.
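The behaviour of minmax_delta with an optional bound can be sketched on a plain vector. `mask_delta` is a hypothetical helper written for illustration; it is not the function the package uses internally.

```julia
# Hypothetical helper: apply (min, max) thresholds where either bound may be
# `nothing`, in which case that side is left unmasked.
function mask_delta(delta::AbstractArray{<:Real}, minmax_delta::Tuple)
    lo, hi = minmax_delta
    out = float.(delta)              # broadcast returns a fresh array
    if lo !== nothing
        out[delta .< lo] .= NaN      # below the minimum threshold
    end
    if hi !== nothing
        out[delta .> hi] .= NaN      # above the maximum threshold
    end
    return out
end

delta = [-2.0, -0.5, 0.0, 0.5, 2.0]
both = mask_delta(delta, (-1, 1))            # both thresholds applied
only_min = mask_delta(delta, (-1, nothing))  # only the minimum applied
```

With (-1, nothing) the value 2.0 survives, because no upper threshold is applied.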
YAXArraysToolbox.masking_space
— Method
Masking using spatial dimension
The masked values are set to NaN.
Arguments:
- cube_in: YAXArray Cube to be masked.
- mask: YAXArray Cube without time dimension and with a single variable to be used as mask. All values equal to NaN or missing will be masked in cube_in. The mask will be applied to all the variables and time steps present in cube_in.
- lat_axis: String. Name of the latitude axis.
- lon_axis: String. Name of the longitude axis.
- val_mask: NaN or missing. Value present in mask to be used as reference to mask cube_in. Must be NaN or missing.
- showprog: Boolean. Show progress bar.
- max_cache: String. Maximum cache to read the data. It must be in MB, e.g. "100MB", or in GB, e.g. "10GB".
Examples
using YAXArrays, Zarr, YAXArraysToolbox
# Cube to be masked: time, x, y and Variable axes
axlist = [
    RangeAxis("time", range(1, 20, length = 20)),
    RangeAxis("x", range(1, 10, length = 10)),
    RangeAxis("y", range(1, 5, length = 15)),
    CategoricalAxis("Variable", ["var1", "var2"]),
]
data = rand(20, 10, 15, 2)
ds = YAXArray(axlist, data)
# Mask cube: same spatial axes, no time axis, a single variable
axlist = [
    RangeAxis("x", range(1, 10, length = 10)),
    RangeAxis("y", range(1, 5, length = 15)),
    CategoricalAxis("Variable", ["var1"]),
]
data = rand(10, 15, 1)
ds_mask = YAXArray(axlist, data)
masking_space(ds, ds_mask; lat_axis = "x", lon_axis = "y")
YAXArraysToolbox.masking_time
— Method
Masking using time dimension.
The function implements two methods:
- Masking based on a threshold value for one of the variables present in the cube, e.g., masking the values of all the variables in the cube where radiation is lower than X.
- Masking based on a quantile threshold, where the quantile is estimated from the time series of each variable in the cube.
The masked values are set to NaN.
Arguments:
- cube_in: YAXArray Cube.
- time_axis: String. Name of the time axis.
- var_axis: String. Name of the axis containing the variables.
- var_mask: String or nothing. Name of the variable to be used to mask the other variables. If it is a String, val must be an Int64 or Float64 number. If nothing, val must be nothing and p must be a Float64 in the interval [0,1].
- val: Float64 or nothing. The threshold value in var_mask used to mask all the variables in the cube. If var_mask = nothing, then val = nothing.
- p: Float64 or nothing. Quantile value used as a threshold to mask the variables.
- comp: String. Standard comparison operation between the threshold value and each one of the elements. comp must be one of the following: "==", "!=", "<", "<=", ">", ">=".
- showprog: Boolean. Show progress bar.
- max_cache: String. Maximum cache to read the data. It must be in MB, e.g. "100MB", or in GB, e.g. "10GB".
Examples
using YAXArrays, Statistics, Zarr, NetCDF, Dates, YAXArraysToolbox
esds = open_dataset(
"https://s3.bgc-jena.mpg.de:9000/esdl-esdc-v2.1.1/esdc-8d-0.25deg-184x90x90-2.1.1.zarr",
)
esdc = Cube(esds)
esdc_small = esdc[
lon = (-86, -35),
lat = (-56, 14),
time = (Date(2010), Date(2014)),
Variable = ["leaf_area_index", "sensible_heat", "potential_evaporation"],
]
test = masking_time(
esdc_small;
time_axis = "time",
var_axis = "Variable",
var_mask = "leaf_area_index",
val = 0.2,
comp = "<",
showprog = true,
max_cache = "1GB",
)
plot_time(esdc_small; time_axis="time", var_axis="Variable", var = "leaf_area_index", lat_axis = "lat", lon_axis="lon", fun = "min")
plot_time(test; time_axis="time", var_axis="Variable", var = "leaf_area_index", lat_axis = "lat", lon_axis="lon", fun = "min")
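The example above uses the threshold method. The quantile method can be sketched on a single time series as follows, assuming that elements for which the comparison against the threshold holds are the ones set to NaN. `mask_by_quantile` is a hypothetical helper, and it takes the comparison as a function rather than as the String that masking_time expects.

```julia
using Statistics

# Hypothetical helper: mask one time series against its own p-quantile.
# Elements x for which comp(x, threshold) is true are replaced by NaN.
function mask_by_quantile(series::AbstractVector{<:Real}, p::Real; comp = (<))
    thr = quantile(series, p)
    return [comp(x, thr) ? NaN : float(x) for x in series]
end

series = [1.0, 2.0, 3.0, 4.0, 5.0]
masked = mask_by_quantile(series, 0.5)   # the 0.5 quantile of this series is 3.0
```

Because the quantile is computed per series, each variable in a cube would get its own threshold under this scheme.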
YAXArraysToolbox.plot_space
— Method
Plot Space/Maps
Arguments:
- cube_in: YAXArray Cube.
- time_axis: String. Name of the time axis.
- var_axis: String. Name of the axis containing the variables.
- var: String or nothing. Name of the variable to be plotted. If nothing, all the variables present in the cube are plotted.
- lat_axis: String. Name of the latitude axis.
- lon_axis: String. Name of the longitude axis.
- fun: String. Name of the function used to collapse the spatial dimensions. It must be "median", "mean", "std", "var", "sum", "quant", "min", or "max".
- p: Float64 in the interval [0,1]. If fun="quant", p is the value of the quantile.
- colormap: Color map. By default colormap = Reverse(:batlow).
- resolution: Plot resolution. By default resolution = (800, 300).
- ncol: Number of plots by column. By default ncol = 1.
- nrow: Number of plots by row. By default nrow = 1.
- showprog: Boolean. Show progress bar.
- max_cache: String. Maximum cache to read the data. It must be in MB, e.g. "100MB", or in GB, e.g. "10GB".
Examples
using YAXArrays, Zarr, Dates, YAXArraysToolbox
cube_in = open_dataset(
"https://s3.bgc-jena.mpg.de:9000/esdl-esdc-v2.1.1/esdc-8d-0.25deg-1x720x1440-2.1.1.zarr",
)
cube_in = Cube(cube_in)
cube_in = cube_in[
lon = (-9.0, 0.0),
lat = (35, 40),
time = (Date(2010), Date(2014)),
Variable = ["leaf_area_index", "sensible_heat"],
]
plot_space(cube_in; time_axis = "time", resolution = (900, 600), var_axis = "Variable", var = "leaf_area_index", fun = "median")
metric = ["median", "mean", "std", "var", "sum", "quant", "min", "max"]
for i in eachindex(metric)
println(metric[i])
plot_space(
cube_in;
time_axis = "time",
var_axis = "Variable",
lon_axis = "lon",
lat_axis = "lat",
var = "sensible_heat",
fun = metric[i],
p = 0.2,
showprog = true,
max_cache = "100MB",
)
end
plot_space(
cube_in;
time_axis = "time",
var_axis = "Variable",
lon_axis = "lon",
lat_axis = "lat",
var = nothing,
fun = "median",
resolution = (1200, 600),
p = 0.2,
showprog = true,
max_cache = "100MB",
ncol = 2,
)
YAXArraysToolbox.plot_time
— Method
Plot time
The function plots the time series of a given variable in a cube, or of all the variables present in a cube. Because cubes are expected to contain spatial dimensions, these are collapsed using a function, e.g., by estimating the mean of the variable over the pixels of a certain area for each time step.
Arguments:
- cube_in: YAXArray Cube.
- time_axis: String. Name of the time axis.
- var_axis: String. Name of the axis containing the variables.
- var: String or nothing. Name of the variable to be plotted. If nothing, all the variables present in the cube are plotted.
- lat_axis: String. Name of the latitude axis.
- lon_axis: String. Name of the longitude axis.
- fun: String. Name of the function used to collapse the spatial dimensions. It must be "median", "mean", "std", "var", "sum", "quant", "min", or "max".
- p: Float64 in the interval [0,1]. If fun="quant", p is the value of the quantile.
- resolution: Tuple. Plot resolution. By default resolution = (600, 400).
- ncol: Number of plots by column. By default ncol = 1.
- nrow: Number of plots by row. By default nrow = 1.
- showprog: Boolean. Show progress bar.
- max_cache: String. Maximum cache to read the data. It must be in MB, e.g. "100MB", or in GB, e.g. "10GB".
Examples
using YAXArrays, Zarr, Dates, YAXArraysToolbox
cube_in = open_dataset(
"https://s3.bgc-jena.mpg.de:9000/esdl-esdc-v2.1.1/esdc-8d-0.25deg-1x720x1440-2.1.1.zarr",
)
cube_in = Cube(cube_in)
cube_in.Variable
cube_in = cube_in[
lon = (-9.0, 0.0),
lat = (35, 40),
time = (Date(2010), Date(2014)),
Variable = ["leaf_area_index", "sensible_heat"],
]
plot_time(
cube_in;
time_axis = "time",
var_axis = "Variable",
lon_axis = "lon",
lat_axis = "lat",
var = nothing,
fun = "median",
resolution = (900, 600),
p = 0.2,
showprog = true,
max_cache = "100MB",
ncol = 2
)
plot_time(
cube_in;
time_axis = "time",
var_axis = "Variable",
lon_axis = "lon",
lat_axis = "lat",
var = "sensible_heat",
fun = "median",
p = 0.2,
showprog = true,
max_cache = "100MB",
)
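The spatial collapse that precedes the plot can be sketched with a plain array. The (lon, lat, time) layout and the synthetic values are assumptions made for this illustration; plot_time itself works on YAXArray cubes and handles the axes by name.

```julia
using Statistics

# Synthetic cube with dimensions (lon, lat, time) = (2, 3, 4).
cube = reshape(collect(1.0:24.0), 2, 3, 4)

# Collapse lon and lat with the median, leaving one value per time step.
ts = vec(mapslices(median, cube; dims = (1, 2)))
```

The resulting vector has one entry per time step and is what would be drawn as a line when fun = "median".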
YAXArraysToolbox.s4time
— Method
space4time(climate_cube, pfts_cube, pft_list::Vector{String}, winsize = 5, minpxl = 100, minDiffPxlspercentage = 40)
Compute the space-for-time analysis for a given climate variable. ...
Arguments
- climate_cube: YAXArray cube with dimensions: lon, lat, time.
- pfts_cube: YAXArray cube with dimensions: pfts, lat, lon, time.
...
Output
Three output cubes are generated.
- out1: Summary statistics. YAXArray cube where the summary_stat axis contains:
  - "rsquare": XXXX
  - "cumulative_variance": XXXX
  - "predicted": Prediction of Z for the central pixel with its real PFT combination.
- out2:
Examples
YAXArraysToolbox.space4time_proc
— Method
Space for time processor
Arguments:
- cube_con: YAXArray with the continuous variable to be analyzed.
- cube_classes: YAXArray with the discrete classes to be used in the space4time.
- time_axis_name: String or nothing. Name of the time axis on the input cubes. By default time_axis_name = "time". If time_axis_name = nothing, no time dimension is considered.
- lon_axis_name: String. Name of the longitude axis on the input cubes. By default lon_axis_name = "lon".
- lat_axis_name: String. Name of the latitude axis on the input cubes. By default lat_axis_name = "lat".
- classes_var_name: String. Name of the Variable containing the discrete classes. By default classes_var_name = "classes".
- winsize: Edge size of the moving window in pixels. By default winsize = 5. E.g., winsize = 5 will produce a moving window with 5^2 pixels.
- minpxl: Minimum number of pixels in the moving window. By default minpxl = 25. Change it according to your winsize parameter.
- minDiffPxlspercentage: Minimum percentage of pixels in the moving window that must have different compositions. Must be a value in the interval 30-100. By default minDiffPxlspercentage = 40.
- classes_vec: A string vector with the names of the classes in cube_classes to be used, e.g. from the MPI-BGC internal structure: classes_vec = ["Evergreen_Needleleaf_Forests", "Evergreen_Broadleaf_Forests", "Deciduous_Needleleaf_Forests", "Deciduous_Broadleaf_Forests", "Mixed_Forests", "Closed_Shrublands", "Open_Shrublands", "Woody_Savannas", "Savannas", "Grasslands", "Permanent_Wetlands", "Croplands", "Urban_and_Built-up_Lands", "Cropland/Natural_Vegetation_Mosaics", "Permanent_Snow_and_Ice", "Barren", "Water_Bodies"]
- max_value: Indicates whether the presence of the discrete classes is scaled from 0 to 1 or from 0 to 100. If max_value = 100, the data is re-scaled from 0 to 1. By default max_value = 1.
- showprog: Show progress bar. By default showprog = true.
- max_cache: Size of the cache used to temporarily hold sections of the cubes. By default max_cache = 1e8.
Output:
space4time_proc produces a YAXArray Dataset with three cubes:
- SummaryStats cube has one axis, summary_stat, and three variables: rsquared, cumulative_variance, and predicted.
- metrics_for_classes cube has one axis, Values of Z for pure classes, and two variables: estimated and estimated_error.
- metrics_for_transitions cube has two axes, transitions (all the pairwise transitions between the different classes) and Differences, with three variables: delta, delta_error, and coocurence.
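To give an intuition for the estimated and delta outputs, the sketch below regresses a continuous variable on class fractions inside one window using ordinary least squares. This is a deliberately simplified stand-in and not the package's algorithm: the presence of a cumulative_variance output suggests that space4time_proc does additional work that is not reproduced here. All names and numbers are synthetic.

```julia
using LinearAlgebra

# Rows: pixels in the window; columns: fractional cover of two classes.
# Every row sums to 1, so no intercept term is needed.
X = [1.0 0.0; 0.8 0.2; 0.5 0.5; 0.2 0.8; 0.0 1.0]

z_pure = [10.0, 4.0]    # assumed values of Z for pure class 1 and pure class 2
z = X * z_pure          # noise-free synthetic observations of Z

estimated = X \ z                      # least-squares estimate of Z per pure class
delta = estimated[2] - estimated[1]    # change in Z for a transition from class 1 to class 2
```

With noise-free data the regression recovers the pure-class values, so delta equals the difference between them up to floating-point error.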
YAXArraysToolbox.spacetime_folds
— Method
Create Space-time Folds
Create spatial, temporal or spatio-temporal folds for cross-validation based on pre-defined groups.
Arguments:
- x: DataFrame containing spatio-temporal data.
- spacevar: String. Which column of x identifies the spatial units (e.g. ID of weather stations).
- timevar: String. Which column of x identifies the temporal units (e.g. the day of the year).
- k: Int64. Number of folds. If spacevar or timevar is nothing and a leave-one-location-out or leave-one-time-step-out cross-validation should be performed, set k to the number of unique spatial or temporal units.
- class: String. Which column of x identifies a class unit (e.g. land cover). Not implemented yet.
- seed: Int64 or Float64. See ?Random.seed!().
Return
cv_indices_train, cv_indices_test = spacetime_folds(x;spacevar="var1", timevar="var2", k=10, class=nothing, seed=23)
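The idea of spatial folds can be sketched without a DataFrame: whole spatial units, not individual rows, are assigned to folds, so that no unit appears in both the training and the test indices of a fold. `space_folds` is a hypothetical helper and does not reproduce the package's fold assignment or its handling of timevar.

```julia
using Random

# Hypothetical helper: assign each unique spatial unit to one of k folds and
# return row indices for training and testing per fold.
function space_folds(units::AbstractVector, k::Integer; seed = 23)
    ids = shuffle(MersenneTwister(seed), unique(units))
    fold_of = Dict(id => mod1(i, k) for (i, id) in enumerate(ids))
    test = [findall(u -> fold_of[u] == f, units) for f in 1:k]
    train = [findall(u -> fold_of[u] != f, units) for f in 1:k]
    return train, test
end

# Eight observations from four stations, split into two folds.
stations = ["a", "a", "b", "b", "c", "c", "d", "d"]
train, test = space_folds(stations, 2)
```

Each fold holds out two complete stations, which is the target-oriented validation strategy motivated in the reference below.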
References
Meyer, H., Reudenbach, C., Hengl, T., Katurji, M., Nauß, T. (2018): Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation. Environmental Modelling & Software 101: 1-9.