YAXArraysToolbox
Documentation for YAXArraysToolbox.
- YAXArraysToolbox.aggregate_time
- YAXArraysToolbox.altitude_mask_results_proc
- YAXArraysToolbox.altitude_masking_proc
- YAXArraysToolbox.masking_proc
- YAXArraysToolbox.masking_space
- YAXArraysToolbox.masking_time
- YAXArraysToolbox.plot_space
- YAXArraysToolbox.plot_time
- YAXArraysToolbox.s4time
- YAXArraysToolbox.space4time_proc
- YAXArraysToolbox.spacetime_folds
YAXArraysToolbox.aggregate_time — Method
Aggregate by time
Arguments:
- `cube_in`: YAXArray Cube.
- `time_axis`: String. Name of the time axis.
- `new_resolution`: String. New temporal resolution. It can be `"day"`, `"month"`, or `"year"`.
- `new_time_step`: Int64. Time step to be computed in the new time series. E.g. `new_resolution = "day", new_time_step = 8` will compute the function every 8 days. The new time dimension will only contain the days corresponding to the 8th day.
- `fun`: String. Function to be applied to aggregate the time. It can be "median", "mean", "std", "var", "sum", "quant", "min", or "max".
- `p`: Float64 in the interval [0,1]. If `fun = "quant"`, `p` is the value of the quantile.
- `skipMissing`: Boolean. Skip missing values when aggregating the data. If all values are missing, NaN is returned.
- `skipnan`: Boolean. Skip NaN values when aggregating the data. If all values are NaN, NaN is returned.
- `showprog`: Boolean. Progress Bar.
- `max_cache`: String. Maximum cache to read the data. It must be in MB, e.g. "100MB", or in GB, e.g. "10GB".
Examples
using YAXArrays, Zarr, YAXArraysToolbox
esds = open_dataset("https://s3.bgc-jena.mpg.de:9000/esdl-esdc-v2.1.1/esdc-8d-0.25deg-184x90x90-2.1.1.zarr")
esdc = Cube(esds)
# Estimating the monthly LAI
lai_month = aggregate_time(esdc[Variable = "leaf_area_index"]; time_axis = "time", new_resolution = "month", new_time_step=1, fun="mean", p=nothing, skipMissing=true, skipnan=true, showprog=true, max_cache="1GB")
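To make the idea behind `new_resolution = "month"` and `fun = "mean"` concrete, the following is a minimal sketch on a single in-memory time series using only the Julia standard library. It is not the package's implementation: the helper `nanmean`, the variable names, and the synthetic 8-daily data are assumptions made for illustration.

```julia
using Dates, Statistics

# Hypothetical helper (not part of YAXArraysToolbox): mean that skips NaN
# and returns NaN when every value is NaN, mirroring the skipnan description.
function nanmean(x)
    v = filter(!isnan, x)
    return isempty(v) ? NaN : mean(v)
end

# Assumed 8-daily time axis and values for one pixel.
dates = collect(Date(2010, 1, 1):Day(8):Date(2010, 3, 31))
vals = Float64.(1:length(dates))

# Group the time steps by (year, month) and aggregate each group.
keys_ym = [(year(d), month(d)) for d in dates]
monthly = Dict(k => nanmean(vals[keys_ym .== Ref(k)]) for k in unique(keys_ym))
```

Each key of `monthly` is a `(year, month)` tuple and each value is the aggregated value for that month.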
YAXArraysToolbox.altitude_mask_results_proc — Method
Topographical variability processor
Arguments:
- `cube_in_altitude`: Altitude YAXArray with two variables, mean and sd.
- `lon_axis_name`: String. Name of the longitude axis on the input cubes. By default `lon_axis_name = "lon"`.
- `lat_axis_name`: String. Name of the latitude axis on the input cubes. By default `lat_axis_name = "lat"`.
- `variable_name`: String. Name of the Variable containing the variables "mean" and "sd".
- `winsize`: Edge size of the moving window in pixels. By default `winsize = 5`. E.g. `winsize = 5` will produce a moving window with 5^2 pixels.
- `showprog`: Show progress bar. By default `showprog = true`.
Output:
The topographical variability processor produces a YAXArray cube with three indicators:
v1: $v_{1}=\dfrac{1}{n}\sum_{i = 1}^{n}\sigma_{h,i}$
High values of $v_{1}$ indicate hilly terrain over the considered scale, which should be discarded from the analysis.
v2: $v_{2}=\vert \mu_{h} - \dfrac{1}{n} \sum_{i = 1}^{n}\mu_{h,i} \vert$
$v_{2}$ indicates how different the mean elevation within the central pixel is from the average elevation in the local window.
v3: $v_{3}= \vert \sigma_{h} - \dfrac{1}{n}\sum_{i = 1}^{n}\sigma_{h,i} \vert$
$v_{3}$ compares the central pixel's standard deviation of elevation with the mean standard deviation across the moving window.
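The three indicators can be sketched for a single central pixel as follows. This is an illustration of the formulas above, not the package's code: the function name `topo_indicators` is hypothetical, the window is assumed to include the central pixel, and the pixel is assumed to be far enough from the border that the full window fits.

```julia
using Statistics

# mu and sigma hold the per-pixel mean and standard deviation of elevation.
# (r, c) is the central pixel; the window has winsize^2 pixels (assumption:
# the central pixel itself is counted among the n pixels).
function topo_indicators(mu::AbstractMatrix, sigma::AbstractMatrix, r::Int, c::Int; winsize::Int = 5)
    half = winsize ÷ 2
    rows = (r - half):(r + half)
    cols = (c - half):(c + half)
    v1 = mean(sigma[rows, cols])               # mean of sigma_{h,i} in the window
    v2 = abs(mu[r, c] - mean(mu[rows, cols]))  # |mu_h - mean of mu_{h,i}|
    v3 = abs(sigma[r, c] - v1)                 # |sigma_h - mean of sigma_{h,i}|
    return (v1 = v1, v2 = v2, v3 = v3)
end

# Synthetic 5x5 example: flat terrain with one elevated central pixel.
mu = fill(100.0, 5, 5)
mu[3, 3] = 150.0
sigma = fill(10.0, 5, 5)
ind = topo_indicators(mu, sigma, 3, 3)
```

Here only $v_{2}$ is non-zero beyond the baseline, because the central pixel's mean elevation differs from its neighbourhood while the within-pixel variability is uniform.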
See also:
altitude_masking_proc
Bibliography:
- Duveiller, G., Hooker, J., & Cescatti, A. (2018). A dataset mapping the potential biophysical effects of vegetation cover change. Scientific Data, 5(1), Article 1. https://doi.org/10.1038/sdata.2018.14
YAXArraysToolbox.altitude_masking_proc — Method
Topographical masking processor
Arguments:
- `cube_in_to_mask`: YAXArray Cube to be masked.
- `cube_in_altitude`: Altitude YAXArray with two variables, mean and sd.
- `lon_axis_name`: String. Name of the longitude axis on the input cubes. By default `lon_axis_name = "lon"`.
- `lat_axis_name`: String. Name of the latitude axis on the input cubes. By default `lat_axis_name = "lat"`.
- `variable_name`: String. Name of the Variable containing the variables "mean" and "sd".
- `time_axis_name`: String or nothing. It is strongly recommended to pass this parameter if the cube to be masked contains a time dimension, otherwise `nothing`.
- `winsize`: Edge size of the moving window in pixels. By default `winsize = 5`. E.g. `winsize = 5` will produce a moving window with 5^2 pixels.
- `v1_thr`: Float. Threshold to mask values using the $v_{1}$ indicator. All values higher than or equal to `v1_thr` are set to NaN. By default `v1_thr = 50`.
- `v2_thr`: Float. Threshold to mask values using the $v_{2}$ indicator. All values higher than or equal to `v2_thr` are set to NaN. By default `v2_thr = 100`.
- `v3_thr`: Float. Threshold to mask values using the $v_{3}$ indicator. All values higher than or equal to `v3_thr` are set to NaN. By default `v3_thr = 100`.
- `showprog`: Boolean. Show progress bar. By default `showprog = true`.
Output:
- YAXArray Dataset with two variables:
  - `cube_masked`: YAXArray Cube with the same dimensions as `cube_in_to_mask`.
  - `masked_pixels`: YAXArray Cube with the same lat and lon dimensions as `cube_masked` but with a single boolean variable indicating whether the pixel was masked.
Topographical variability indicators
v1: $v_{1}=\dfrac{1}{n}\sum_{i = 1}^{n}\sigma_{h,i}$
High values of $v_{1}$ indicate hilly terrain over the considered scale, which should be discarded from the analysis.
v2: $v_{2}=\vert \mu_{h} - \dfrac{1}{n} \sum_{i = 1}^{n}\mu_{h,i} \vert$
$v_{2}$ indicates how different the mean elevation within the central pixel is from the average elevation in the local window.
v3: $v_{3}= \vert \sigma_{h} - \dfrac{1}{n}\sum_{i = 1}^{n}\sigma_{h,i} \vert$
$v_{3}$ compares the central pixel's standard deviation of elevation with the mean standard deviation across the moving window.
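Given the indicators for a pixel, the thresholding described in the arguments can be sketched as below. The helper names are hypothetical, and combining the three tests with a logical OR is an assumption drawn from the statement that values at or above each threshold are set to NaN.

```julia
# Hypothetical helper: true if any indicator reaches its threshold.
# Defaults follow the documented v1_thr = 50, v2_thr = 100, v3_thr = 100.
function is_masked(v1, v2, v3; v1_thr = 50, v2_thr = 100, v3_thr = 100)
    return v1 >= v1_thr || v2 >= v2_thr || v3 >= v3_thr
end

# Replace the pixel value with NaN when the pixel is flagged.
apply_mask(x, masked::Bool) = masked ? NaN : float(x)

flat = is_masked(10.0, 48.0, 0.0)    # gentle terrain, kept
hilly = is_masked(75.0, 20.0, 5.0)   # v1 above threshold, masked
```

The boolean returned by `is_masked` plays the role of the `masked_pixels` variable, and `apply_mask` the role of producing `cube_masked`.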
See also
altitude_mask_results_proc function
Bibliography
- Duveiller, G., Hooker, J., & Cescatti, A. (2018). A dataset mapping the potential biophysical effects of vegetation cover change. Scientific Data, 5(1), Article 1. https://doi.org/10.1038/sdata.2018.14
YAXArraysToolbox.masking_proc — Method
Masking processor
Arguments:
- `cube_in_to_mask`: YAXArray cube to be masked.
- `cube_rsquare`: Nothing, or YAXArray cube with the $R^{2}$ variable. If set to `nothing`, no mask is applied.
- `rsquare_thr`: Float64. $R^{2}$ threshold. All values lower than `rsquare_thr` are set to `NaN`.
- `cube_co_occurrence`: Nothing, or YAXArray cube with the co-occurrence variable. If set to `nothing`, no mask is applied.
- `co_occurence_thr`: Float64. Co-occurrence threshold. All values lower than `co_occurence_thr` are set to `NaN`.
- `cube_delta`: Nothing, or YAXArray cube with the delta variable. If set to `nothing`, no mask is applied.
- `minmax_delta`: Tuple. Minimum and maximum thresholds of the delta variable. Values lower than the minimum or higher than the maximum are set to `NaN`. Either threshold can be set to `nothing`, e.g. `(-1, nothing)` or `(nothing, 1)`; in these cases only one threshold is applied.
- `time_dim`: Nothing, or String. Name of the time dimension. This dimension needs to be present in all the cubes. If set to `nothing`, no time dimension is considered (this can result in slower computation!). By default `time_dim = "time"`.
- `showprog`: Boolean. Show progress bar. By default `showprog = true`.
Output:
- YAXArray cube masked.
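The threshold rules for `rsquare_thr` and `minmax_delta`, including the case where one bound is `nothing`, can be sketched on scalar values as follows. The function names are hypothetical and the sketch only illustrates the documented rules; it is not the package's implementation.

```julia
# Mask x when the accompanying R^2 value is below the threshold.
mask_rsquare(x, r2, rsquare_thr) = r2 < rsquare_thr ? NaN : float(x)

# Mask delta outside (lo, hi); a bound set to nothing is ignored.
function mask_delta(delta::Real, minmax_delta::Tuple)
    lo, hi = minmax_delta
    below = lo !== nothing && delta < lo
    above = hi !== nothing && delta > hi
    return (below || above) ? NaN : float(delta)
end

kept = mask_delta(0.5, (-1, nothing))      # only the lower bound applies
dropped = mask_delta(2.0, (nothing, 1))    # only the upper bound applies
```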
YAXArraysToolbox.masking_space — Method
Masking using spatial dimension
The masked values are set to NaN.
Arguments:
- `cube_in`: YAXArray Cube to be masked.
- `mask`: YAXArray Cube without time dimension and with a single variable to be used as mask. All values equal to NaN or missing will be masked in `cube_in`. The mask will be applied to all the variables and time steps present in `cube_in`.
- `lat_axis`: String. Name of the latitude axis.
- `lon_axis`: String. Name of the longitude axis.
- `val_mask`: NaN or missing. Value present in `mask` to be used as reference to mask `cube_in`. Must be NaN or missing.
- `showprog`: Boolean. Progress Bar.
- `max_cache`: String. Maximum cache to read the data. It must be in MB, e.g. "100MB", or in GB, e.g. "10GB".
Examples
using YAXArrays, Zarr, YAXArraysToolbox
# Cube to be masked, with axes time, x, y, and Variable
axlist = [
    RangeAxis("time", range(1, 20, length = 20)),
    RangeAxis("x", range(1, 10, length = 10)),
    RangeAxis("y", range(1, 5, length = 15)),
    CategoricalAxis("Variable", ["var1", "var2"]),
]
data = rand(20, 10, 15, 2)
ds = YAXArray(axlist, data)
# Mask cube: same spatial axes, no time dimension, single variable
axlist = [
    RangeAxis("x", range(1, 10, length = 10)),
    RangeAxis("y", range(1, 5, length = 15)),
    CategoricalAxis("Variable", ["var1"]),
]
data = rand(10, 15, 1)
ds_mask = YAXArray(axlist, data)
masking_space(ds, ds_mask; lat_axis = "x", lon_axis = "y")
YAXArraysToolbox.masking_time — Method
Masking using time dimension.
The function implements two methods:
- Masking based on a threshold value for one of the variables present in the cube, e.g. masking the values of all the variables in the cube where radiation is lower than X.
- Masking based on a quantile threshold, where the quantile is estimated from the time series of each of the variables present in the cube.
The masked values are set to NaN.
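The two methods can be sketched on plain vectors as follows. This is an illustration, not the package's code: the names `COMPARATORS`, `mask_by_value`, and `mask_by_quantile` are hypothetical, and the comparison is assumed to be applied as `element comp threshold`, so that `comp = "<"` masks elements below the threshold.

```julia
using Statistics

# Map the documented comparison strings to Julia's operators.
const COMPARATORS = Dict(
    "==" => (==), "!=" => (!=),
    "<" => (<), "<=" => (<=),
    ">" => (>), ">=" => (>=),
)

# Method 1: mask x wherever the reference variable satisfies `ref comp val`.
mask_by_value(x, ref, val, comp) =
    [COMPARATORS[comp](r, val) ? NaN : xi for (xi, r) in zip(x, ref)]

# Method 2: mask x wherever `x comp q`, with q the p-quantile of its own series.
function mask_by_quantile(x, p, comp)
    thr = quantile(filter(!isnan, x), p)
    return [COMPARATORS[comp](xi, thr) ? NaN : xi for xi in x]
end

lai = [0.1, 0.5, 0.15, 0.9]
heat = [1.0, 2.0, 3.0, 4.0]
by_value = mask_by_value(heat, lai, 0.2, "<")
by_quantile = mask_by_quantile([1.0, 2.0, 3.0, 4.0, 5.0], 0.5, "<")
```

In the first call the heat series is masked at the time steps where the reference series `lai` falls below 0.2; in the second, each value below the series' own median is masked.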
Arguments:
- `cube_in`: YAXArray Cube.
- `time_axis`: String. Name of the time axis.
- `var_axis`: String. Name of the axis containing the variables.
- `var_mask`: String or nothing. Name of the variable to be used to mask the other variables. If a String, `val` must be an Int64 or Float64 number. If nothing, `val` must be nothing and `p` must be a Float64 in the interval [0,1].
- `val`: Float64 or nothing. The value of the threshold in `var_mask` to be used to mask all the variables in the cube. If `var_mask = nothing`, then `val = nothing`.
- `p`: Float64 or nothing. Quantile value used as a threshold to mask the variables.
- `comp`: String. Standard comparison operation between the threshold value and each one of the elements. `comp` must be one of the following: "==", "!=", "<", "<=", ">", ">=".
- `showprog`: Boolean. Progress Bar.
- `max_cache`: String. Maximum cache to read the data. It must be in MB, e.g. "100MB", or in GB, e.g. "10GB".
Examples
using YAXArrays, Statistics, Dates, Zarr, NetCDF, YAXArraysToolbox
esds = open_dataset(
"https://s3.bgc-jena.mpg.de:9000/esdl-esdc-v2.1.1/esdc-8d-0.25deg-184x90x90-2.1.1.zarr",
)
esdc = Cube(esds)
esdc_small = esdc[
lon = (-86, -35),
lat = (-56, 14),
time = (Date(2010), Date(2014)),
Variable = ["leaf_area_index", "sensible_heat", "potential_evaporation"],
]
test = masking_time(
esdc_small;
time_axis = "time",
var_axis = "Variable",
var_mask = "leaf_area_index",
val = 0.2,
comp = "<",
showprog = true,
max_cache = "1GB",
)
plot_time(esdc_small; time_axis="time", var_axis="Variable", var = "leaf_area_index", lat_axis = "lat", lon_axis="lon", fun = "min")
plot_time(test; time_axis="time", var_axis="Variable", var = "leaf_area_index", lat_axis = "lat", lon_axis="lon", fun = "min")
YAXArraysToolbox.plot_space — Method
Plot Space/Maps
Arguments
- `cube_in`: YAXArray Cube.
- `time_axis`: String. Name of the time axis.
- `var_axis`: String. Name of the axis containing the variables.
- `var`: String or nothing. Name of the variable to be plotted. If nothing, all the variables present in the cube are plotted.
- `lat_axis`: String. Name of the latitude axis.
- `lon_axis`: String. Name of the longitude axis.
- `fun`: String. Name of the function used to collapse the time dimension. It must be "median", "mean", "std", "var", "sum", "quant", "min", or "max".
- `p`: Float64 in the interval [0,1]. If `fun = "quant"`, `p` is the value of the quantile.
- `colormap`: Color Map. By default `colormap = Reverse(:batlow)`.
- `resolution`: Plot resolution. By default `resolution = (800, 300)`.
- `ncol`: Number of plots by column. By default `ncol = 1`.
- `nrow`: Number of plots by row. By default `nrow = 1`.
- `showprog`: Boolean. Progress Bar.
- `max_cache`: String. Maximum cache to read the data. It must be in MB, e.g. "100MB", or in GB, e.g. "10GB".
Examples
using YAXArrays, Zarr, Dates, YAXArraysToolbox
cube_in = open_dataset(
"https://s3.bgc-jena.mpg.de:9000/esdl-esdc-v2.1.1/esdc-8d-0.25deg-1x720x1440-2.1.1.zarr",
)
cube_in = Cube(cube_in)
cube_in = cube_in[
lon = (-9.0, 0.0),
lat = (35, 40),
time = (Date(2010), Date(2014)),
Variable = ["leaf_area_index", "sensible_heat"],
]
plot_space(cube_in; time_axis = "time", resolution = (900, 600), var_axis = "Variable", var = "leaf_area_index", fun = "median")
metric = ["median", "mean", "std", "var", "sum", "quant", "min", "max"]
for i in eachindex(metric)
println(metric[i])
plot_space(
cube_in;
time_axis = "time",
var_axis = "Variable",
lon_axis = "lon",
lat_axis = "lat",
var = "sensible_heat",
fun = metric[i],
p = 0.2,
showprog = true,
max_cache = "100MB",
)
end
plot_space(
cube_in;
time_axis = "time",
var_axis = "Variable",
lon_axis = "lon",
lat_axis = "lat",
var = nothing,
fun = "median",
resolution = (1200, 600),
p = 0.2,
showprog = true,
max_cache = "100MB",
ncol = 2,
)
YAXArraysToolbox.plot_time — Method
Plot time
The function plots the time series of a given variable in a cube, or of all the variables present in the cube. Because cubes are expected to contain spatial dimensions, these are collapsed using a function, e.g. by estimating the mean of the variable over the pixels of a certain area for each time step.
Arguments:
- `cube_in`: YAXArray Cube.
- `time_axis`: String. Name of the time axis.
- `var_axis`: String. Name of the axis containing the variables.
- `var`: String or nothing. Name of the variable to be plotted. If nothing, all the variables present in the cube are plotted.
- `lat_axis`: String. Name of the latitude axis.
- `lon_axis`: String. Name of the longitude axis.
- `fun`: String. Name of the function used to collapse the spatial dimensions. It must be "median", "mean", "std", "var", "sum", "quant", "min", or "max".
- `p`: Float64 in the interval [0,1]. If `fun = "quant"`, `p` is the value of the quantile.
- `resolution`: Tuple. Plot resolution. By default `resolution = (600, 400)`.
- `ncol`: Number of plots by column. By default `ncol = 1`.
- `nrow`: Number of plots by row. By default `nrow = 1`.
- `showprog`: Boolean. Progress Bar.
- `max_cache`: String. Maximum cache to read the data. It must be in MB, e.g. "100MB", or in GB, e.g. "10GB".
Examples
using YAXArrays, Zarr, Dates, YAXArraysToolbox
cube_in = open_dataset(
"https://s3.bgc-jena.mpg.de:9000/esdl-esdc-v2.1.1/esdc-8d-0.25deg-1x720x1440-2.1.1.zarr",
)
cube_in = Cube(cube_in)
cube_in.Variable
cube_in = cube_in[
lon = (-9.0, 0.0),
lat = (35, 40),
time = (Date(2010), Date(2014)),
Variable = ["leaf_area_index", "sensible_heat"],
]
plot_time(
cube_in;
time_axis = "time",
var_axis = "Variable",
lon_axis = "lon",
lat_axis = "lat",
var = nothing,
fun = "median",
resolution = (900, 600),
p = 0.2,
showprog = true,
max_cache = "100MB",
ncol = 2
)
plot_time(
cube_in;
time_axis = "time",
var_axis = "Variable",
lon_axis = "lon",
lat_axis = "lat",
var = "sensible_heat",
fun = "median",
p = 0.2,
showprog = true,
max_cache = "100MB",
)
YAXArraysToolbox.s4time — Method
space4time(climate_cube, pfts_cube, pft_list::Vector{String}, winsize = 5, minpxl = 100, minDiffPxlspercentage = 40)
Compute the space for time analysis for a given climate variable. ...
Arguments
climate_cube: YAXArray cube with dimensions: lon, lat, time.
pfts_cube: YAXArray cube with dimensions: pfts, lat, lon, time.
...
Output
Three output cubes are generated.
out1: Summary statistics. YAXARRAY cube where summary_stat axis contains:
* "rsquare": XXXX
* "cumulative_variance": XXXX
* "predicted": Prediction of Z for the central pixel with its real PFT combination.
out2:
Examples
YAXArraysToolbox.space4time_proc — Method
Space for time processor
Arguments:
- `cube_con`: YAXArray with the continuous variable to be analyzed.
- `cube_classes`: YAXArray with the discrete classes to be used in the space4time.
- `time_axis_name`: String or nothing. Name of the time axis on the input cubes. By default `time_axis_name = "time"`. If `time_axis_name = nothing`, no time dimension is considered.
- `lon_axis_name`: String. Name of the longitude axis on the input cubes. By default `lon_axis_name = "lon"`.
- `lat_axis_name`: String. Name of the latitude axis on the input cubes. By default `lat_axis_name = "lat"`.
- `classes_var_name`: String. Name of the Variable containing the discrete classes. By default `classes_var_name = "classes"`.
- `winsize`: Edge size of the moving window in pixels. By default `winsize = 5`. E.g. `winsize = 5` will produce a moving window with 5^2 pixels.
- `minpxl`: Minimum number of pixels in the moving window. By default `minpxl = 25`. Change it according to your `winsize` parameter.
- `minDiffPxlspercentage`: Minimum percentage of pixels in the moving window that must have different compositions. Must be any value in the interval 30-100. By default `minDiffPxlspercentage = 40`.
- `classes_vec`: A string vector with the names of the classes in `cube_classes` to be used, e.g. from the MPI-BGC internal structure `classes_vec = ["Evergreen_Needleleaf_Forests", "Evergreen_Broadleaf_Forests", "Deciduous_Needleleaf_Forests", "Deciduous_Broadleaf_Forests", "Mixed_Forests", "Closed_Shrublands", "Open_Shrublands", "Woody_Savannas", "Savannas", "Grasslands", "Permanent_Wetlands", "Croplands", "Urban_and_Built-up_Lands", "Cropland/Natural_Vegetation_Mosaics", "Permanent_Snow_and_Ice", "Barren", "Water_Bodies"]`.
- `max_value`: Indicates whether the scale of the presence of the discrete classes is from 0 to 1 or from 0 to 100. If `max_value = 100`, the data is re-scaled from 0 to 1. By default `max_value = 1`.
- `showprog`: Show progress bar. By default `showprog = true`.
- `max_cache`: Size of the cache used to temporarily allocate sections of the cubes. By default `max_cache = 1e8`.
Output:
The space4time_proc produces a YAXArray Dataset with three cubes:
- SummaryStats cube has one axis, `summary_stat`, and three variables: `rsquared`, `cumulative_variance`, `predicted`.
- metrics_for_classes cube has one axis, `Values of Z for pure classes`, and two variables: `estimated`, `estimated_error`.
- metrics_for_transitions cube has two axes, `transitions` (all the transitions by pairs between the different classes) and `Differences`, with three variables: `delta`, `delta_error`, `coocurence`.
YAXArraysToolbox.spacetime_folds — Method
Create Space-time Folds
Create spatial, temporal or spatio-temporal Folds for cross validation based on pre-defined groups.
Arguments:
- `x`: DataFrame containing spatio-temporal data.
- `spacevar`: String. Which column of x identifies the spatial units (e.g. ID of weather stations).
- `timevar`: String. Which column of x identifies the temporal units (e.g. the day of the year).
- `k`: Int64. Number of folds. If spacevar or timevar is nothing and a leave-one-location-out or leave-one-time-step-out cross validation should be performed, set k to the number of unique spatial or temporal units.
- `class`: String. Which column of x identifies a class unit (e.g. land cover). NOT IMPLEMENTED YET.
- `seed`: Int64 or Float64. See `?Random.seed!()`.
Return
cv_indices_train, cv_indices_test = spacetime_folds(x;spacevar="var1", timevar="var2", k=10, class=nothing, seed=23)
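The idea of folds based on pre-defined groups can be sketched for the spatial case as follows: all rows belonging to the same spatial unit end up in the same fold, so a unit never appears in both the training and test indices of a fold. The function `group_folds` and the `stations` vector are hypothetical and use only the standard library; this is not the package's implementation and it ignores the temporal grouping.

```julia
using Random

# Assign each unique group to one of k folds after a seeded shuffle, then
# return, per fold, the row indices used for training and for testing.
function group_folds(groups::AbstractVector, k::Int; seed::Int = 23)
    units = unique(groups)
    shuffled = shuffle(MersenneTwister(seed), units)
    fold_of = Dict(u => mod1(i, k) for (i, u) in enumerate(shuffled))
    test = [findall(g -> fold_of[g] == f, groups) for f in 1:k]
    train = [findall(g -> fold_of[g] != f, groups) for f in 1:k]
    return train, test
end

# One entry per row of a hypothetical table; the value is the station ID.
stations = ["a", "a", "b", "b", "c", "c", "d", "d"]
train, test = group_folds(stations, 2)
```

Setting `k` to the number of unique units gives a leave-one-location-out split, matching the description of `k` above.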
References
Meyer, H., Reudenbach, C., Hengl, T., Katurji, M., Nauß, T. (2018): Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation. Environmental Modelling & Software 101: 1-9.