YAXArraysToolbox

Documentation for YAXArraysToolbox.

YAXArraysToolbox.aggregate_time - Method

Aggregate by time

Arguments:

  • cube_in: YAXArray Cube.
  • time_axis: String. Name of the time axis.
  • new_resolution: String. New temporal resolution. It can be "day", "month", or "year".
  • new_time_step: Int64. Time step to be computed in the new time series. E.g. new_resolution="day", new_time_step=8 will compute the function every 8 days. The new time dimension will only contain the days corresponding to the 8th day.
  • fun: String. Function applied to aggregate over time. It can be "median", "mean", "std", "var", "sum", "quant", "min", or "max".
  • p: Float64 in the interval [0,1]. If fun="quant", p is the value of the quantile.
  • skipMissing: Boolean. Skip missing values when aggregating the data. If all values are missing, NaN is returned.
  • skipnan: Boolean. Skip NaN values when aggregating the data. If all values are NaN, NaN is returned.
  • showprog: Boolean. Show progress bar.
  • max_cache: String. Maximum cache used to read the data. It must be given in MB, e.g. "100MB", or in GB, e.g. "10GB".

Examples


using YAXArrays, Zarr, YAXArraysToolbox

# Open the Earth System Data Cube and convert the dataset to a cube
esds = open_dataset("https://s3.bgc-jena.mpg.de:9000/esdl-esdc-v2.1.1/esdc-8d-0.25deg-184x90x90-2.1.1.zarr")
esdc = Cube(esds)

# Estimate the monthly mean LAI (p is unused because fun is not "quant")
lai_month = aggregate_time(esdc[Variable = "leaf_area_index"]; time_axis = "time", new_resolution = "month", new_time_step = 1, fun = "mean", p = nothing, skipMissing = true, skipnan = true, showprog = true, max_cache = "1GB")
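
To make the meaning of fun, skipMissing and skipnan concrete, the following is a minimal plain-Julia sketch of a monthly mean on a single time series. It uses only the standard library and is not the toolbox implementation; the helper name monthly_mean and the sample values are hypothetical.

```julia
using Dates, Statistics

# Hypothetical helper, not part of YAXArraysToolbox: monthly mean of a series,
# skipping `missing` and NaN. A month without any valid value yields NaN.
function monthly_mean(dates::Vector{Date}, series::AbstractVector)
    groups = unique((year(d), month(d)) for d in dates)
    out = Dict{Tuple{Int,Int},Float64}()
    for g in groups
        valid = [v for (d, v) in zip(dates, series)
                 if (year(d), month(d)) == g && !ismissing(v) && !isnan(v)]
        out[g] = isempty(valid) ? NaN : mean(valid)
    end
    return out
end

dates = [Date(2010, 1, 1), Date(2010, 1, 9), Date(2010, 1, 17), Date(2010, 2, 2), Date(2010, 2, 10)]
lai = [1.0, missing, 3.0, NaN, missing]

result = monthly_mean(dates, lai)
```

Here January averages the two valid values, while February contains only NaN and missing entries and therefore returns NaN, matching the behaviour described for skipMissing and skipnan.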

YAXArraysToolbox.altitude_mask_results_proc - Method

Topographical variability processor

Arguments:

  • cube_in_altitude: Altitude YAXArray with two variables, mean and sd.

  • lon_axis_name: String. Name of the longitude axis on the input cubes. By default lon_axis_name = "lon".

  • lat_axis_name: String. Name of the latitude axis on the input cubes. By default lat_axis_name = "lat".

  • variable_name: String. Name of the Variable containing the variables "mean" and "sd".

  • winsize: Edge size of the moving window in pixels. By default winsize = 5. E.g. winsize = 5 will produce a moving window with 5^2 pixels.

  • showprog: Boolean. Show progress bar. By default showprog = true.

Output:

The topographical variability processor produces a YAXArray cube with three indicators:

  • v1: $v_{1}=\dfrac{1}{n}\sum_{i = 1}^{n}\sigma_{h,i}$

High values of $v_{1}$ indicate hilly terrain over the considered scale, which should be discarded from the analysis.

  • v2: $v_{2}=\vert \mu_{h} - \dfrac{1}{n} \sum_{i = 1}^{n}\mu_{h,i} \vert$

$v_{2}$ indicates how different the mean elevation within the central pixel is from the average elevation in the local window.

  • v3: $v_{3}= \vert \sigma_{h} - \dfrac{1}{n}\sum_{i = 1}^{n}\sigma_{h,i} \vert$

$v_{3}$ compares the central pixel's standard deviation of elevation with the average standard deviation across the moving window.
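
As an illustration of the three formulas, the sketch below evaluates them on a single moving window in plain Julia. It is not the toolbox implementation: the helper name topo_indicators is hypothetical, and it assumes that the window averages include the central pixel and that winsize is odd.

```julia
using Statistics

# Hypothetical helper, not the toolbox implementation.
# `mu` and `sigma` are winsize x winsize windows holding the per-pixel mean
# elevation and its standard deviation.
# Assumption: the window averages include the central pixel.
function topo_indicators(mu::AbstractMatrix, sigma::AbstractMatrix)
    c = CartesianIndex((size(mu) .+ 1) .÷ 2)   # central pixel of an odd-sized window
    v1 = mean(sigma)                           # average standard deviation in the window
    v2 = abs(mu[c] - mean(mu))                 # central mean vs. window-average mean
    v3 = abs(sigma[c] - mean(sigma))           # central sd vs. window-average sd
    return (v1 = v1, v2 = v2, v3 = v3)
end

mu = fill(100.0, 5, 5)
mu[3, 3] = 150.0
sigma = fill(10.0, 5, 5)
sigma[3, 3] = 35.0

ind = topo_indicators(mu, sigma)
```

With these synthetic values the central pixel stands out from a flat neighbourhood, so v2 and v3 are large relative to v1.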

See also:

  • altitude_masking_proc

Bibliography:

  • Duveiller, G., Hooker, J., & Cescatti, A. (2018). A dataset mapping the potential biophysical effects of vegetation cover change. Scientific Data, 5(1), Article 1. https://doi.org/10.1038/sdata.2018.14

YAXArraysToolbox.altitude_masking_proc - Method

Topographical masking processor

Arguments:

  • cube_in_to_mask: YAXArray Cube to be masked.

  • cube_in_altitude: Altitude YAXArray with two variables, mean and sd.

  • lon_axis_name: String. Name of the longitude axis on the input cubes. By default lon_axis_name = "lon".

  • lat_axis_name: String. Name of the latitude axis on the input cubes. By default lat_axis_name = "lat".

  • variable_name: String. Name of the Variable containing the variables "mean" and "sd".

  • time_axis_name: String or nothing. Name of the time axis. It is strongly recommended to pass this parameter if the cube to be masked contains a time dimension; otherwise set it to nothing.

  • winsize: Edge size of the moving window in pixels. By default winsize = 5. E.g. winsize = 5 will produce a moving window with 5^2 pixels.

  • v1_thr: Float. Threshold to mask values using the $v_{1}$ indicator. All values higher than or equal to v1_thr are set to NaN. By default v1_thr = 50.

  • v2_thr: Float. Threshold to mask values using the $v_{2}$ indicator. All values higher than or equal to v2_thr are set to NaN. By default v2_thr = 100.

  • v3_thr: Float. Threshold to mask values using the $v_{3}$ indicator. All values higher than or equal to v3_thr are set to NaN. By default v3_thr = 100.

  • showprog: Boolean. Show progress bar. By default showprog = true.

Output:

  • YAXArray Dataset with two variables:
    • cube_masked: YAXArray Cube with the same dimensions as cube_in_to_mask.
    • masked_pixels: YAXArray Cube with the same lat and lon dimensions as cube_masked, but with a single boolean variable indicating whether the pixel was masked or not.

Topographical variability indicators

  • v1: $v_{1}=\dfrac{1}{n}\sum_{i = 1}^{n}\sigma_{h,i}$

High values of $v_{1}$ indicate hilly terrain over the considered scale, which should be discarded from the analysis.

  • v2: $v_{2}=\vert \mu_{h} - \dfrac{1}{n} \sum_{i = 1}^{n}\mu_{h,i} \vert$

$v_{2}$ indicates how different the mean elevation within the central pixel is from the average elevation in the local window.

  • v3: $v_{3}= \vert \sigma_{h} - \dfrac{1}{n}\sum_{i = 1}^{n}\sigma_{h,i} \vert$

$v_{3}$ compares the central pixel's standard deviation of elevation with the average standard deviation across the moving window.
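
The thresholding step can be sketched in plain Julia as follows. This is not the toolbox implementation: the helper name mask_by_topography is hypothetical, the indicator arrays are assumed to be precomputed per pixel, and a pixel is assumed to be masked as soon as any one indicator reaches its threshold.

```julia
# Hypothetical helper, not the toolbox implementation.
# Assumption: a pixel is masked when ANY indicator is >= its threshold.
function mask_by_topography(data, v1, v2, v3; v1_thr = 50, v2_thr = 100, v3_thr = 100)
    masked_pixels = (v1 .>= v1_thr) .| (v2 .>= v2_thr) .| (v3 .>= v3_thr)
    cube_masked = ifelse.(masked_pixels, NaN, data)   # masked pixels become NaN
    return cube_masked, masked_pixels
end

data = [1.0 2.0; 3.0 4.0]
v1 = [10.0 60.0; 10.0 10.0]
v2 = [5.0 5.0; 120.0 5.0]
v3 = [5.0 5.0; 5.0 100.0]

cube_masked, masked_pixels = mask_by_topography(data, v1, v2, v3)
```

Only the top-left pixel stays below all three default thresholds, so it is the only one that keeps its value.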

See also

  • altitude_mask_results_proc

Bibliography

  • Duveiller, G., Hooker, J., & Cescatti, A. (2018). A dataset mapping the potential biophysical effects of vegetation cover change. Scientific Data, 5(1), Article 1. https://doi.org/10.1038/sdata.2018.14

YAXArraysToolbox.masking_proc - Method

Masking processor

Arguments:

  • cube_in_to_mask: YAXArray cube to be masked.

  • cube_rsquare: Nothing, or YAXArray cube with the $R^{2}$ variable. If set to nothing, no mask is applied.

  • rsquare_thr: Float64. $R^{2}$ threshold. All values lower than rsquare_thr are set to NaN.

  • cube_co_occurrence: Nothing, or YAXArray cube with the co-occurrence variable. If set to nothing, no mask is applied.

  • co_occurence_thr: Float64. Co-occurrence threshold. All values lower than co_occurence_thr are set to NaN.

  • cube_delta: Nothing, or YAXArray cube with the delta variable. If set to nothing, no mask is applied.

  • minmax_delta: Tuple. Minimum and maximum thresholds of the delta variable. Values lower than the minimum or higher than the maximum are set to NaN. Either threshold can be set to nothing, e.g. (-1, nothing) or (nothing, 1); in these cases only one threshold is applied.

  • time_dim: Nothing, or String. Name of the time dimension. This dimension must be present in all the cubes. If set to nothing, no time dimension is considered (this can result in slower computation). By default time_dim = "time".

  • showprog: Boolean. Show progress bar. By default showprog = true.

Output:

  • YAXArray cube masked.
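
The behaviour of minmax_delta with an optional bound set to nothing can be sketched in plain Julia. This is an illustration under stated assumptions, not the toolbox implementation; the helper name mask_minmax is hypothetical.

```julia
# Hypothetical helper, not the toolbox implementation: set values outside the
# (minimum, maximum) bounds to NaN. A bound equal to `nothing` is not applied.
function mask_minmax(delta::AbstractArray, minmax::Tuple)
    lower, upper = minmax
    out = float.(delta)                     # new array, the input is left untouched
    for i in eachindex(out)
        below = lower !== nothing && out[i] < lower
        above = upper !== nothing && out[i] > upper
        if below || above
            out[i] = NaN
        end
    end
    return out
end

delta = [-2.0, -0.5, 0.3, 1.5]
both = mask_minmax(delta, (-1, 1))              # both thresholds applied
upper_only = mask_minmax(delta, (nothing, 1))   # only the maximum applied
```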

YAXArraysToolbox.masking_space - Method

Masking using spatial dimension

Masked values are set to NaN.

Arguments:

  • cube_in: YAXArray Cube to be masked.

  • mask: YAXArray Cube without a time dimension and with a single variable to be used as the mask. All values equal to NaN or missing will be masked in cube_in. The mask is applied to all the variables and time steps present in cube_in.

  • lat_axis: String. Name of the latitude axis.
  • lon_axis: String. Name of the longitude axis.
  • val_mask: NaN or missing. Value present in mask to be used as reference to mask cube_in. Must be NaN or missing.
  • showprog: Boolean. Show progress bar.
  • max_cache: String. Maximum cache used to read the data. It must be given in MB, e.g. "100MB", or in GB, e.g. "10GB".

Examples

using YAXArrays, Zarr, YAXArraysToolbox

# Cube to be masked: time, two spatial axes, and two variables
axlist = [
    RangeAxis("time", range(1, 20, length = 20)),
    RangeAxis("x", range(1, 10, length = 10)),
    RangeAxis("y", range(1, 5, length = 15)),
    CategoricalAxis("Variable", ["var1", "var2"]),
]

data = rand(20, 10, 15, 2)

ds = YAXArray(axlist, data)

# Mask cube: same spatial axes, no time axis, a single variable
axlist = [
    RangeAxis("x", range(1, 10, length = 10)),
    RangeAxis("y", range(1, 5, length = 15)),
    CategoricalAxis("Variable", ["var1"]),
]

data = rand(10, 15, 1)

ds_mask = YAXArray(axlist, data)

masking_space(ds, ds_mask; lat_axis = "x", lon_axis = "y")

YAXArraysToolbox.masking_time - Method

Masking using time dimension.

The function implements two methods:

  1. Masking based on a threshold value for one of the variables present in the cube, e.g. masking the values of all the variables in the cube where radiation is lower than X.
  2. Masking based on a quantile threshold, where the quantile is estimated from the time series of each variable present in the cube.

Masked values are set to NaN.

Arguments:

  • cube_in: YAXArray Cube.
  • time_axis: String. Name of the time axis.
  • var_axis: String. Name of the axis containing the variables.
  • var_mask: String or nothing. Name of the variable used to mask the other variables. If a String is given, val must be an Int64 or Float64 number. If nothing, val must be nothing and p must be a Float64 in the interval [0,1].
  • val: Float64 or nothing. Threshold value in var_mask used to mask all the variables in the cube. If var_mask = nothing, then val = nothing.
  • p: Float64 or nothing. Quantile value used as a threshold to mask the variables.
  • comp: String. Standard comparison operation between the threshold value and each one of the elements. comp must be one of the following: "==", "!=", "<", "<=", ">", ">=".
  • showprog: Boolean. Show progress bar.
  • max_cache: String. Maximum cache used to read the data. It must be given in MB, e.g. "100MB", or in GB, e.g. "10GB".
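
The two masking methods can be sketched on plain vectors as follows. This is an illustration only, not the toolbox implementation: the names COMPARISONS, mask_by_value and mask_by_quantile are hypothetical, and it assumes that elements for which the comparison against the threshold holds are the ones set to NaN.

```julia
using Statistics

# Hypothetical lookup from the `comp` string to a Julia comparison function.
const COMPARISONS = Dict(
    "==" => (==), "!=" => (!=), "<" => (<), "<=" => (<=), ">" => (>), ">=" => (>=),
)

# Method 1: mask `series` wherever `mask_series` satisfies `comp` against `val`.
# Assumption: elements for which the comparison is true are set to NaN.
function mask_by_value(series, mask_series, val; comp = "<")
    op = COMPARISONS[comp]
    return [op(m, val) ? NaN : x for (x, m) in zip(series, mask_series)]
end

# Method 2: mask a series against its own p-quantile.
function mask_by_quantile(series, p; comp = "<")
    op = COMPARISONS[comp]
    thr = quantile(series, p)
    return [op(x, thr) ? NaN : x for x in series]
end

lai = [0.1, 0.5, 0.15, 0.8]
heat = [10.0, 20.0, 30.0, 40.0]
masked_heat = mask_by_value(heat, lai, 0.2; comp = "<")

ts = [1.0, 2.0, 3.0, 4.0, 5.0]
masked_ts = mask_by_quantile(ts, 0.5; comp = "<")
```

In the first call the heat values are dropped wherever lai is below 0.2; in the second, values below the median of the series itself are dropped.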

Examples

using Dates, YAXArrays, Statistics, Zarr, NetCDF, YAXArraysToolbox

esds = open_dataset(
    "https://s3.bgc-jena.mpg.de:9000/esdl-esdc-v2.1.1/esdc-8d-0.25deg-184x90x90-2.1.1.zarr",
)
esdc = Cube(esds)

esdc_small = esdc[
    lon = (-86, -35),
    lat = (-56, 14),
    time = (Date(2010), Date(2014)),
    Variable = ["leaf_area_index", "sensible_heat", "potential_evaporation"],
]

test = masking_time(
    esdc_small;
    time_axis = "time",
    var_axis = "Variable",
    var_mask = "leaf_area_index",
    val = 0.2,
    comp = "<",
    showprog = true,
    max_cache = "1GB",
)

plot_time(esdc_small; time_axis="time", var_axis="Variable", var = "leaf_area_index", lat_axis = "lat", lon_axis="lon", fun = "min")

plot_time(test; time_axis="time", var_axis="Variable", var = "leaf_area_index", lat_axis = "lat", lon_axis="lon", fun = "min")

YAXArraysToolbox.plot_space - Method

Plot Space/Maps

Arguments

  • cube_in: YAXArray Cube.
  • time_axis: String. Name of the time axis.
  • var_axis: String. Name of the axis containing the variables.
  • var: String or nothing. Name of the variable to be plotted. If nothing, all the variables present in the cube are plotted.
  • lat_axis: String. Name of the latitude axis.
  • lon_axis: String. Name of the longitude axis.
  • fun: String. Name of the function used to collapse the time dimension so that a map can be plotted. It must be "median", "mean", "std", "var", "sum", "quant", "min", or "max".
  • p: Float64 in the interval [0,1]. If fun="quant", p is the value of the quantile.
  • colormap: Color map. By default colormap = Reverse(:batlow).
  • resolution: Plot resolution. By default resolution = (800, 300).
  • ncol: Number of plots by column. By default ncol = 1.
  • nrow: Number of plots by row. By default nrow = 1.
  • showprog: Boolean. Show progress bar.
  • max_cache: String. Maximum cache used to read the data. It must be given in MB, e.g. "100MB", or in GB, e.g. "10GB".

Examples


using Dates, YAXArrays, Zarr, YAXArraysToolbox

cube_in = open_dataset(
    "https://s3.bgc-jena.mpg.de:9000/esdl-esdc-v2.1.1/esdc-8d-0.25deg-1x720x1440-2.1.1.zarr",
)

cube_in = Cube(cube_in)


cube_in = cube_in[
    lon = (-9.0, 0.0),
    lat = (35, 40),
    time = (Date(2010), Date(2014)),
    Variable = ["leaf_area_index", "sensible_heat"],
]

plot_space(cube_in; time_axis = "time", resolution = (900, 600), var_axis = "Variable", var =  "leaf_area_index", fun = "median")


metric = ["median", "mean", "std", "var", "sum", "quant", "min", "max"]


for i in eachindex(metric)
    println(metric[i])
    plot_space(
        cube_in;
        time_axis = "time",
        var_axis = "Variable",
        lon_axis = "lon",
        lat_axis = "lat",
        var = "sensible_heat",
        fun = metric[i],
        p = 0.2,
        showprog = true,
        max_cache = "100MB",
    )
end



plot_space(
    cube_in;
    time_axis = "time",
    var_axis = "Variable",
    lon_axis = "lon",
    lat_axis = "lat",
    var = nothing,
    fun = "median",
    resolution = (1200, 600),
    p = 0.2,
    showprog = true,
    max_cache = "100MB",
    ncol = 2,
)

YAXArraysToolbox.plot_time - Method

Plot time

The function plots the time series of a given variable in a cube, or of all the variables present in a cube. Because cubes are expected to contain spatial dimensions, these are collapsed using a function, e.g. by estimating the mean of the variable over the pixels of a certain area for each time step.
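
The collapsing step can be illustrated on a plain lon x lat x time array. The sketch below is not the toolbox implementation: the names REDUCERS and collapse_space are hypothetical, only a few of the supported functions are included, and NaN values are assumed to be ignored.

```julia
using Statistics

# Hypothetical lookup from the `fun` string to a reducer (only a few shown).
const REDUCERS = Dict("median" => median, "mean" => mean, "min" => minimum, "max" => maximum)

# Collapse the two leading (spatial) dimensions of a lon x lat x time array,
# ignoring NaN, so that one value per time step remains.
function collapse_space(data::AbstractArray{<:Real,3}, fun::String)
    f = REDUCERS[fun]
    return [f(filter(!isnan, vec(data[:, :, t]))) for t in axes(data, 3)]
end

data = reshape(collect(1.0:24.0), 2, 3, 4)   # 2 lon x 3 lat x 4 time steps
data[1, 1, 1] = NaN                          # one missing observation

series = collapse_space(data, "median")
```

The result has one entry per time step, which is the series that would then be drawn against the time axis.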

Arguments:

  • cube_in: YAXArray Cube.
  • time_axis: String. Name of the time axis.
  • var_axis: String. Name of the axis containing the variables.
  • var: String or nothing. Name of the variable to be plotted. If nothing, all the variables present in the cube are plotted.
  • lat_axis: String. Name of the latitude axis.
  • lon_axis: String. Name of the longitude axis.
  • fun: String. Name of the function used to collapse the spatial dimensions. It must be "median", "mean", "std", "var", "sum", "quant", "min", or "max".
  • p: Float64 in the interval [0,1]. If fun="quant", p is the value of the quantile.
  • resolution: Tuple. Plot resolution. By default resolution = (600, 400).
  • ncol: Number of plots by column. By default ncol = 1.
  • nrow: Number of plots by row. By default nrow = 1.
  • showprog: Boolean. Show progress bar.
  • max_cache: String. Maximum cache to read the data. It must be in MB e.g. "100MB" or in GB "10GB".

Examples

using Dates, YAXArrays, Zarr, YAXArraysToolbox

cube_in = open_dataset(
    "https://s3.bgc-jena.mpg.de:9000/esdl-esdc-v2.1.1/esdc-8d-0.25deg-1x720x1440-2.1.1.zarr",
)

cube_in = Cube(cube_in)
cube_in.Variable
cube_in = cube_in[
    lon = (-9.0, 0.0),
    lat = (35, 40),
    time = (Date(2010), Date(2014)),
    Variable = ["leaf_area_index", "sensible_heat"],
]

plot_time(
    cube_in;
    time_axis = "time",
    var_axis = "Variable",
    lon_axis = "lon",
    lat_axis = "lat",
    var = nothing,
    fun = "median",
    resolution = (900, 600),
    p = 0.2,
    showprog = true,
    max_cache = "100MB",
    ncol = 2
)

plot_time(
    cube_in;
    time_axis = "time",
    var_axis = "Variable",
    lon_axis = "lon",
    lat_axis = "lat",
    var = "sensible_heat",
    fun = "median",
    p = 0.2,
    showprog = true,
    max_cache = "100MB",
)

YAXArraysToolbox.s4time - Method

space4time(climate_cube, pfts_cube, pft_list::Vector{String}, winsize = 5, minpxl = 100, minDiffPxlspercentage = 40)

Compute the space-for-time analysis for a given climate variable. ...

Arguments

  • climate_cube: YAXArray cube with dimensions: lon, lat, time.
  • pfts_cube: YAXArray cube with dimensions: pfts, lat, lon, time.

...

Output

Three output cubes are generated.

  • out1: Summary statistics. YAXArray cube where the summary_stat axis contains:
    • "rsquare": XXXX
    • "cumulative_variance": XXXX
    • "predicted": Prediction of Z for the central pixel with its real PFT combination.
  • out2:

Examples


YAXArraysToolbox.space4time_proc - Method

Space for time processor

Arguments:

  • cube_con: YAXArray with the continuous variable to be analyzed.
  • cube_classes: YAXArray with the discrete classes to be used in the space4time.
  • time_axis_name: String or nothing. Name of the time axis on the input cubes. By default time_axis_name = "time". If time_axis_name = nothing, no time dimension is considered.
  • lon_axis_name: String. Name of the longitude axis on the input cubes. By default lon_axis_name = "lon".
  • lat_axis_name: String. Name of the latitude axis on the input cubes. By default lat_axis_name = "lat".
  • classes_var_name: String. Name of the Variable containing the discrete classes. By default classes_var_name = "classes".
  • winsize: Edge size of the moving window in pixels. By default winsize = 5. E.g. winsize = 5 will produce a moving window with 5^2 pixels.
  • minpxl: Minimum number of pixels in the moving window. By default minpxl = 25. Change it according to your winsize parameter.
  • minDiffPxlspercentage: Minimum percentage of pixels in the moving window that must have different compositions. Must be a value in the interval 30-100. By default minDiffPxlspercentage = 40.
  • classes_vec: A string vector with the names of the classes in cube_classes to be used, e.g. from the MPI-BGC internal structure: classes_vec = ["Evergreen_Needleleaf_Forests", "Evergreen_Broadleaf_Forests", "Deciduous_Needleleaf_Forests", "Deciduous_Broadleaf_Forests", "Mixed_Forests", "Closed_Shrublands", "Open_Shrublands", "Woody_Savannas", "Savannas", "Grasslands", "Permanent_Wetlands", "Croplands", "Urban_and_Built-up_Lands", "Cropland/Natural_Vegetation_Mosaics", "Permanent_Snow_and_Ice", "Barren", "Water_Bodies"]

  • max_value: Indicates whether the presence of the discrete classes is scaled from 0 to 1 or from 0 to 100. If max_value = 100, the data is re-scaled to the range 0 to 1. By default max_value = 1.

  • showprog: Boolean. Show progress bar. By default showprog = true.

  • max_cache: Size of the cache used to temporarily hold sections of the cubes. By default max_cache = 1e8.

Output:

The space4time_proc produces a YAXArray Dataset with three cubes:

  • SummaryStats cube has one axis, summary_stat, and three variables:
    • rsquared:
    • cumulative_variance:
    • predicted:
  • metrics_for_classes cube has one axis, Values of Z for pure classes, and two variables:
    • estimated:
    • estimated_error:
  • metrics_for_transitions cube has two axes, transitions (all the pairwise transitions between the different classes) and Differences, with three variables:
    • delta:
    • delta_error:
    • coocurence:
source
YAXArraysToolbox.spacetime_foldsMethod

Create Space-time Folds

Create spatial, temporal or spatio-temporal Folds for cross validation based on pre-defined groups.

Arguments:

  • x: DataFrame containing spatio-temporal data.
  • spacevar: String. Column of x that identifies the spatial units (e.g. ID of weather stations).
  • timevar: String. Column of x that identifies the temporal units (e.g. the day of the year).
  • k: Int64. Number of folds. If spacevar or timevar is nothing and a leave-one-location-out or leave-one-time-step-out cross-validation should be performed, set k to the number of unique spatial or temporal units.
  • class: String. Column of x that identifies a class unit (e.g. land cover). Not implemented yet.
  • seed: Int64 or Float64. See ?Random.seed!().

Return

cv_indices_train, cv_indices_test = spacetime_folds(x;spacevar="var1", timevar="var2", k=10, class=nothing, seed=23)
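
The idea of folds based on pre-defined groups can be sketched in plain Julia for the purely spatial case. This is not the toolbox implementation: the helper name group_folds is hypothetical, it works on a vector of group labels instead of a DataFrame, and the round-robin assignment of shuffled units to folds is an assumption.

```julia
using Random

# Hypothetical helper, not the toolbox implementation: assign each unique
# spatial unit to one of k folds, then collect the row indices used for
# training and testing in every fold. All rows of a unit share one fold.
function group_folds(groups::AbstractVector, k::Int; seed = 23)
    units = unique(groups)
    shuffled = shuffle(MersenneTwister(seed), units)
    # Assumption: shuffled units are distributed over the folds round-robin.
    fold_of = Dict(u => mod1(i, k) for (i, u) in enumerate(shuffled))
    test = [findall(g -> fold_of[g] == f, groups) for f in 1:k]
    train = [findall(g -> fold_of[g] != f, groups) for f in 1:k]
    return train, test
end

stations = ["a", "a", "b", "b", "c", "c", "d", "d"]
train, test = group_folds(stations, 2)
```

Because whole stations are held out together, no station contributes rows to both the training and the test set of the same fold, which is the point of target-oriented validation.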

References

Meyer, H., Reudenbach, C., Hengl, T., Katurji, M., Nauß, T. (2018): Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation. Environmental Modelling & Software 101: 1-9.
