# 1 Introduction

The package is a tool for identifying biodiversity records that have been designated locations on widely used grid systems. Our tool also estimates the degree of environmental heterogeneity associated with grid systems, allowing users to make informed decisions about how to use such occurrence data in global change studies. We show that a significant proportion (~13.5%; 261 million; accessed November 2021) of records on GBIF, largest aggregator of natural history collection data, are potentially gridded data, and demonstrate that our tool can reliably identify such records and quantify the associated uncertainties.

# 3 Overview of functions in GridDER

Purpose New Name Functionality
internal function find_angle calculate angle between two points
internal function find_crs_extent automatically find the spatial extent of a coordinate reference system from https://epsg.io
data prep load_occ transform a txt file (e.g. a dataset from GBIF in DarwinCore format) that has decimalLongitude and decimalLatitude into simplified spatially points
metadata inference infer_crs infer the coordinate reference system in which a set of gridded coordinates were originally defined
metadata inference infer_resolution estimate the resolution (or regular spacing of input coordinates) along horizontal and vertical axes, based on distance to nearest neighbors along four directions (i.e. up, down, left, right)
metadata inference infer_origin infer the origin a grid system while considering the small uncertainty of the coordinates
metadata inference infer_extent infer the spatial extent of a grid system, based on country polygons, coordinate reference system, or input spatial point data
grid generation grid_generation generate a grid system based on user defined attributes
grid matching grid_matching infer whether a coordinate is from a known grid system, based on absolute and relative distance
grid adjustment grid_adjustment adjust the origin of a grid system to minimize the mismatch between grid centroids and input coordinates
uncertainty assessment assess_sp_uncertainty assess the spatial uncertainty of a grid system, by calculating the distance between grid centroids toward gridded coordinates and non-gridded coordinates (e.g. iNaturalist data)
uncertainty assessment assess_env_uncertainty assess the environmental uncertainty of a grid system, by calculating the variation of environmental conditions (e.g. 30m elevation) within each grid cell

# 4 Install the GridDER package

remotes::install_github("BiogeographyLab/gridder")

# 5 Examples of grid systems

Examples of four grid systems used in France (a), United Kingdom (b), South Africa (c), and Australia (d). The four grid systems have different spatial solutions – 10km for (a), 1km for (b), 5 arc-minute for (c), and 6 arc-minute for (d). The black points represent the biological collections assigned to the centroid of the corresponding grid systems. All maps constructed using WGS84.

# 6 Major use cases

## 6.1 Infer CRS (coordinate reference system)

First load the demo occurrence data. This dataset includes a group of gridded coordinates. It was originally downloaded from GBIF [https://doi.org/10.15468/dl.fp5795; accessed on 29 November 2021], but has been simplified to be spatially unique points for easier demonstration.

data("occs_unique") # dataframe of occurrences
print(nrow(occs_unique))
## [1] 1211
print(head(occs_unique))
## # A tibble: 6 × 3
##    ...1 decimalLatitude decimalLongitude
##   <dbl>           <dbl>            <dbl>
## 1     1            48.3            2.93
## 2     2            48.8            2.12
## 3     3            46.8            0.968
## 4     4            48.4            2.26
## 5     5            48.8            2.52
## 6     6            47.0            0.829

Look at the coordinates on a map

library(ggplot2)
library(broom)

ggplot2::ggplot() +
geom_polygon(data = spdf_fortified,
aes( x = long, y = lat, group = group),
fill="#69b3a2", color="white") +
geom_point(data = occs_unique,
aes(x = decimalLongitude, y = decimalLatitude),
size = 2,
shape = 23, fill = "darkred") +
coord_sf(xlim = c(0, 6), ylim = c(46, 50), expand = FALSE)

The coordinates have been projected to WGS84 (which is the same for occurrences from GBIF), which could be different from the CRS where the grid system was defined. So here we use infer_crs() to infer the correct CRS.

results_crs = gridder::infer_crs(occ_path = occs_unique, # input of coordinates
truth_crs_num = "2154", # (optional) here we know the true CRS, just add it here for a comparison with the inferred CRS ,
#flag_debug = c(4363,5451,5981,5982,5991,5992,5425,5424,5422,3901), # (optional) only evaluate a few CRS
cup_num = 4) # the function will loop through many CRS, so using multiple cores can speed up the process. The default uses 2 cores, but you can adjust it to your preferred number. 

It may take minutes - hours to complete the inference. If you want to quickly look at the output, you can load the results we calculated earlier.

results_crs = system.file("extdata", "results_crs.rds", package = "gridder")
# Now, let's take a look at the output. The CRS code "2154" is ranked the first.
print(results_crs$selected[1:10, c("code", "note")]) ## code note ## 1 2154 truth ## 1336 2154 RGF93 / Lambert-93 ## 6350 5698 RGF93 / Lambert-93 + NGF-IGN69 height ## 6351 5699 RGF93 / Lambert-93 + NGF-IGN78 height ## 4355 3947 RGF93 / CC47 ## 4354 3946 RGF93 / CC46 ## 2944 30731 Nord Sahara 1959 / UTM zone 31N ## 3515 32431 WGS 72BE / UTM zone 31N ## 4353 3945 RGF93 / CC45 ## 4352 3944 RGF93 / CC44 ## 6.2 Infer the spatial resolution data("occs_unique") # Dataframe of occurrences # Transform the coordinates to be spatial objects input_occ = gridder::load_occ(occs_unique) input_occ_prj = sp::spTransform(input_occ,crs(paste0("+init=epsg:","2154"))) result_res = gridder::infer_resolution(input_coord=input_occ_prj@coords,flag_unit="meter") print(result_res$res_x)
## [1] 10000
res_y=result_res$res_y) print(result_ext) ## [1] -380000 6090000 1210000 7180000 ## 6.4 Generation of a grid system based on its spatial attributes simulated_grid = gridder::grid_generation(res_x=result_res$res_x, # resolution along x-axis
res_y=result_res$res_y, # resolution along y-axis unit="m", # unit of resolution flag_crs=TRUE, # user provides a CRS code crs_num=results_crs$selected$code[1], # use the inferred CRS country="France", # the spatial file will be masked by country flag_maskByCountry=TRUE, extent_unit="empirical_occ_extent", input_extent=result_ext,# use the inferred spatial extent flag_offset=c(0,-result_res$res_y*10,
result_res$res_x*10,0) # adjustment of the spatial extent ) ## [1] 10000 ## [1] 10000 ## [1] "m" ## [1] -380000 6190000 1210000 7180000 plot(simulated_grid) Check the simulated grid plot(simulated_grid, #ext=extent(c(extent(input_occ_prj)[1],extent(input_occ_prj)[1]+100000, # extent(input_occ_prj)[3],extent(input_occ_prj)[3]+100000))) xlim=c(extent(input_occ_prj)[1],extent(input_occ_prj)[1]+110000), ylim=c(extent(input_occ_prj)[3],extent(input_occ_prj)[3]+110000)) plot(input_occ_prj,add=T,col="red") ## 6.5 Match occ data set against known grid systems # take 100 gridded points, and generate 100 random points point_grid = input_occ[sample(1:length(input_occ),100),] point_nongrid = point_grid point_nongrid@coords=point_nongrid@coords+runif(100) point_all=rbind(point_grid,point_nongrid) plot(input_occ) plot(point_grid,add=T, col="red") # red is gridded coordinate plot(point_nongrid,add=T,col="orange") # orange is random (nongridded) coordinate ### 6.5.1 Grid metadata We have compiled metadata for 34 grid systems, you could take a look at the table below. meta_data <- gsheet::gsheet2tbl("https://docs.google.com/spreadsheets/d/1Qp5KOpLSVnF2t16uIbNwKwK5ZBXt4k5unONp7eopPzI/edit#gid=166945453") meta_data ## # A tibble: 34 × 14 ## grid_ID \nISO-3166\…¹ count…² crs_t…³ crs_c…⁴ grid_…⁵ resol…⁶ resol…⁷ resol…⁸ ## <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> ## 1 2 FR France EPSG 2154 Grille… 10 10 km ## 2 3 FR France EPSG 2975 Grille… 10 10 km ## 3 5 NL Nether… EPSG 28992 Amersf… 1 1 km ## 4 6 BE Belgium from_s… n/a <NA> 4 4 km ## 5 7 BE Belgium from_s… n/a <NA> 1 1 km ## 6 8 CH Switze… EPSG 2056 <NA> 5 5 km ## 7 9 RU Russia EPSG 4326 <NA> 10 5 minute ## 8 10 RU Russia EPSG 4326 <NA> 30 30 minute ## 9 11 BG,GR,RO,RS,… Bulgar… EPSG 4326 eoagri… 30 30 minute ## 10 12 SE Sweden EPSG 3021 The Sw… 25 25 km ## # … with 24 more rows, 5 more variables: extent <chr>, extent_unit <chr>, ## # path_demo_occ <chr>, spatial_uncertainty_mean_grid <dbl>, ## # spatial_uncertainty_mean_nongrid <dbl>, and abbreviated variable names ## # ¹​\nISO-3166\nalpha2_https://www.geonames.org/countries/, ²​country_name, ## # ³​crs_type, ⁴​crs_code, ⁵​grid_name, ⁶​resolution_x, ⁷​resolution_y, ## # ⁸​resolution_unit Here, we simulated a grid based on the spatial attributes we just inferred. grid_metadata = data.frame(grid_ID=c("88"), resolution_x=c(10000), resolution_y=c(10000), resolution_unit=c("m")) temp_result = gridder::grid_matching(input_occ=point_all, input_grid=simulated_grid, grid_metadata=grid_metadata, flag_relativeTHD=0.1, flag_absoluteTHD=10) ## input 200 occ ## checking 1 grid out of 1[1] 1 # 100 points were recognized to be gridded coordinates print( table(temp_result@data$grid_closest_both))
##
## grid_ID_88   notFound
##        100        100
plot(temp_result,col=as.factor(temp_result@data$grid_closest_both)) ## 6.6 Compute variation of environmental conditions rgee::ee_Initialize() # The follow command authorizes rgee to manage Earth Engine resources # Load ImageCollection of interest (environmental layer) nasadem = rgee::ee$Image('NASA/NASADEM_HGT/001')$select('elevation') # Global elevation data in 30 meters. We do not plot due to extension and resolution # Grid Id 9 data("grid_ID_9") # Grid system example available in the package grid = grid_ID_9 |> rgee::sf_as_ee() ## Registered S3 method overwritten by 'geojsonsf': ## method from ## print.geojson geojson # Compute standard deviation std_dev <- gridder::assess_env_uncertainty(x= nasadem, y= grid) ## [1] "Extracting information [0/336]..." ## Number of features: Calculating ... Number of features: 336 print(std_dev, n=5) ## Simple feature collection with 336 features and 12 fields ## Geometry type: POLYGON ## Dimension: XY ## Bounding box: xmin: 38.1667 ymin: 55.0833 xmax: 43 ymax: 56.8333 ## Geodetic CRS: WGS 84 ## First 5 features: ## Name alttdMd begin drwOrdr dscrptn end extrude icon tessllt timstmp ## 1 ?1 NA NA NA ??????? ?1 NA 0 NA -1 NA ## 2 ?2 NA NA NA ??????? ?2 NA 0 NA -1 NA ## 3 ?3 NA NA NA ??????? ?3 NA 0 NA -1 NA ## 4 ?1 NA NA NA ??????? ?1 NA 0 NA -1 NA ## 5 ?2 NA NA NA ??????? ?2 NA 0 NA -1 NA ## visblty elevation geometry ## 1 -1 12.579737 POLYGON ((39.5 56.75, 39.66... ## 2 -1 6.765410 POLYGON ((39.6667 56.75, 39... ## 3 -1 6.164659 POLYGON ((39.8333 56.7083, ... ## 4 -1 13.234414 POLYGON ((39.3333 56.6667, ... ## 5 -1 7.550874 POLYGON ((39.5 56.6667, 39.... # Plot library(ggplot2) ggplot(data = grid_ID_9) + geom_sf(aes(fill = std_dev$elevation))+
scale_fill_viridis_c(option = "plasma", trans = "sqrt")

Note the you need to setup rgee package (R package to acessess Google Earth Engine API) before using assessing the environmental variation (assess_env_uncertainty). You may need to use the following code to do so.

rgee::ee_install() # Creating virtual environmental to install the necessary dependencies

# If the following message happen : An error occur when ee_install was creating the Python Environment. Run ee_clean_pyenv() and restart the R session, before trying again

# Then execute the bellow command and restart R session:

rgee::ee_clean_pyenv()

# After that you must run rgee::ee_install(). You will request to continues and Rstart R seccion choose Yes in all options.

rgee::ee_Initialize() # The follow command authorizes rgee to manage Earth Engine resources

rgee::ee_check() # Checks if everything is OK. Optinal command .