The package is a tool for identifying biodiversity records that have been designated locations on widely used grid systems. Our tool also estimates the degree of environmental heterogeneity associated with grid systems, allowing users to make informed decisions about how to use such occurrence data in global change studies. We show that a significant proportion (~13.5%; 261 million; accessed November 2021) of records on GBIF, largest aggregator of natural history collection data, are potentially gridded data, and demonstrate that our tool can reliably identify such records and quantify the associated uncertainties.
Purpose | New Name | Functionality |
---|---|---|
internal function | find_angle | calculate angle between two points |
internal function | find_crs_extent | automatically find the spatial extent of a coordinate reference system from https://epsg.io |
data prep | load_occ | transform a txt file (e.g. a dataset from GBIF in DarwinCore format) that has decimalLongitude and decimalLatitude into simplified spatially points |
metadata inference | infer_crs | infer the coordinate reference system in which a set of gridded coordinates were originally defined |
metadata inference | infer_resolution | estimate the resolution (or regular spacing of input coordinates) along horizontal and vertical axes, based on distance to nearest neighbors along four directions (i.e. up, down, left, right) |
metadata inference | infer_origin | infer the origin a grid system while considering the small uncertainty of the coordinates |
metadata inference | infer_extent | infer the spatial extent of a grid system, based on country polygons, coordinate reference system, or input spatial point data |
grid generation | grid_generation | generate a grid system based on user defined attributes |
grid matching | grid_matching | infer whether a coordinate is from a known grid system, based on absolute and relative distance |
grid adjustment | grid_adjustment | adjust the origin of a grid system to minimize the mismatch between grid centroids and input coordinates |
uncertainty assessment | assess_sp_uncertainty | assess the spatial uncertainty of a grid system, by calculating the distance between grid centroids toward gridded coordinates and non-gridded coordinates (e.g. iNaturalist data) |
uncertainty assessment | assess_env_uncertainty | assess the environmental uncertainty of a grid system, by calculating the variation of environmental conditions (e.g. 30m elevation) within each grid cell |
::install_github("BiogeographyLab/GridDER") remotes
Examples of four grid systems used in France (a), United Kingdom (b), South Africa (c), and Australia (d). The four grid systems have different spatial solutions – 10km for (a), 1km for (b), 5 arc-minute for (c), and 6 arc-minute for (d). The black points represent the biological collections assigned to the centroid of the corresponding grid systems. All maps constructed using WGS84.
First load the demo occurrence data. This dataset includes a group of gridded coordinates. It was originally downloaded from GBIF [https://doi.org/10.15468/dl.fp5795; accessed on 29 November 2021], but has been simplified to be spatially unique points for easier demonstration.
data("occs_unique") # dataframe of occurrences
print(nrow(occs_unique))
## [1] 1211
print(head(occs_unique))
## # A tibble: 6 × 3
## ...1 decimalLatitude decimalLongitude
## <dbl> <dbl> <dbl>
## 1 1 48.3 2.93
## 2 2 48.8 2.12
## 3 3 46.8 0.968
## 4 4 48.4 2.26
## 5 5 48.8 2.52
## 6 6 47.0 0.829
Look at the coordinates on a map
library(ggplot2)
library(broom)
data("ne_10m_admin_0_countries") # load NaturalEarth world map
= broom::tidy(ne_10m_admin_0_countries, region = "ADMIN")
spdf_fortified
::ggplot() +
ggplot2geom_polygon(data = spdf_fortified,
aes( x = long, y = lat, group = group),
fill="#69b3a2", color="white") +
geom_point(data = occs_unique,
aes(x = decimalLongitude, y = decimalLatitude),
size = 2,
shape = 23, fill = "darkred") +
coord_sf(xlim = c(0, 6), ylim = c(46, 50), expand = FALSE)
The coordinates have been projected to WGS84 (which is the same for occurrences from GBIF), which could be different from the CRS where the grid system was defined. So here we use infer_crs() to infer the correct CRS.
= GridDER::infer_crs(occ_path = occs_unique, # input of coordinates
results_crs truth_crs_num = "2154", # (optional) here we know the true CRS, just add it here for a comparison with the inferred CRS ,
#flag_debug = c(4363,5451,5981,5982,5991,5992,5425,5424,5422,3901), # (optional) only evaluate a few CRS
cup_num = 4) # the function will loop through many CRS, so using multiple cores can speed up the process. The default uses 2 cores, but you can adjust it to your preferred number.
It may take minutes - hours to complete the inference. If you want to quickly look at the output, you can load the results we calculated earlier.
= system.file("extdata", "results_crs.rds", package = "GridDER")
results_crs = readRDS(results_crs)
results_crs # Now, let's take a look at the output. The CRS code "2154" is ranked the first.
print(results_crs$selected[1:10, c("code", "note")])
## code note
## 1 2154 truth
## 1336 2154 RGF93 / Lambert-93
## 6350 5698 RGF93 / Lambert-93 + NGF-IGN69 height
## 6351 5699 RGF93 / Lambert-93 + NGF-IGN78 height
## 4355 3947 RGF93 / CC47
## 4354 3946 RGF93 / CC46
## 2944 30731 Nord Sahara 1959 / UTM zone 31N
## 3515 32431 WGS 72BE / UTM zone 31N
## 4353 3945 RGF93 / CC45
## 4352 3944 RGF93 / CC44
data("occs_unique") # Dataframe of occurrences
# Transform the coordinates to be spatial objects
= GridDER::load_occ(occs_unique) input_occ
= sp::spTransform(input_occ,crs(paste0("+init=epsg:","2154")))
input_occ_prj = GridDER::infer_resolution(input_coord=input_occ_prj@coords)
result_res print(result_res$res_x)
## [1] 10000
print(result_res$res_y)
## [1] 10000
The inferred resolution is 10km on x & y axes
= GridDER::infer_extent(method="crs_extent",
result_ext crs_grid=results_crs$selected$code[1],
flag_adjust_by_res=TRUE,
res_x=result_res$res_x,
res_y=result_res$res_y)
print(result_ext)
## [1] -380000 6000000 1320000 7230000
= GridDER::grid_generation(res_x=result_res$res_x, # resolution along x-axis
simulated_grid res_y=result_res$res_y, # resolution along y-axis
crs_num=results_crs$selected$code[1], # use the inferred CRS
country="France", # the spatial file will be masked by country
flag_maskByCountry=TRUE,
extent_unit="empirical_occ_extent",
input_extent=result_ext,# use the inferred spatial extent
flag_offset=c(0,-result_res$res_y*10,
$res_x*10,0) # adjustment of the spatial extent
result_res )
## [1] "load CountryPolygon"
## Warning in RGEOSBinPredFunc(spgeom1, spgeom2, byid, func): spgeom1 and spgeom2
## have different proj4 strings
plot(simulated_grid)
Check the simulated grid
plot(simulated_grid,
#ext=extent(c(extent(input_occ_prj)[1],extent(input_occ_prj)[1]+100000,
# extent(input_occ_prj)[3],extent(input_occ_prj)[3]+100000)))
xlim=c(extent(input_occ_prj)[1],extent(input_occ_prj)[1]+110000),
ylim=c(extent(input_occ_prj)[3],extent(input_occ_prj)[3]+110000))
plot(input_occ_prj,add=T,col="red")
# take 100 gridded points, and generate 100 random points
= input_occ[sample(1:length(input_occ),100),]
point_grid = point_grid
point_nongrid @coords=point_nongrid@coords+runif(100)
point_nongrid=rbind(point_grid,point_nongrid)
point_all
plot(input_occ)
plot(point_grid,add=T, col="red") # red is gridded coordinate
plot(point_nongrid,add=T,col="orange") # orange is random (nongridded) coordinate
We have compiled metadata for 34 grid systems, you could take a look at the table below.
<- gsheet::gsheet2tbl("https://docs.google.com/spreadsheets/d/1Qp5KOpLSVnF2t16uIbNwKwK5ZBXt4k5unONp7eopPzI/edit#gid=166945453")
meta_data
meta_data
## # A tibble: 34 × 14
## grid_ID \nISO-3166\…¹ count…² crs_t…³ crs_c…⁴ grid_…⁵ resol…⁶ resol…⁷ resol…⁸
## <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr>
## 1 2 FR France EPSG 2154 Grille… 10 10 km
## 2 3 FR France EPSG 2975 Grille… 10 10 km
## 3 5 NL Nether… EPSG 28992 Amersf… 1 1 km
## 4 6 BE Belgium from_s… n/a <NA> 4 4 km
## 5 7 BE Belgium from_s… n/a <NA> 1 1 km
## 6 8 CH Switze… EPSG 2056 <NA> 5 5 km
## 7 9 RU Russia EPSG 4326 <NA> 10 5 minute
## 8 10 RU Russia EPSG 4326 <NA> 30 30 minute
## 9 11 BG,GR,RO,RS,… Bulgar… EPSG 4326 eoagri… 30 30 minute
## 10 12 SE Sweden EPSG 3021 The Sw… 25 25 km
## # … with 24 more rows, 5 more variables: extent <chr>, extent_unit <chr>,
## # path_demo_occ <chr>, spatial_uncertainty_mean_grid <dbl>,
## # spatial_uncertainty_mean_nongrid <dbl>, and abbreviated variable names
## # ¹`\nISO-3166\nalpha2_https://www.geonames.org/countries/`, ²country_name,
## # ³crs_type, ⁴crs_code, ⁵grid_name, ⁶resolution_x, ⁷resolution_y,
## # ⁸resolution_unit
Here, we simulated a grid based on the spatial attributes we just inferred.
= data.frame(grid_ID=c("88"),
grid_metadata resolution_x=c(10000),
resolution_y=c(10000) )
= GridDER::grid_matching(input_occ=point_all,
temp_result input_grid=simulated_grid,
grid_metadata=grid_metadata,
flag_relativeTHD=0.1,
flag_absoluteTHD=10)
## input 200 occ
## checking 1 grid out of 1[1] 1
## Warning: PROJ support is provided by the sf and terra packages among others
# 100 points were recognized to be gridded coordinates
print( table(temp_result@data$grid_closest_both))
##
## grid_ID_88 notFound
## 100 100
plot(temp_result,col=as.factor(temp_result@data$grid_closest_both))
::ee_Initialize() # The follow command authorizes rgee to manage Earth Engine resources rgee
# Load ImageCollection of interest (environmental layer)
= rgee::ee$Image('NASA/NASADEM_HGT/001')$select('elevation') # Global elevation data in 30 meters. We do not plot due to extension and resolution
nasadem
# Grid Id 9
data("grid_ID_9") # Grid system example available in the package
= grid_ID_9 |>
grid ::sf_as_ee() rgee
## Registered S3 method overwritten by 'geojsonsf':
## method from
## print.geojson geojson
# Compute standard deviation
<- GridDER::assess_env_uncertainty(x= nasadem, y= grid) std_dev
## [1] "Extracting information [0/336]..."
## Number of features: Calculating ...
Number of features: 336
print(std_dev, n=5)
## Simple feature collection with 336 features and 12 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 38.1667 ymin: 55.0833 xmax: 43 ymax: 56.8333
## Geodetic CRS: WGS 84
## First 5 features:
## Name alttdMd begin drwOrdr dscrptn end extrude icon tessllt timstmp
## 1 ?1 NA NA NA ??????? ?1 NA 0 NA -1 NA
## 2 ?2 NA NA NA ??????? ?2 NA 0 NA -1 NA
## 3 ?3 NA NA NA ??????? ?3 NA 0 NA -1 NA
## 4 ?1 NA NA NA ??????? ?1 NA 0 NA -1 NA
## 5 ?2 NA NA NA ??????? ?2 NA 0 NA -1 NA
## visblty elevation geometry
## 1 -1 12.579737 POLYGON ((39.5 56.75, 39.66...
## 2 -1 6.765410 POLYGON ((39.6667 56.75, 39...
## 3 -1 6.164659 POLYGON ((39.8333 56.7083, ...
## 4 -1 13.234414 POLYGON ((39.3333 56.6667, ...
## 5 -1 7.550874 POLYGON ((39.5 56.6667, 39....
# Plot
library(ggplot2)
ggplot(data = grid_ID_9) +
geom_sf(aes(fill = std_dev$elevation))+
scale_fill_viridis_c(option = "plasma", trans = "sqrt")
Note the you need to setup rgee package (R package to acessess Google Earth Engine API) before using assessing the environmental variation (assess_env_uncertainty). You may need to use the following code to do so.
::ee_install() # Creating virtual environmental to install the necessary dependencies
rgee
# If the following message happen : `An error occur when ee_install was creating the Python Environment. Run ee_clean_pyenv() and restart the R session, before trying again`
# Then execute the bellow command and restart R session:
::ee_clean_pyenv()
rgee
# After that you must run `rgee::ee_install()`. You will request to continues and Rstart R seccion choose Yes in all options.
::ee_Initialize() # The follow command authorizes rgee to manage Earth Engine resources
rgee
::ee_check() # Checks if everything is OK. Optinal command . rgee
library(GridDER)
#1. (mandatory) specify input coordinates (dataframe, tibble, or the path of a csv) with columns "decimalLatitude" "decimalLongitude"
= occs_unique # a demo dataset is used here
input_occ
#2. (optional) specify the crs. flag_crs=NULL will activate the crs inferring process, specifying a EPSG code of a crs will skip the crs inferring process
= NULL
flag_crs = "2154" #a crs code that will be used as a reference, to be compared with many other crs codes; 2154 is the original crs for the demo dataset.
possible_true_crs = 15 # number of cpus to be used in the inferring process
cup_num
#3. (optional) specify the resolution of the grid along x and y, e.g. flag_res= c(1000,1000). NULL will activate the resolution inferring process
= NULL
flag_res
#4. (optional) specify the spatial extent of the grid, e.g. flag_extent= c(xmin, ymin, xmax, ymax). NULL will activate the extent inferring process
= NULL
flag_extent
#5. (optional) generate a grid based on crs, resolution, extent.
= TRUE
flag_generateGrid
#6. (optional) match input coordinates to the simulated grid
= TRUE
flag_matchOccToGrid
#7. (optional) plot input coordinates and grid
= TRUE
flag_plot
#######################
if(is.null(flag_crs)) {
#1.1 Infer coordinate reference system
= infer_crs(occ_path=input_occ,
result_crs truth_crs_num = possible_true_crs,
cup_num =cup_num)
#results_crs = readRDS(system.file("extdata", "results_crs.rds", package = "GridDER"))
= results_crs$selected$code[1]
flag_crs
}print(flag_crs)
if( is.null(flag_res) ){
= load_occ(input_occ)
temp_occ = sp::spTransform(temp_occ,crs(paste0("+init=epsg:",flag_crs)))
input_occ_prj = infer_resolution(input_coord=input_occ_prj@coords)
result_res = c(result_res$res_x,result_res$res_y)
flag_res
}print(flag_res)
if( is.null(flag_extent) ){
= infer_extent(method = "crs_extent",
result_ext crs_grid = flag_crs,
flag_adjust_by_res = TRUE,
res_x = flag_res[1],
res_y = flag_res[1])
= result_ext
flag_extent
}print(flag_extent)
if(flag_generateGrid ){
= grid_generation(res_x = flag_res[1],
simulated_grid res_y = flag_res[2],
extent_unit="empirical_occ_extent",
input_extent=result_ext,
flag_offset=c(0,-flag_res[2]*10,
1]*10,0),
flag_res[crs_num = flag_crs,
flag_maskByCountry = FALSE)
}
if(flag_matchOccToGrid){
= load_occ(input_occ)
point_grid
= data.frame(grid_ID= c("simulatedGrid"),
grid_metadata resolution_x=flag_res[1],
resolution_y=flag_res[2] )
= grid_matching(input_occ = point_grid,
temp_result input_grid = simulated_grid,
grid_metadata=grid_metadata,
flag_relativeTHD = 0.1,
flag_absoluteTHD = 100)
table(temp_result@data$grid_closest_both)
}
if(flag_plot){
= sp::spTransform(temp_result,crs(paste0("+init=epsg:",flag_crs)))
temp_result_prj = crop(simulated_grid,extent(temp_result_prj))
simulated_grid_crop plot(simulated_grid_crop )
plot(temp_result_prj,add=T,col="blue")
}