Header

Mapping GBIF data using R and GRASS

Author: Paulo van Breugel
Updated on: 12-08-18

Introduction

GBIF

The Global Biodiversity Information Facility (GBIF) is an international open data infrastructure that allows anyone, anywhere to access data about all types of life on Earth, shared across national boundaries via the Internet. GBIF provides a single point of access through http://www.gbif.org/ to species records shared freely by hundreds of institutions worldwide. The data accessible through GBIF relate to evidence about more than 1.6 million species, collected over three centuries of natural history exploration and including current observations from citizen scientists, researchers and automated monitoring programs.

There are various ways to import GBIF data, including directly from the website as comma delimited file (csv) and using the v.in.gbif addon for GRASS. In R there are e.g., the packages spocc and rgbif that you can use. In this tutorial we will be using the latter. In the link section some tutorials are listed that illustrate the use of spocc and other relevant R packages.

R / GRASS integration

We ultimately want to store the downloaded data in a GRASS GIS database. One can use GRASS GIS and R in conjunction in two different ways. One is to use R within a GRASS GIS session. This means that we start R (or RStudio) from the GRASS GIS command line. The other option is to use GRASS GIS within a R session. This means we connect to a GRASS GIS location/mapset from within R (or RStudio). For more details, see the GRASS GIS Wiki. The first approach is arguably the easiest, and is what we will be using below. If you just want to download and use the data in R, you can simply skip the parts about GRASS GIS.

An example - Kalmia latifolia

Mountain Laurel (Kalmia latifolia) is a mid-sized shrub that can be found in acidic forests, bluffs, bogs, along sandhill streams, and in a wide range of other habitats, nearly ubiquitous in the mountains, up to at least 1600m, while more restricted in habitat in the lower Piedmont and Coastal Plain (NC wildflower website). In this example, we will download the presence points of this species in the USA. All it really takes is 5 lines of code.

# Get the data and convert it in the right format,
# reproject it to the right projection, and export to GRASS GIS
KL <- occ_data(scientificName="Kalmia latifolia", hasCoordinate=TRUE,
		   country="US", hasGeospatialIssue=FALSE)$data
coordinates(KL) <- cbind(KL$decimalLongitude, KL$decimalLatitude)
proj4string(KL) = CRS("+init=epsg:4326")
KLnew <- spTransform(KL, getLocationProj())
writeVECT(SDF=KLnew[,c(1,2,5)], vname="KL3", driver="SQLite")

For those less familiar with R, GRASS GIS or the combination of the two, let's go into a bit more detail. Below we'll go over the code above step by step.

Import GBIF data in R

Open R and load libraries

In this example we start GRASS GIS first and then started R (or Rstudio) from the GRASS GIS command line. After opening R, we load the libraries rgbif, maptools (for displaying a map of the data in R), the raster package, and rgrass7 (to interact with GRASS GIS).

library(rgbif)
library(maptools)
library(rgrass7)
library(raster)
library(sp)

Define the extent of the region of interest

The GBIF data set, from which we will download the data later, contains data for the whole world. If you are only interested in occurrence data in a specific region, you can define the bounding box of your area on interest, as in the first example below, or provide the name of the country, as in the second example. So, suppose we are interested in the occurrence data in North Carolina. The bounding box of the state is north: 36:36:30N, south: 33:36N, west: 84:20W, and east: 75:05W. We can set this extent in R by creating a vector with the minimum and maximum X-coordinates and the minimum and maximum Y-coordinates.

# Define extent
NCext <- c(-84.333, -75.08333, 33.6, 36.60833)

We are running R from within a GRASS GIS session. If in GRASS the region extent is already correctly defined, you can extract the bounding box information by reading in the GRASS location metadata. Click here to read more...

If you want to check if the extent is correctly defined, you can plot the bounding box on top of the world maps. In the script below, in line 4, the NCext object is first converted to object of the class extent using the extent function from the raster package, and subsequently converted to a SpatialPolygon.

# Plot data
data(wrld_simpl)
plot(wrld_simpl, col = "white", axes = T)
NCbbox <- as(extent(NCext), "SpatialPolygons")
plot(NCbbox, col = "red", add=TRUE)

The map below (click to enlarge) shows in red the bounding box that was defined above.

Import the species point data in R using rgbif{rgbif}

OK, so now we have defined the extent, it is time to get the point data of our target species; Kalmia latifolia. The rgbif package has a very convenient function occ_data to do this. To start with, we need the scientific name. Then we'll need to set the parameter hasCoordinate to TRUE to ensure only records with coordinates are downloaded. The area from which we want to download the data can be set by the parameter geometry, which is set to the bounding box we had defined earlier. There are a lot more parameters to further fine-tune the search, see the help file for details (or type in ?occ_data on the command line).

KL <- occ_data(scientificName="Kalmia latifolia", hasCoordinate=TRUE, geometry=NCext,
		 hasGeospatialIssue=FALSE)

Visualizing the data in R

Mapping the data using mapr

The KL is a list with two elements. The first is a list with the metadata and the second a dataframe with the actual records, including the coordinates. We can quickly map the points in the dataframe using the map_ggplot(). This allows you to define extent of the map using the map parameter, with the following options: world, world2, state, usa, county, france, italy, or nz.

map_ggplot(KL, map="state")

The options state and county will give you a map of the USA with respectively the state and county boundaries. It thus gives you a simple world map of the USA with state boundaries and the location of Kalmia latifolia (click on map to enlarge).

The mapr package contains various other mapping functions to plot maps with a background layer from e.g., Google or Openstreetmap, and to create interactive maps. For example, the code map_gist(KL, size=2)will create the map below. Click here to read more...

Visualizing the data with ggplot2

The mapping functions in mapr are great to create a quick overview, but they don't allow you to customize the map much. If you want something more involved, you can use for example ggplot2 to create your own map. This time, we are downloading the data for the whole of the USA. Instead of the geometry parameter, we can use the country parameter, which is a convenient way to define the country of interest (see the help file for details).

# Download Kalmia latifolia presence data for the USA
KL <- occ_data(scientificName="Kalmia latifolia", hasCoordinate=TRUE, country="US",
               hasGeospatialIssue=FALSE)

# Get USA state map
NCstate = map_data("state")

# Create the map
NCstate = map_data("state")
p <- ggplot(NCstate, aes(long, lat)) + geom_polygon() + coord_equal()
p <- ggplot() + coord_fixed(1.1)
p <- p + geom_polygon(data = NCstate, aes(x=long, y = lat, group = group),
                      fill="white", col="grey", lwd=0.3)
p <- p + geom_point(data=KL$data, aes(x = decimalLongitude, y=decimalLatitude,
                      color="Presence points"), cex=0.5) + labs(color = "Kalmia latifolia")
p <- p + geom_polygon(data = NCstate, aes(x=long, y = lat, group = group),
                      fill=rgb(1,1,1,0), col=rgb(0.8,0.8,0.8,0.3), lwd=0.1)
p <- p + theme(legend.position=c(0.14,0.15))
ggsave(filename="distribution of Kalmia_latifolia in USA.png",
       plot=p, width=17, height=8.5, dpi=300, scale=1, unit="cm")

The map (click to enlarge) clearly shows that the species is largely confined to the east of the USA. I would normally check the data for consistency and possible errors. Notice e.g., the two odd points on the west coast. However, I will skip that step for now.

Import the data in GRASS GIS

Convert the data to SpatialPointsDataFrame

First step is to concert the data in a SpatialPointsDataFrame. This is done by assigning the coordinates. Next, we need to define the coordinate reference system (CRS). The GBIF data is downloaded in latlon WGS84, which as the EPGS code EPS 4326. We can use the proj4string and the CRS functions to set the projection of our SpatialPointsDataFrame.

# Extract the data.frame (i.e., remove the metadata)
KL <- KL$data

# Convert the data.frame to a spatial point dataframe.
coordinates(KL) <- cbind(KL$decimalLongitude, KL$decimalLatitude)
latlong = "+init=epsg:4326"
proj4string(KL) = CRS(latlong)

Export to the GRASS database

Next, we can use the writeVECT function from the rgrass7 package to export the point layer to the GRASS GIS database. Note, this only works if you started R from within the GRASS GIS session, as mentioned earlier. In the example below, only a few columns are exported, because I won't be needing the other ones. If you want to retain all columns, note that some columns may contain characters causing problems when importing in GRASS. In that case you will need to check and clean the offending columns.

writeVECT(SDF=KL[,c(1,2,5)], vname="KL", driver="SQLite")

Special case - reprojecting

What if the downloaded data is in another projection than the projection of the Location/Mapset in your GRASS database? The coordinate reference system (CRS) of the GBIF point data is latlon (WGS 84). However, in this tutorial we'll be working with the North Carolina (NC) sample GRASS database, which can be downloaded from the GRASS GIS website. The projection of this dataset is NAD83(HARN) / North Carolina (EPSG 3358). This means that the data needs to be reprojected. There are various ways one can do this. (A) One option is to reproject the data in R before writing it to GRASS. (B) A second option is to import the data in a location/mapset with the same projection as the point data (as we have just done above), and than change to the target Location/mapset in GRASS and use use r.proj to import and reproject the data from the first location/mapset into the target location/mapset. For an example of option B, see this tutorial.

We will go for option A here. As said, the data downloaded earlier is in latlon WGS84. We can get the projection details of our target mapset using the getLocationProj function from the rgrass7 package (Note, I am assuming that we have started GRASS with the target Location/mapset). Next, to reproject the SpatialPointsDataFrame, we use the spTransform function from the sp package (note that there is also a spTransform function in the rgdal package, which does the same). The last step is to export the data into the GRASS GIS database.

# Get the projection of the GRASS mapset and reproject the data
proj <- getLocationProj()
KLnew <- spTransform(KL, proj)
writeVECT(SDF=KLnew[,c(1,2,5)], vname="KLatifolia", driver="SQLite",
          v.in.ogr_flags="o")

You may need to use v.in.ogr_flags="o" in the writeVECT function (line 4-5). For an explanation, see this email thread.

Visualize the data in GRASS GIS

Now you can further analyse your data in GRASS GIS or create a map. GRASS offers different ways to create maps, including the ps.map and g.gui.psmap functions (command line and GUI respectively), the m.printws function (you'll need to install the addon) and the d.* family of GRASS functions. The map below was created using the latter.

The map is created on the command line using the d.* family of functions. For more details, check out the help file of each of the functions used. For the GRASS environmental variables (GRASS_RENDER_* in line 21-27), see this help page.

# Set region (we want to set the region slightly larger than the extent
# of the usa
g.region vector=usa_states res=25
g.region n=n+200000 s=s-50000 w=w-200000 e=e+50000

# Compute with/height ratio image
# Note, the bc function rounds down to the nearest integer. To round up
# to the nearest integer when computing the width, I added 1 (+1).
eval `g.region -g`
ratio=`bc <<< "scale=4; $rows / $cols"`
height=1200
width=`bc <<< "scale=0; $height / $ratio + 1"` 

# Create vector layer with an extent matching that of the region. This
# will serve as background layer
v.in.region --overwrite output=bckgrnd cat=1 --overwrite

# Set environmental variables
# The commands below will directly render the images to a file. If you want
# to render the map to a monitor, use d.mon.
export GRASS_RENDER_IMMEDIATE="cairo"
export GRASS_RENDER_FILE="Kalmia_latifolia.png"
export GRASS_RENDER_HEIGHT=$height
export GRASS_RENDER_WIDTH=$width
export GRASS_RENDER_BACKGROUNDCOLOR="#FFFFFF"
export GRASS_RENDER_TRANSPARENT="TRUE"
export GRASS_RENDER_FILE_READ="TRUE"

# Define the layers
d.vect map=bckgrnd type=area color=234:234:234 fcolor=234:234:234
d.grid -w size=5 color=255:255:255 border_color=160:160:160 fontsize=12
d.vect usa_states type=area color=100:100:100 fcolor=250:250:250
d.vect map=KLatifolia@test color=red fill_color=red icon=basic/point \
	   size=10
d.text text="Mountain Laurel" at=80,85 size=4 align=cc color="black" font=arial
d.text text="(Kalmia latifolia)" at=80,80 size=3 align=cc color="black" font=ariali

If you think, that is a whole lot of code to create a simple map, I am with you. Creating as script is a sensible choice if you want to automate your workflow. And remember, you can run the code also from within e.g., R or Python. So also within those enviroments you can create maps using GRASS functionality.

If you have to create a map only one time, it will be easier to create the map in the Map Display and use the 'save to file' option to save the resulting map as a image file. Alternatively, after creating the map in the Map Display, save the workspace (menu: File → Workspace → Save as) and use that as input in m.printws to save the map as an image.

Links

As usual, there are more than one way to skin the proverbial cat. Below some links of articles / blog posts about using different R packages to retrieve data from GBIF.

If you have questions

If you have questions or comments about the text, let me know. You can use this contact form. Please make sure to include the page title ("Mapping GBIF data using R and GRASS") or page name ("grass-r-gbif").