Creating a histogram from vector attribute data

Author: Paulo van Breugel
Updated on: 10-08-18

Introduction

GRASS GIS has convenient tools to draw histograms of raster values. A similar tool to draw a histogram of the values in a vector attribute table is lacking. But you can easily add this functionality by installing the d.vect.colhist addon by Moritz Lennert.

Install the addon

You can install addons using the wxGUI Extension Manager (Menu: Settings → Addon extensions → Install extensions from addons). Or you can install the addon from the command line using the g.extension module.

g.extension extension=d.vect.colhist

Using the function from the command line

We'll be using the example from the help file. As usual, it uses data from the North Carolina data set. There is a complete and a basic version of this data set; we'll be using the full one. Open GRASS GIS and check out the censusblk_swwake layer.


Figure 1: The censusblk_swwake vector layer

The vector layer shows the census areas within Wake County. Note that you can get a short description of each data set from here. You can check out the data in the attribute table using the attribute table manager. As you can see, the attribute table contains all sorts of census data for each of the census areas.


Figure 2: Select the layer and click the button to open the attribute table (1). The census data in the attribute table includes for example the total population per census block (2)

Let's now check out the distribution of the median age over the census blocks in Wake County. The easiest way to do this is on the command line, using the code below.

d.vect.colhist map=censusblk_swwake column=MEDIAN_AGE

With this code, you tell the d.vect.colhist function that you want to create a histogram using the data in the column MEDIAN_AGE from the vector layer censusblk_swwake. The plot should look like the one below.


Figure 3: Plot of median age per census block in Wake County

The histogram looks odd, with a very large number of census blocks with a median age of 0. Checking the attribute table clarifies that these are census areas without any population. We can exclude those from the histogram with a simple SQL where statement.

d.vect.colhist map=censusblk_swwake column=MEDIAN_AGE where="TOTAL_POP>0"

For those not familiar with SQL, the statement means that only the census blocks where people live (total population is larger than 0) are used to create the histogram. The plot now looks like the one below. The where option thus allows you to select a subset of data from the attribute table.
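To make the effect of the where condition concrete, here is a minimal Python sketch that applies the same condition to a tiny in-memory SQLite table. The table name census and the four rows are made up for illustration; only the column names TOTAL_POP and MEDIAN_AGE and the condition itself come from the example above.

```python
import sqlite3

# Toy stand-in for the attribute table; these rows are invented for illustration.
rows = [
    (1, 0, 0.0),     # unpopulated census block: median age stored as 0
    (2, 120, 34.5),
    (3, 0, 0.0),     # another unpopulated block
    (4, 310, 41.2),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE census (cat INTEGER, TOTAL_POP INTEGER, MEDIAN_AGE REAL)")
con.executemany("INSERT INTO census VALUES (?, ?, ?)", rows)

# Without a condition, every block contributes a value, including the zeros.
all_ages = [r[0] for r in con.execute(
    "SELECT MEDIAN_AGE FROM census ORDER BY cat")]

# With the condition, only blocks where people live are selected.
populated_ages = [r[0] for r in con.execute(
    "SELECT MEDIAN_AGE FROM census WHERE TOTAL_POP>0 ORDER BY cat")]

con.close()

print(all_ages)
print(populated_ages)
```

The second list no longer contains the zeros of the empty blocks, which is why the spike at 0 disappears from the histogram.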


Figure 4: Plot of the distribution of the median age in Wake County, using the numbers per census block. Only census blocks with a total population > 0 are included.

Two additional options let you set the number of bins (bars) used for the histogram and save the result directly to a file (the default is to display the graph on screen). The example below shows the use of both options.

d.vect.colhist map=censusblk_swwake@PERMANENT column=MEDIAN_AGE \
               where="TOTAL_POP>0" plot_output=graph_03.png bins=10


Figure 5: Plot of the distribution of the median age in Wake County, using the numbers per census block and 10 bins. Only census blocks with a total population > 0 are included.
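If you wonder what the number of bins does to the shape of the histogram, the Python sketch below counts the same values using different numbers of bins. It assumes equal-width bins between the smallest and largest value, which is the common convention; I did not check how the addon computes its bins internally. The function name bin_counts and the sample values are hypothetical.

```python
def bin_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: put everything in the first bin.
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The largest value would land just past the last bin, so clamp it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Made-up median ages, for illustration only.
ages = [22.0, 25.5, 31.0, 33.5, 34.0, 38.5, 41.0, 47.5, 52.0]

coarse = bin_counts(ages, 3)
fine = bin_counts(ages, 6)
print(coarse)  # [3, 4, 2]
print(fine)    # [2, 1, 2, 2, 0, 2]
```

Fewer bins give a smoother but coarser picture, while more bins show more detail and may leave some bars empty.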

Using the function from the GUI

Instead of using the command line, you can also use the GUI. To open it, type d.vect.colhist on the command line or in the console of the GRASS GIS layer manager. A third way is to open it from the Modules tab, as shown below.


Figure 6: Open the function d.vect.colhist from the Modules tab

This will open the function's window with different tabs. In the first tab (Required) you can enter the vector name and the column with the data you want to use to create the histogram.


Figure 7: GUI of the d.vect.colhist function, with the options in the required tab.

In the second tab (Optional) you can enter a where condition to select a subset of data from the attribute table, and set the number of bins. You have to enter the where condition manually, as the little table icon next to the 'where' field does not seem to work at the time of writing.


Figure 8: GUI of the d.vect.colhist function, with the options in the optional tab.

With the settings shown in Figures 7 and 8, you'll get the same plot as in Figure 4.

Afterword

GRASS GIS is a powerful tool for spatial analysis, which includes several tools to quickly explore raster data. For vector data there are fewer data exploration tools available out of the box. But as this example shows, it is worth checking out the addons to see if anybody else has created something you can use.

If you are more into Python, the integration between Python and GRASS GIS is possibly even better. See for example this post on Plotting GRASS data in Python. Actually, many addons are written in Python, including the d.vect.colhist addon. So if you know how to plot data in Python, it will only be a very small step to make your own GRASS GIS addon.
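To give an idea of how small that step can be, below is a rough Python sketch that reads one attribute column with the v.db.select module and plots it with matplotlib. This is not how d.vect.colhist itself is implemented. The helper names parse_column_values, read_column and save_histogram are hypothetical, matplotlib is assumed to be installed, and read_column only works when run inside a GRASS GIS session.

```python
def parse_column_values(text):
    """Turn v.db.select output (one value per line) into a list of floats."""
    values = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip empty lines, e.g. missing values printed as blanks
            values.append(float(line))
    return values


def read_column(vector, column, where=None):
    """Read one numeric attribute column; must run inside a GRASS session."""
    import grass.script as gs  # only importable within GRASS GIS

    # The c flag asks v.db.select to leave out the line with column names.
    kwargs = {"map": vector, "columns": column, "flags": "c"}
    if where:
        kwargs["where"] = where
    return parse_column_values(gs.read_command("v.db.select", **kwargs))


def save_histogram(values, path, bins=10):
    """Plot the values as a histogram and save it to a file."""
    import matplotlib
    matplotlib.use("Agg")  # draw to a file, no display needed
    import matplotlib.pyplot as plt

    plt.hist(values, bins=bins)
    plt.xlabel("value")
    plt.ylabel("count")
    plt.savefig(path)
    plt.close()


# Example use inside a GRASS session (not run here):
# ages = read_column("censusblk_swwake", "MEDIAN_AGE", where="TOTAL_POP>0")
# save_histogram(ages, "median_age.png", bins=10)
```

Keeping the parsing in its own small function makes it easy to test without GRASS GIS, while the two other functions stay thin wrappers around v.db.select and matplotlib.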

If you have questions

If you have questions or comments about the text, let me know. You can use this contact form. Please make sure to include the page title ("Creating a histogram from vector attribute data") or page name ("histograms of vector attribute data").