Code
# install.packages("igraph")
library(igraph)
Attaching package: 'igraph'
The following objects are masked from 'package:stats':
decompose, spectrum
The following object is masked from 'package:base':
union
Nodes and Links and edges and vertices…
Barrie Robison
April 19, 2023
In this assignment, we’ll consider some of the tools and techniques for visualizing network data. Network data is characterized by two unique items that are not found in tabular or spatial data - Nodes and Links. In addition, there is a sub-type of network data that we will consider - Hierarchical or Tree data. Let’s practice a few visualizations to get a feel for how these things work!
In class we talked about the strong possibility that each of you will encounter network data in your research. These data could be describing lots of different things in biologicial syste,s, from gene or protein expression networks, to hierarchical representations of ontologies, to phylogenetic trees. In geology and geoinformatics, network data would be used to describe the co-occurence of mineral species, or even knowledge graphs and other information representations.
Let’s start with igraph, which is an open source toolset for network analysis. The great thing about igraph is that you can use these tools in R, Python, Mathematica, and C++. It is very flexible and very powerful.
First up, we’ll install R/igraph and load the library (note that I’ve commented out the package installation because I’ve already installed igraph on my machine):
Attaching package: 'igraph'
The following objects are masked from 'package:stats':
decompose, spectrum
The following object is masked from 'package:base':
union
Now I’m going to walk you through a modified version of the igraph tutorial, which you can find here
igraph
offers many ways to create a graph. The simplest one is the function make_empty_graph
:
The most common way to create a graph is make_graph
, which constructs a network based on specified edges. For example, to make a graph with 10 nodes (numbered 1
to 10
) and two edges connecting nodes 1-2
and 1-5
:
We can print the graph to get a summary of its nodes and edges:
This means: Undirected Named graph with 10 vertices and 2 edges, with the exact edges listed out. If the graph has a [name]
attribute, it is printed as well.
The same function make_graph
can create some notable graphs by just specifying their name. For example you can create the graph that represents the social network of Zachary’s karate club, that shows the friendship between 34 members of a karate club at a US university in the 1970s:
To visualize a graph you can use plot
:
Vertices and edges have numerical vertex IDs in igraph. Vertex IDs are always consecutive and they start with 1. For a graph with n vertices the vertex IDs are always between 1 and n. If some operation changes the number of vertices in the graphs, e.g. a subgraph is created via induced_subgraph
, then the vertices are renumbered to satisfy this criterion.
The same is true for the edges as well: edge IDs are always between 1 and m, the total number of edges in the graph.
In addition to IDs, vertices and edges can be assigned a name and other attributes. That makes it easier to track them whenever the graph is altered.
Let’s continue working with the Karate club graph. To add one or more vertices to an existing graph, use add_vertices
:
Similarly, to add edges you can use add_edges
:
Edges are added by specifying the source and target vertex IDs for each edge. This call added three edges, one connecting vertices 1
and 35
, one connecting vertices 1
and 36
, and one connecting vertices 34
and 37
.
In addition to the add_vertices
and add_edges
functions, the plus operator can be used to add vertices or edges to graph. The actual operation that is performed depends on the type of the right hand side argument:
You can add a single vertex/edge at a time using add_vertex
and add_edge
.
Let us add some more vertices and edges to our graph. In igraph
we can use the magrittr
package, which provides a mechanism for chaining commands with the operator %\>%
:
IGRAPH 2293648 U--- 40 86 -- Zachary
+ attr: name (g/c)
+ edges from 2293648:
[1] 1-- 2 1-- 3 1-- 4 1-- 5 1-- 6 1-- 7 1-- 8 1-- 9 1--11 1--12
[11] 1--13 1--14 1--18 1--20 1--22 1--32 2-- 3 2-- 4 2-- 8 2--14
[21] 2--18 2--20 2--22 2--31 3-- 4 3-- 8 3--28 3--29 3--33 3--10
[31] 3-- 9 3--14 4-- 8 4--13 4--14 5-- 7 5--11 6-- 7 6--11 6--17
[41] 7--17 9--31 9--33 9--34 10--34 14--34 15--33 15--34 16--33 16--34
[51] 19--33 19--34 20--34 21--33 21--34 23--33 23--34 24--26 24--28 24--33
[61] 24--34 24--30 25--26 25--28 25--32 26--32 27--30 27--34 28--34 29--32
[71] 29--34 30--33 30--34 31--33 31--34 32--33 32--34 33--34 1--35 1--36
+ ... omitted several edges
We now have an undirected graph with 40 vertices and 86 edges. Vertex and edge IDs are always contiguous, so if you delete a vertex all subsequent vertices will be renumbered. When a vertex is renumbered, edges are not renumbered, but their source and target vertices will be. Use delete_vertices
and delete_edges
to perform these operations. For instance, to delete the edge connecting vertices 1-34
, get its ID and then delete it:
As an example, to create a broken ring:
The example above shows that you can also refer to edges with strings containing the IDs of the source and target vertices, connected by a pipe symbol |
. "10|1"
in the above example means the edge that connects vertex 10 to vertex 1. Of course you can also use the edge IDs directly, or retrieve them with the get.edge.ids
function:
In addition to make_empty_graph
, make_graph
, and make_graph_from_literal
, igraph
includes many other function to construct a graph. Some are deterministic, i.e. they produce the same graph each single time, e.g. make_tree
:
IGRAPH 419788f U--- 127 126 -- Tree
+ attr: name (g/c), children (g/n), mode (g/c)
This generates a regular tree graph with 127 vertices, each vertex having two children. No matter how many times you call make_tree
, the generated graph will always be the same if you use the same parameters:
Other functions generate graphs stochastically, i.e. they produce a different graph each time. For instance sample_grg
:
IGRAPH 511f5d0 U--- 100 479 -- Geometric random graph
+ attr: name (g/c), radius (g/n), torus (g/l)
This generates a geometric random graph: n points are chosen randomly and uniformly inside the unit square and pairs of points closer to each other than a predefined distance d are connected by an edge. If you generate GRGs with the same parameters, they will be different:
A slightly looser way to check if the graphs are equivalent is via isomorphic
. Two graphs are said to be isomorphic if they have the same number of components (vertices and edges) and maintain a one-to-one correspondence between vertices and edges, i.e., they are connected in the same way.
Checking for isomorphism can take a while for large graphs (in this case, the answer can quickly be given by checking the degree sequence of the two graphs). identical_graph
is a stricter criterion than isomorphic
: the two graphs must have the same list of vertices and edges, in exactly the same order, with same directedness, and the two graphs must also have identical graph, vertex and edge attributes.
In addition to IDs, vertex and edges can have attributes such as a name, coordinates for plotting, metadata, and weights. The graph itself can have such attributes too (e.g. a name, which will show in summary
). In a sense, every graph, vertex and edge can be used as an R namespace to store and retrieve these attributes.
To demonstrate the use of attributes, let us create a simple social network:
Each vertex represents a person, so we want to store ages, genders and types of connection between two people (is_formal
refers to whether a connection between one person or another is formal or informal, i.e. colleagues or friends). The \$
operator is a shortcut to get and set graph attributes. It is shorter and just as readable as graph_attr
and set_graph_attr
.
IGRAPH 7eef2d3 UN-- 7 9 --
+ attr: name (v/c), age (v/n), gender (v/c), is_formal (e/l)
V
and E
are the standard way to obtain a sequence of all vertices and edges, respectively. This assigns an attribute to all vertices/edges at once. Another way to generate our social network is with the use of set_vertex_attr
and set_edge_attr
and the operator %\>%
:
g <- make_graph(~ Alice-Bob:Claire:Frank, Claire-Alice:Dennis:Frank:Esther,
George-Dennis:Frank, Dennis-Esther) %>%
set_vertex_attr("age", value = c(25, 31, 18, 23, 47, 22, 50)) %>%
set_vertex_attr("gender", value = c("f", "m", "f", "m", "m", "f", "m")) %>%
set_edge_attr("is_formal", value = c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE))
summary(g)
To assign or modify an attribute for a single vertex/edge:
[1] FALSE FALSE TRUE TRUE TRUE FALSE TRUE FALSE FALSE
[1] TRUE FALSE TRUE TRUE TRUE FALSE TRUE FALSE FALSE
Attribute values can be set to any R object, but note that storing the graph in some file formats might result in the loss of complex attribute values. Vertices, edges and the graph itself can all be used to set attributes, e.g. to add a date to the graph:
To retrieve attributes, you can also use graph_attr
, vertex_attr
, and edge_attr
. To find the ID of a vertex you can use the function match
:
To assign attributes to a subset of vertices or edges, you can use:
+ 7/7 vertices, named, from 7eef2d3:
[1] Alejandra Bruno Carmina Frank Dennis Esther George
To delete attributes:
If you want to save a graph in R with all the attributes use the R’s standard function dput
function and retrieve it later with dget
. You can also just save the R workspace and restore it later.
igraph
provides a large set of functions to calculate various structural properties of graphs. It is beyond the scope of this tutorial to document all of them, hence this section will only introduce a few of them for illustrative purposes. We will work on the small social network constructed in the previous section.
Perhaps the simplest property one can think of is the degree. The degree of a vertex equals the number of edges adjacent to it. In case of directed networks, we can also define in-degree (the number of edges pointing towards the vertex) and out-degree (the number of edges originating from the vertex). igraph
is able to calculate all of them using a simple syntax:
If the graph was directed, we would have been able to calculate the in- and out-degrees separately using degree(mode="in")
and degree(mode="out")
. You can also pass a single vertex ID or a list of vertex IDs to degree
if you want to calculate the degrees for only a subset of vertices:
Most functions that accept vertex IDs also accept vertex names (i.e. the values of the name
vertex attribute) as long as the names are unique:
It also works for single vertices:
A similar syntax is used for most of the structural properties igraph
can calculate. For vertex properties, the functions accept a vertex ID, a vertex name, or a list of vertex IDs or names (and if they are omitted, the default is the set of all vertices). For edge properties, the functions accept a single edge ID or a list of edge IDs.
NOTE: For some measures, it does not make sense to calculate them only for a few vertices or edges instead of the whole graph, as it would take the same time anyway. In this case, the functions won’t accept vertex or edge IDs, but you can still restrict the resulting list later using standard operations. One such example is eigenvector centrality (evcent()
).
Besides degree, igraph includes built-in routines to calculate many other centrality properties, including vertex and edge betweenness (edge_betweenness
) or Google’s PageRank (page_rank
) just to name a few. Here we just illustrate edge betweenness:
Now we can also figure out which connections have the highest betweenness centrality:
Imagine that in a given social network, you want to find out who has the largest degree. You can do that with the tools presented so far and the which.max
function:
Another example would be to select only vertices that have only odd IDs but not even ones, using the V
function:
[1] 5
Of course, it is possible to select vertices or edges by positional indices:
Selecting a vertex that does not exist results in an error:
Attribute names can also be used as-is within the indexing brackets of V()
and E()
. This can be combined with R’s ability to use boolean vectors for indexing to obtain very concise and readable expressions to retrieve a subset of the vertex or edge set of a graph. For instance, the following command gives you the names of the individuals younger than 30 years in our social network:
Of course, <
is not the only boolean operator that can be used for this. Other possibilities include the following:
Operator | Meaning |
---|---|
== |
The attribute/property value must be equal to |
!= |
The attribute/property value must not be equal to |
< |
The attribute/property value must be less than |
<= |
The attribute/property value must be less than or equal to |
> |
The attribute/property value must be greater than |
>= |
The attribute/property value must be greater than or equal to |
%in% |
The attribute/property value must be included in |
You can also create a “not in” operator from %in%
using the Negate
function:
If an attribute has the same name as an igraph
function, you should be careful as the syntax can become a little confusing. For instance, if there is an attribute named degree
that represents the grades of an exam for each person, that should not be confused with the igraph
function that computes the degrees of vertices in a network sense:
[1] "A" "A+" "C"
Edges can be selected based on attributes just like vertices. As mentioned above, the standard way to get edges is E
. Moreover, there are a few special structural properties for selecting edges.
Using .from
allows you to filter the edge sequence based on the source vertices of the edges. E.g., to select all the edges originating from Carmina (who has vertex index 3):
+ 4/9 edges from 7eef2d3 (vertex names):
[1] Alejandra--Carmina Carmina --Frank Carmina --Dennis Carmina --Esther
Of course it also works with vertex names:
+ 4/9 edges from 7eef2d3 (vertex names):
[1] Alejandra--Carmina Carmina --Frank Carmina --Dennis Carmina --Esther
Using .to
filters edge sequences based on the target vertices. This is different from .from
if the graph is directed, while it gives the same answer for undirected graphs. Using .inc
selects only those edges that are incident on a single vertex or at least one of the vertices, irrespectively of the edge directions.
The %--%
operator can be used to select edges between specific groups of vertices, ignoring edge directions in directed graphs. For instance, the following expression selects all the edges between Carmina (vertex index 3), Dennis (vertex index 5) and Esther (vertex index 6):
+ 3/9 edges from 7eef2d3 (vertex names):
[1] Carmina--Dennis Carmina--Esther Dennis --Esther
To make the %--%
operator work with names, you can build string vectors containing the names and then use these vectors as operands. For instance, to select all the edges that connect men to women, we can do the following after re-adding the gender attribute that we deleted earlier:
The adjacency matrix is another way to represent a graph. In an adjacency matrix, rows and columns are labeled by graph vertices, and the elements of the matrix indicate the number of edges between vertices i and j. The adjacency matrix for the example graph is:
7 x 7 sparse Matrix of class "dgCMatrix"
Alejandra Bruno Carmina Frank Dennis Esther George
Alejandra . 1 1 1 . . .
Bruno 1 . . . . . .
Carmina 1 . . 1 1 1 .
Frank 1 . 1 . . . 1
Dennis . . 1 . . 1 1
Esther . . 1 . 1 . .
George . . . 1 1 . .
For example, Carmina (1, 0, 0, 1, 1, 1, 0
) is directly connected to Alejandra (who has vertex index 1), Frank (index 4), Dennis (index 5) and Esther (index 6), but not to Bruno (index 2) or to George (index 7).
A graph is an abstract mathematical object without a specific representation in 2D, 3D or any other geometric space. This means that whenever we want to visualise a graph, we have to find a mapping from vertices to coordinates in two- or three-dimensional space first, preferably in a way that is useful and/or pleasing for the eye. A separate branch of graph theory, namely graph drawing, tries to solve this problem via several graph layout algorithms. igraph implements quite a few layout algorithms and is also able to draw them onto the screen or to any output format that R itself supports.
The layout functions in igraph always start with layout
. The following table summarises them:
Method name | Algorithm description |
---|---|
layout_randomly |
Places the vertices completely randomly |
layout_in_circle |
Deterministic layout that places the vertices on a circle |
layout_on_sphere |
Deterministic layout that places the vertices evenly on the surface of a sphere |
layout_with_drl |
The Drl (Distributed Recursive Layout) algorithm for large graphs |
layout_with_fr |
Fruchterman-Reingold force-directed algorithm |
layout_with_kk |
Kamada-Kawai force-directed algorithm |
layout_with_lgl |
The LGL (Large Graph Layout) algorithm for large graphs |
layout_as_tree |
Reingold-Tilford tree layout, useful for (almost) tree-like graphs |
layout_nicely |
Layout algorithm that automatically picks one of the other algorithms based on certain properties of the graph |
Layout algorithms can be called directly with a graph as its first argument. They will return a matrix with two columns and as many rows as the number of vertices in the graph; each row will correspond to the position of a single vertex, ordered by vertex IDs. Some algorithms have a 3D variant; in this case they return three columns instead of 2.
Some layout algorithms take additional arguments; e.g., when laying out a graph as a tree, it might make sense to specify which vertex is to be placed at the root of the layout:
We can plot our imaginary social network with the Kamada-Kawai layout algorithm as follows:
This should open a new window showing a visual representation of the network. Remember that the exact placement of nodes may be different on your machine since the layout is not deterministic.
The layout
argument also accepts functions; in this case, the function will be called with the graph as its first argument. This makes it possible to just pass the name of a layout function directly, without creating a layout variable:
To improve the visuals, a trivial addition would be to color the vertices according to the gender. We should also try to place the labels slightly outside the vertices to improve readability:
You can also treat the gender
attribute as a factor and provide the colors with an argument to plot()
, which takes precedence over the color
vertex attribute. Colors will be assigned automatically to levels of a factor:
As seen above with the vertex.color
argument, you can specify visual properties as arguments to plot
instead of using vertex or edge attributes. The following plot shows the formal ties with thick lines while informal ones with thin lines:
This latter approach is preferred if you want to keep the properties of the visual representation of your graph separate from the graph itself.
In summary, there are special vertex and edge properties that correspond to the visual representation of the graph. These attributes override the default settings of igraph (i.e color, weight, name, shape,layout,etc.). The following two tables summarise the most frequently used visual attributes for vertices and edges, respectively:
Attribute name | Keyword argument | Purpose |
---|---|---|
color |
vertex.color |
Color of the vertex |
label |
vertex.label |
Label of the vertex. They will be converted to character. Specify NA to omit vertex labels. The default vertex labels are the vertex ids. |
label.cex |
vertex.label.cex |
Font size of the vertex label, interpreted as a multiplicative factor, similarly to R’s text function |
label.color |
vertex.label.color |
Color of the vertex label |
label.degree |
vertex.label.degree |
It defines the position of the vertex labels, relative to the center of the vertices. It is interpreted as an angle in radian, zero means ‘to the right’, and ‘pi’ means to the left, up is -pi/2 and down is pi/2. The default value is -pi/4 |
label.dist |
vertex.label.dist |
Distance of the vertex label from the vertex itself, relative to the vertex size |
label.family |
vertex.label.family |
Font family of the vertex, similarly to R’s text function |
label.font |
vertex.label.font |
Font within the font family of the vertex, similarly to R’s text function |
shape |
vertex.shape |
The shape of the vertex, currently “circle”, “square”, “csquare”, “rectangle”, “crectangle”, “vrectangle”, “pie” (see vertex.shape.pie), ‘sphere’, and “none” are supported, and only by the plot.igraph command. |
size |
vertex.size |
The size of the vertex, a numeric scalar or vector, in the latter case each vertex sizes may differ |
Attribute name | Keyword argument | Purpose |
---|---|---|
color |
edge.color |
Color of the edge |
curved |
edge.curved |
A numeric value specifies the curvature of the edge; zero curvature means straight edges, negative values means the edge bends clockwise, positive values the opposite. TRUE means curvature 0.5, FALSE means curvature zero |
arrow.size |
edge.arrow.size |
Currently this is a constant, so it is the same for every edge. If a vector is submitted then only the first element is used, ie. if this is taken from an edge attribute then only the attribute of the first edge is used for all arrows. |
arrow.width |
edge.arrow.width |
The width of the arrows. Currently this is a constant, so it is the same for every edge |
width |
edge.width |
Width of the edge in pixels |
label |
edge.label |
If specified, it adds a label to the edge. |
label.cex |
edge.label.cex |
Font size of the edge label, interpreted as a multiplicative factor, similarly to R’s text function |
label.color |
edge.label.color |
Color of the edge label |
label.family |
edge.label.family |
Font family of the edge, similarly to R’s text function |
label.font |
edge.label.font |
Font within the font family of the edge, similarly to R’s text function |
plot()
These settings can be specified as arguments to the plot
function to control the overall appearance of the plot.
Keyword argument | Purpose |
---|---|
layout |
The layout to be used. It can be an instance of Layout , a list of tuples containing X-Y coordinates, or the name of a layout algorithm. The default is auto , which selects a layout algorithm automatically based on the size and connectedness of the graph. |
margin |
The amount of empty space below, over, at the left and right of the plot, it is a numeric vector of length four. |
Have a look at the structure of the two dataframes (people
and relations
). igraph
is creating a graph object using the relations
data frame as an edge list. The rest of the elements are attributes of the nodes or the edges that we can use for visualization.
While these examples are illustrative, they are also a bit contrived. It is rare that you would manually make a network. It is far more common that you will use existing data to visualize a network. A big challenge is often converting your data into a format that igraph
can work with, such as an edge list or adjacency matrix.
For example, what if we wanted to use the BCB520 variable in the people
dataframe to create additional edges?
There are lots of ways to do this. Here is one:
Attaching package: 'dplyr'
The following objects are masked from 'package:igraph':
as_data_frame, groups, union
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
BCB<-people%>%
filter(BCB520 == "TRUE")%>%
select(name, BCB520)
edgelist<- combn(BCB$name, 2)
edgelist_df <- as.data.frame(t(edgelist))
colnames(edgelist_df) <- c("from", "to")
edgelist_df$BCB520<-"TRUE"
edgelist_df$Advisor<-"FALSE"
relations$BCB520 <- "FALSE"
relations3<-rbind(relations, edgelist_df)
g <- graph_from_data_frame(relations3, directed=FALSE, vertices=people)
plot(g)
Improve the network above by:
---
title: "Dubplication: Practice with Network Data"
subtitle: "Nodes and Links and edges and vertices..."
format:
html:
toc: false
echo: true
author: "Barrie Robison"
date: "2023-04-19"
categories: [Portfolio, DataViz, Network, iGraph, Assignment]
image: "wendigonetwork.png"
description: "iGRAPH!"
code-fold: true
code-tools: true
---
# OVERVIEW
In this assignment, we'll consider some of the tools and techniques for visualizing network data. Network data is characterized by two unique items that are not found in tabular or spatial data - Nodes and Links. In addition, there is a sub-type of network data that we will consider - Hierarchical or Tree data. Let's practice a few visualizations to get a feel for how these things work!
# NETWORK BASICS!
In class we talked about the strong possibility that each of you will encounter network data in your research. These data could be describing lots of different things in biologicial syste,s, from gene or protein expression networks, to hierarchical representations of ontologies, to phylogenetic trees. In geology and geoinformatics, network data would be used to describe the co-occurence of mineral species, or even knowledge graphs and other information representations.
## IGRAPH
Let's start with [igraph](https://igraph.org), which is an open source toolset for network analysis. The great thing about igraph is that you can use these tools in R, Python, Mathematica, and C++. It is very flexible and very powerful.
### igraph in R
First up, we'll install R/igraph and load the library (note that I've commented out the package installation because I've already installed igraph on my machine):
```{r}
# install.packages("igraph")
library(igraph)
```
Now I'm going to walk you through a modified version of the igraph tutorial, which you can find [here](https://r.igraph.org/articles/igraph.html)
## Creating a graph
`igraph` offers many ways to create a graph. The simplest one is the function `make_empty_graph`:
```{r}
g <- make_empty_graph()
```
The most common way to create a graph is `make_graph`, which constructs a network based on specified edges. For example, to make a graph with 10 nodes (numbered `1` to `10`) and two edges connecting nodes `1-2` and `1-5`:
```{r}
g <- make_graph(edges = c(1,2, 1,5), n=10, directed = FALSE)
```
We can print the graph to get a summary of its nodes and edges:
```{r echo = TRUE}
g
```
This means: **U**ndirected **N**amed graph with **10** vertices and **2** edges, with the exact edges listed out. If the graph has a `[name]` attribute, it is printed as well.
::: callout-note
`summary` does not list the edges, which is convenient for large graphs with millions of edges:
:::
```{r echo = TRUE}
summary(g)
```
The same function `make_graph` can create some notable graphs by just specifying their name. For example you can create the graph that represents the social network of Zachary's karate club, that shows the friendship between 34 members of a karate club at a US university in the 1970s:
```{r echo = TRUE}
g <- make_graph('Zachary')
```
To visualize a graph you can use `plot`:
```{r}
plot(g)
```
## Vertex and edge IDs
Vertices and edges have numerical vertex IDs in igraph. Vertex IDs are always consecutive and they start with 1. For a graph with n vertices the vertex IDs are always between 1 and n. If some operation changes the number of vertices in the graphs, e.g. a subgraph is created via `induced_subgraph`, then the vertices are renumbered to satisfy this criterion.
The same is true for the edges as well: edge IDs are always between 1 and m, the total number of edges in the graph.
In addition to IDs, vertices and edges can be assigned a name and other attributes. That makes it easier to track them whenever the graph is altered.
## Adding/deleting vertices and edges
Let's continue working with the Karate club graph. To add one or more vertices to an existing graph, use `add_vertices`:
```{r}
g <- add_vertices(g, 3)
```
Similarly, to add edges you can use `add_edges`:
```{r}
g <- add_edges(g, edges = c(1,35, 1,36, 34,37))
```
Edges are added by specifying the source and target vertex IDs for each edge. This call added three edges, one connecting vertices `1` and `35`, one connecting vertices `1` and `36`, and one connecting vertices `34` and `37`.
In addition to the `add_vertices` and `add_edges` functions, the plus operator can be used to add vertices or edges to graph. The actual operation that is performed depends on the type of the right hand side argument:
```{r echo = TRUE, eval=FALSE}
g <- g + edges(c(1,35, 1,36, 34,37))
```
You can add a single vertex/edge at a time using `add_vertex` and `add_edge`.
Let us add some more vertices and edges to our graph. In `igraph` we can use the `magrittr` package, which provides a mechanism for chaining commands with the operator `%\>%`:
```{r echo = TRUE}
g <- g %>% add_edges(edges=c(1,34)) %>% add_vertices(3) %>%
add_edges(edges=c(38,39, 39,40, 40,38, 40,37))
g
plot(g)
```
We now have an undirected graph with 40 vertices and 86 edges. Vertex and edge IDs are always *contiguous*, so if you delete a vertex all subsequent vertices will be renumbered. When a vertex is renumbered, edges are **not** renumbered, but their source and target vertices will be. Use `delete_vertices` and `delete_edges` to perform these operations. For instance, to delete the edge connecting vertices `1-34`, get its ID and then delete it:
```{r echo = TRUE}
get.edge.ids(g, c(1,34))
```
```{r}
g <- delete_edges(g, 82)
```
As an example, to create a broken ring:
```{r echo = TRUE}
g <- make_ring(10) %>% delete_edges("10|1")
plot(g)
```
The example above shows that you can also refer to edges with strings containing the IDs of the source and target vertices, connected by a pipe symbol `|`. `"10|1"` in the above example means the edge that connects vertex 10 to vertex 1. Of course you can also use the edge IDs directly, or retrieve them with the `get.edge.ids` function:
```{r echo = TRUE}
g <- make_ring(5)
g <- delete_edges(g, get.edge.ids(g, c(1,5, 4,5)))
plot(g)
```
## Constructing graphs
In addition to `make_empty_graph`, `make_graph`, and `make_graph_from_literal`, `igraph` includes many other function to construct a graph. Some are *deterministic*, i.e. they produce the same graph each single time, e.g. `make_tree`:
```{r echo = TRUE}
graph1 <- make_tree(127, 2, mode = "undirected")
summary(graph1)
plot(graph1)
```
This generates a regular tree graph with 127 vertices, each vertex having two children. No matter how many times you call `make_tree`, the generated graph will always be the same if you use the same parameters:
```{r}
graph2 <- make_tree(127, 2, mode = "undirected")
```
```{r echo = TRUE}
identical_graphs(graph1,graph2)
```
Other functions generate graphs *stochastically*, i.e. they produce a different graph each time. For instance `sample_grg`:
```{r echo = TRUE}
graph1 <- sample_grg(100, 0.2)
summary(graph1)
plot(graph1)
```
This generates a geometric random graph: *n* points are chosen randomly and uniformly inside the unit square and pairs of points closer to each other than a predefined distance *d* are connected by an edge. If you generate GRGs with the same parameters, they will be different:
```{r echo = TRUE}
graph2 <- sample_grg(100, 0.2)
identical_graphs(graph1, graph2)
plot(graph2)
```
A slightly looser way to check if the graphs are equivalent is via `isomorphic`. Two graphs are said to be isomorphic if they have the same number of components (vertices and edges) and maintain a one-to-one correspondence between vertices and edges, i.e., they are connected in the same way.
```{r echo = TRUE}
isomorphic(graph1, graph2)
```
Checking for isomorphism can take a while for large graphs (in this case, the answer can quickly be given by checking the degree sequence of the two graphs). `identical_graph` is a stricter criterion than `isomorphic`: the two graphs must have the same list of vertices and edges, in exactly the same order, with same directedness, and the two graphs must also have identical graph, vertex and edge attributes.
## Setting and retrieving attributes
In addition to IDs, vertex and edges can have *attributes* such as a name, coordinates for plotting, metadata, and weights. The graph itself can have such attributes too (e.g. a name, which will show in `summary`). In a sense, every graph, vertex and edge can be used as an R namespace to store and retrieve these attributes.
To demonstrate the use of attributes, let us create a simple social network:
```{r}
g <- make_graph(~ Alice-Bob:Claire:Frank, Claire-Alice:Dennis:Frank:Esther,
George-Dennis:Frank, Dennis-Esther)
```
Each vertex represents a person, so we want to store ages, genders and types of connection between two people (`is_formal` refers to whether a connection between one person or another is formal or informal, i.e. colleagues or friends). The `\$` operator is a shortcut to get and set graph attributes. It is shorter and just as readable as `graph_attr` and `set_graph_attr`.
```{r echo = TRUE}
V(g)$age <- c(25, 31, 18, 23, 47, 22, 50)
V(g)$gender <- c("f", "m", "f", "m", "m", "f", "m")
E(g)$is_formal <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
summary(g)
```
`V` and `E` are the standard way to obtain a sequence of all vertices and edges, respectively. This assigns an attribute to *all* vertices/edges at once. Another way to generate our social network is with the use of `set_vertex_attr` and `set_edge_attr` and the operator `%\>%`:
```{r echo = TRUE, eval=FALSE}
g <- make_graph(~ Alice-Bob:Claire:Frank, Claire-Alice:Dennis:Frank:Esther,
George-Dennis:Frank, Dennis-Esther) %>%
set_vertex_attr("age", value = c(25, 31, 18, 23, 47, 22, 50)) %>%
set_vertex_attr("gender", value = c("f", "m", "f", "m", "m", "f", "m")) %>%
set_edge_attr("is_formal", value = c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE))
summary(g)
```
To assign or modify an attribute for a single vertex/edge:
```{r echo = TRUE}
E(g)$is_formal
E(g)$is_formal[1] <- TRUE
E(g)$is_formal
```
Attribute values can be set to any R object, but note that storing the graph in some file formats might result in the loss of complex attribute values. Vertices, edges and the graph itself can all be used to set attributes, e.g. to add a date to the graph:
```{r echo = TRUE}
g$date <- c("2022-02-11")
graph_attr(g, "date")
```
To retrieve attributes, you can also use `graph_attr`, `vertex_attr`, and `edge_attr`. To find the ID of a vertex you can use the function `match`:
```{r echo = TRUE}
match(c("George"), V(g)$name)
```
To assign attributes to a subset of vertices or edges, you can use:
```{r echo = TRUE}
V(g)$name[1:3] <- c("Alejandra", "Bruno", "Carmina")
V(g)
```
To delete attributes:
```{r echo = TRUE}
g <- delete_vertex_attr(g, "gender")
V(g)$gender
```
If you want to save a graph in R with all the attributes use the R's standard function `dput` function and retrieve it later with `dget`. You can also just save the R workspace and restore it later.
## Structural properties of graphs
`igraph` provides a large set of functions to calculate various structural properties of graphs. It is beyond the scope of this tutorial to document all of them, hence this section will only introduce a few of them for illustrative purposes. We will work on the small social network constructed in the previous section.
Perhaps the simplest property one can think of is the *degree*. The degree of a vertex equals the number of edges adjacent to it. In case of directed networks, we can also define *in-degree* (the number of edges pointing towards the vertex) and *out-degree* (the number of edges originating from the vertex). `igraph` is able to calculate all of them using a simple syntax:
```{r echo = TRUE}
degree(g)
```
If the graph was directed, we would have been able to calculate the in- and out-degrees separately using `degree(mode="in")` and `degree(mode="out")`. You can also pass a single vertex ID or a list of vertex IDs to `degree` if you want to calculate the degrees for only a subset of vertices:
```{r echo = TRUE}
degree(g, 7)
```
```{r echo = TRUE}
degree(g, v=c(3,4,5))
```
Most functions that accept vertex IDs also accept vertex *names* (i.e. the values of the `name` vertex attribute) as long as the names are unique:
```{r echo = TRUE}
degree(g, v=c("Carmina", "Frank", "Dennis"))
```
It also works for single vertices:
```{r echo = TRUE}
degree(g, "Bruno")
```
A similar syntax is used for most of the structural properties `igraph` can calculate. For vertex properties, the functions accept a vertex ID, a vertex name, or a list of vertex IDs or names (and if they are omitted, the default is the set of all vertices). For edge properties, the functions accept a single edge ID or a list of edge IDs.
------------------------------------------------------------------------
**NOTE:** For some measures, it does not make sense to calculate them only for a few vertices or edges instead of the whole graph, as it would take the same time anyway. In this case, the functions won't accept vertex or edge IDs, but you can still restrict the resulting list later using standard operations. One such example is eigenvector centrality (`evcent()`).
------------------------------------------------------------------------
Besides degree, igraph includes built-in routines to calculate many other centrality properties, including vertex and edge betweenness (`edge_betweenness`) or Google's PageRank (`page_rank`) just to name a few. Here we just illustrate edge betweenness:
```{r echo = TRUE}
edge_betweenness(g)
```
Now we can also figure out which connections have the highest betweenness centrality:
```{r echo = TRUE}
ebs <- edge_betweenness(g)
as_edgelist(g)[ebs == max(ebs), ]
```
## Querying vertices and edges based on attributes
### Selecting vertices
Imagine that in a given social network, you want to find out who has the largest degree. You can do that with the tools presented so far and the `which.max` function:
```{r echo = TRUE}
which.max(degree(g))
```
Another example would be to select only vertices that have only odd IDs but not even ones, using the `V` function:
```{r echo = TRUE}
graph <- graph.full(n=10)
only_odd_vertices <- which(V(graph)%%2==1)
length(only_odd_vertices)
```
Of course, it is possible to select vertices or edges by positional indices:
```{r echo = TRUE}
seq <- V(graph)[2, 3, 7]
seq
```
```{r echo = TRUE}
seq <- seq[1, 3] # filtering an existing vertex set
seq
```
Selecting a vertex that does not exist results in an error:
```{r echo = TRUE, eval = FALSE}
seq <- V(graph)[2, 3, 7, "foo", 3.5]
## Error in simple_vs_index(x, ii, na_ok) : Unknown vertex selected
```
Attribute names can also be used as-is within the indexing brackets of `V()` and `E()`. This can be combined with R's ability to use boolean vectors for indexing to obtain very concise and readable expressions to retrieve a subset of the vertex or edge set of a graph. For instance, the following command gives you the names of the individuals younger than 30 years in our social network:
```{r echo = TRUE}
V(g)[age < 30]$name
```
Of course, `<` is not the only boolean operator that can be used for this. Other possibilities include the following:
| Operator | Meaning |
|----------------------|--------------------------------------------------|
| `==` | The attribute/property value must be *equal to* |
| `!=` | The attribute/property value must *not be equal to* |
| `<` | The attribute/property value must be *less than* |
| `<=` | The attribute/property value must be *less than or equal to* |
| `>` | The attribute/property value must be *greater than* |
| `>=` | The attribute/property value must be *greater than or equal to* |
| `%in%` | The attribute/property value must be *included in* |
You can also create a "not in" operator from `%in%` using the `Negate` function:
```{r echo = TRUE}
`%notin%` <- Negate(`%in%`)
```
If an attribute has the same name as an `igraph` function, you should be careful as the syntax can become a little confusing. For instance, if there is an attribute named `degree` that represents the grades of an exam for each person, that should not be confused with the `igraph` function that computes the degrees of vertices in a network sense:
```{r echo = TRUE}
V(g)$degree <- c("A", "B", "B+", "A+", "C", "A", "B")
V(g)$degree[degree(g) == 3]
```
```{r echo = TRUE}
V(g)$name[degree(g) == 3]
```
### Selecting edges
Edges can be selected based on attributes just like vertices. As mentioned above, the standard way to get edges is `E`. Moreover, there are a few special structural properties for selecting edges.
Using `.from` allows you to filter the edge sequence based on the source vertices of the edges. E.g., to select all the edges originating from Carmina (who has vertex index 3):
```{r echo = TRUE, warning = FALSE}
E(g)[.from(3)]
```
Of course it also works with vertex names:
```{r echo = TRUE, warning = FALSE}
E(g)[.from("Carmina")]
```
Using `.to` filters edge sequences based on the target vertices. This is different from `.from` if the graph is directed, while it gives the same answer for undirected graphs. Using `.inc` selects only those edges that are incident on a single vertex or at least one of the vertices, irrespectively of the edge directions.
The `%--%` operator can be used to select edges between specific groups of vertices, ignoring edge directions in directed graphs. For instance, the following expression selects all the edges between Carmina (vertex index 3), Dennis (vertex index 5) and Esther (vertex index 6):
```{r echo = TRUE}
E(g) [ 3:5 %--% 5:6 ]
```
To make the `%--%` operator work with names, you can build string vectors containing the names and then use these vectors as operands. For instance, to select all the edges that connect men to women, we can do the following after re-adding the gender attribute that we deleted earlier:
```{r}
V(g)$gender <- c("f", "m", "f", "m", "m", "f", "m")
```
```{r echo = TRUE}
men <- V(g)[gender == "m"]$name
men
```
```{r echo = TRUE}
women <- V(g)[gender == "f"]$name
women
```
```{r echo = TRUE}
E(g)[men %--% women]
```
## Treating a graph as an adjacency matrix
The adjacency matrix is another way to represent a graph. In an adjacency matrix, rows and columns are labeled by graph vertices, and the elements of the matrix indicate the number of edges between vertices *i* and *j*. The adjacency matrix for the example graph is:
```{r echo = TRUE}
get.adjacency(g)
```
For example, Carmina (`1, 0, 0, 1, 1, 1, 0`) is directly connected to Alejandra (who has vertex index 1), Frank (index 4), Dennis (index 5) and Esther (index 6), but not to Bruno (index 2) or to George (index 7).
## Layouts and plotting
A graph is an abstract mathematical object without a specific representation in 2D, 3D or any other geometric space. This means that whenever we want to visualise a graph, we have to find a mapping from vertices to coordinates in two- or three-dimensional space first, preferably in a way that is useful and/or pleasing for the eye. A separate branch of graph theory, namely graph drawing, tries to solve this problem via several graph layout algorithms. igraph implements quite a few layout algorithms and is also able to draw them onto the screen or to any output format that R itself supports.
### Layout algorithms
The layout functions in igraph always start with `layout`. The following table summarises them:
| Method name | Algorithm description |
|-----------------|-------------------------------------------------------|
| `layout_randomly` | Places the vertices completely randomly |
| `layout_in_circle` | Deterministic layout that places the vertices on a circle |
| `layout_on_sphere` | Deterministic layout that places the vertices evenly on the surface of a sphere |
| `layout_with_drl` | The Drl (Distributed Recursive Layout) algorithm for large graphs |
| `layout_with_fr` | Fruchterman-Reingold force-directed algorithm |
| `layout_with_kk` | Kamada-Kawai force-directed algorithm |
| `layout_with_lgl` | The LGL (Large Graph Layout) algorithm for large graphs |
| `layout_as_tree` | Reingold-Tilford tree layout, useful for (almost) tree-like graphs |
| `layout_nicely` | Layout algorithm that automatically picks one of the other algorithms based on certain properties of the graph |
Layout algorithms can be called directly with a graph as its first argument. They will return a matrix with two columns and as many rows as the number of vertices in the graph; each row will correspond to the position of a single vertex, ordered by vertex IDs. Some algorithms have a 3D variant; in this case they return three columns instead of 2.
```{r}
layout <- layout_with_kk(g)
```
Some layout algorithms take additional arguments; e.g., when laying out a graph as a tree, it might make sense to specify which vertex is to be placed at the root of the layout:
```{r}
layout <- layout_as_tree(g, root = 2)
```
### Drawing a graph using a layout
We can plot our imaginary social network with the Kamada-Kawai layout algorithm as follows:
```{r}
layout <- layout_nicely(g)
```
```{r}
plot(g, layout = layout, main = "Social network with the Kamada-Kawai layout algorithm")
```
This should open a new window showing a visual representation of the network. Remember that the exact placement of nodes may be different on your machine since the layout is not deterministic.
The `layout` argument also accepts functions; in this case, the function will be called with the graph as its first argument. This makes it possible to just pass the name of a layout function directly, without creating a layout variable:
```{r}
plot(g, layout = layout_with_fr,
main = "Social network with the Fruchterman-Reingold layout algorithm")
```
To improve the visuals, a trivial addition would be to color the vertices according to the gender. We should also try to place the labels slightly outside the vertices to improve readability:
```{r}
V(g)$color <- ifelse(V(g)$gender == "m", "yellow", "red")
plot(g, layout = layout, vertex.label.dist = 3.5,
main = "Social network - with genders as colors")
```
You can also treat the `gender` attribute as a factor and provide the colors with an argument to `plot()`, which takes precedence over the `color` vertex attribute. Colors will be assigned automatically to levels of a factor:
```{r}
plot(g, layout=layout, vertex.label.dist=3.5, vertex.color=as.factor(V(g)$gender))
```
As seen above with the `vertex.color` argument, you can specify visual properties as arguments to `plot` instead of using vertex or edge attributes. The following plot shows the formal ties with thick lines while informal ones with thin lines:
```{r}
plot(g, layout=layout, vertex.label.dist=3.5, vertex.size=20,
vertex.color=ifelse(V(g)$gender == "m", "yellow", "red"),
edge.width=ifelse(E(g)$is_formal, 5, 1))
```
This latter approach is preferred if you want to keep the properties of the visual representation of your graph separate from the graph itself.
In summary, there are special vertex and edge properties that correspond to the visual representation of the graph. These attributes override the default settings of igraph (i.e color, weight, name, shape,layout,etc.). The following two tables summarise the most frequently used visual attributes for vertices and edges, respectively:
### Vertex attributes controlling graph plots
| Attribute name | Keyword argument | Purpose |
|----------------------|----------------------|-----------------------------|
| `color` | `vertex.color` | Color of the vertex |
| `label` | `vertex.label` | Label of the vertex. They will be converted to character. Specify NA to omit vertex labels. The default vertex labels are the vertex ids. |
| `label.cex` | `vertex.label.cex` | Font size of the vertex label, interpreted as a multiplicative factor, similarly to R's `text` function |
| `label.color` | `vertex.label.color` | Color of the vertex label |
| `label.degree` | `vertex.label.degree` | It defines the position of the vertex labels, relative to the center of the vertices. It is interpreted as an angle in radian, zero means 'to the right', and 'pi' means to the left, up is -pi/2 and down is pi/2. The default value is -pi/4 |
| `label.dist` | `vertex.label.dist` | Distance of the vertex label from the vertex itself, relative to the vertex size |
| `label.family` | `vertex.label.family` | Font family of the vertex, similarly to R's `text` function |
| `label.font` | `vertex.label.font` | Font within the font family of the vertex, similarly to R's `text` function |
| `shape` | `vertex.shape` | The shape of the vertex, currently "circle", "square", "csquare", "rectangle", "crectangle", "vrectangle", "pie" (see vertex.shape.pie), 'sphere', and "none" are supported, and only by the plot.igraph command. |
| `size` | `vertex.size` | The size of the vertex, a numeric scalar or vector, in the latter case each vertex sizes may differ |
### Edge attributes controlling graph plots
| Attribute name | Keyword argument | Purpose |
|-------------------------|-----------------------------|-------------------|
| `color` | `edge.color` | Color of the edge |
| `curved` | `edge.curved` | A numeric value specifies the curvature of the edge; zero curvature means straight edges, negative values means the edge bends clockwise, positive values the opposite. TRUE means curvature 0.5, FALSE means curvature zero |
| `arrow.size` | `edge.arrow.size` | Currently this is a constant, so it is the same for every edge. If a vector is submitted then only the first element is used, ie. if this is taken from an edge attribute then only the attribute of the first edge is used for all arrows. |
| `arrow.width` | `edge.arrow.width` | The width of the arrows. Currently this is a constant, so it is the same for every edge |
| `width` | `edge.width` | Width of the edge in pixels |
| `label` | `edge.label` | If specified, it adds a label to the edge. |
| `label.cex` | `edge.label.cex` | Font size of the edge label, interpreted as a multiplicative factor, similarly to R's `text` function |
| `label.color` | `edge.label.color` | Color of the edge label |
| `label.family` | `edge.label.family` | Font family of the edge, similarly to R's `text` function |
| `label.font` | `edge.label.font` | Font within the font family of the edge, similarly to R's `text` function |
### Generic arguments of `plot()`
These settings can be specified as arguments to the `plot` function to control the overall appearance of the plot.
| Keyword argument | Purpose |
|--------------------------------|----------------------------------------|
| `layout` | The layout to be used. It can be an instance of `Layout`, a list of tuples containing X-Y coordinates, or the name of a layout algorithm. The default is `auto`, which selects a layout algorithm automatically based on the size and connectedness of the graph. |
| `margin` | The amount of empty space below, over, at the left and right of the plot, it is a numeric vector of length four. |
# A NON BORING EXAMPLE!
```{r}
library(readxl)
people<-read_xlsx("Networks.xlsx", sheet = "People")
relations<-read_xlsx("Networks.xlsx", sheet ="Relations")
g <- graph_from_data_frame(relations, directed=FALSE, vertices=people)
plot(g)
```
Have a look at the structure of the two dataframes (`people` and `relations`). `igraph` is creating a graph object using the `relations` data frame as an edge list. The rest of the elements are attributes of the nodes or the edges that we can use for visualization.
While these examples are illustrative, they are also a bit contrived. It is rare that you would manually make a network. It is far more common that you will use existing data to visualize a network. A big challenge is often converting your data into a format that `igraph` can work with, such as an edge list or adjacency matrix.
For example, what if we wanted to use the BCB520 variable in the `people` dataframe to create additional edges?
There are lots of ways to do this. Here is one:
```{r}
library(dplyr)
BCB<-people%>%
filter(BCB520 == "TRUE")%>%
select(name, BCB520)
edgelist<- combn(BCB$name, 2)
edgelist_df <- as.data.frame(t(edgelist))
colnames(edgelist_df) <- c("from", "to")
edgelist_df$BCB520<-"TRUE"
edgelist_df$Advisor<-"FALSE"
relations$BCB520 <- "FALSE"
relations3<-rbind(relations, edgelist_df)
g <- graph_from_data_frame(relations3, directed=FALSE, vertices=people)
plot(g)
```
## ASSIGNMENT
Improve the network above by:
1. Coloring the edges according to Advisor / BCB520 attribute.
2. Coloring the nodes according to Department.
3. Adujsting the labels to improve readability.