Introduction to Networks in R

Harrison Fried, Shane A. Scaggs

Categories: methods
Tags: matrix edgelist nodes edges igraph statnet tidygraph visualization graphs

In this meeting, we cover the basics of working with networks in R. By the end of the workshop, you will be acquainted with three packages – igraph, statnet, and tidygraph – and you will be able to create a network from scratch, turn it into a network object, create vertex and edge attributes, and generate a basic visualization. Let’s get started.

Data Structures

Before we dive into network objects, let’s understand the data structures that are used to represent relational data. We will focus on matrices and edgelists, as these are among the most common structures used for network data.

The Square Matrix

  i j k
i 0 0 0
j 0 0 0
k 0 0 0

One way to represent a network is using a square matrix where every node in the network has its own row and column. To do this, let’s create a vector of three nodes i, j, and k.

v <- letters[9:11]
v
## [1] "i" "j" "k"

Now, we can create a matrix in R using the index vector v.

M <- matrix(c(0,0,0,
              0,0,0,
              0,0,0), 
            nrow = length(v), 
            dimnames = list(v,v))
M
##   i j k
## i 0 0 0
## j 0 0 0
## k 0 0 0

Every cell in the matrix M represents a possible relationship between two nodes, with the row as the sender and the column as the receiver – these are the network edges. A value of 0 means that no edge is present. So in this matrix, there aren’t any edges. But let’s add an edge from i to j and from i to k.

M[1,2] <- 1
M[1,3] <- 1
M
##   i j k
## i 0 1 1
## j 0 0 0
## k 0 0 0

This is the conventional way of representing directed edges between nodes. We can represent a reciprocal edge from k to i as well. In an undirected network, this information is redundant.

M[3,1] <- 1
M
##   i j k
## i 0 1 1
## j 0 0 0
## k 1 0 0
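To see why the reciprocal cell is redundant in an undirected network, here is a minimal base R sketch. It assumes we want to treat a tie in either direction as a single undirected edge; the object names D and U are illustrative and not part of the workshop code.

```r
# Directed adjacency matrix with edges i->j, i->k, and k->i
v <- c("i", "j", "k")
D <- matrix(0, nrow = 3, ncol = 3, dimnames = list(v, v))
D["i", "j"] <- 1
D["i", "k"] <- 1
D["k", "i"] <- 1

# A directed matrix is generally not symmetric
isSymmetric(D)

# Symmetrize: keep an undirected edge wherever a tie exists in either direction
U <- pmax(D, t(D))
U
isSymmetric(U)
```

In the symmetric version, every edge shows up twice (once above and once below the diagonal), which is why only one triangle of the matrix is needed for undirected data.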

Cells along the diagonal are known as loops. For example, we can add an edge from j to j along the diagonal.

M[2,2] <- 1
M
##   i j k
## i 0 1 1
## j 0 1 0
## k 1 0 0

You can always extract the diagonal using diag(). This is also a handy way to set the diagonal to 0 if your network does not contain loops.

diag(M) <- 0
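Because rows are senders and columns are receivers, simple row and column sums already summarize the matrix. The sketch below rebuilds the example matrix so it runs on its own; the names n_loops, out_degree, and in_degree are chosen here for illustration.

```r
# Rebuild the example matrix, including the loop on j
v <- c("i", "j", "k")
M <- matrix(0, nrow = 3, ncol = 3, dimnames = list(v, v))
M["i", "j"] <- 1
M["i", "k"] <- 1
M["k", "i"] <- 1
M["j", "j"] <- 1

# Count the loops before removing them
n_loops <- sum(diag(M))
diag(M) <- 0

# Row sums count edges sent; column sums count edges received
out_degree <- rowSums(M)
in_degree  <- colSums(M)
out_degree
in_degree
```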

The Edgelist

Sender Receiver
i k
j j
k i

An edgelist is a more compact way to store network data. This is because it does not include the 0s that are shown in the matrix. Instead, an edgelist only contains a row for each of the edges that are present in the network. This may seem trivial for a small network of 3 nodes, but when you work with a large network with hundreds or thousands of nodes, eliminating all of those zeros is very convenient.

To construct an edgelist, stack rows on top of one another using the rbind function.

E <- rbind(c('i','k'),
           c('j','j'),
           c('k','i'))

You can think of the first column as a sender column. The second column is a receiver column. This is how directionality is represented in the edgelist.

If you want to add edges to an edgelist, simply bind a new row to the current edgelist.

E <- rbind(E, c('i','j'))
E
##      [,1] [,2]
## [1,] "i"  "k" 
## [2,] "j"  "j" 
## [3,] "k"  "i" 
## [4,] "i"  "j"

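Use bracket [] notation with a negative index to drop rows. This returns a copy without row 4; E itself is unchanged unless you assign the result back to it.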

E[-4,] 
##      [,1] [,2]
## [1,] "i"  "k" 
## [2,] "j"  "j" 
## [3,] "k"  "i"
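The two data structures hold the same information, so it can help to see how one converts into the other. This is a base R sketch rather than a packaged routine; the names el, A, idx, and el2 are illustrative.

```r
v <- c("i", "j", "k")
el <- rbind(c("i", "k"),
            c("j", "j"),
            c("k", "i"),
            c("i", "j"))

# Edgelist -> matrix: a two-column character matrix indexes cells
# by (row name, column name)
A <- matrix(0, nrow = length(v), ncol = length(v), dimnames = list(v, v))
A[el] <- 1
A

# Matrix -> edgelist: find the non-zero cells, then look up their names
idx <- which(A != 0, arr.ind = TRUE)
el2 <- cbind(Sender   = rownames(A)[idx[, "row"]],
             Receiver = colnames(A)[idx[, "col"]])
el2
```

The recovered edgelist contains the same edges as el, but the row order follows the matrix (column by column), not the order in which the edges were originally entered.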

But Why?

You might be wondering… why do we need to know these details about data structure? Here are two reasons.

First, whether you collect your own network data or receive data from someone else, you’ll need to store/wrangle the data into one of these formats. When you collect your own data, you can store it in this format right away. But if you receive a network you did not collect, knowing these data structures will help you understand what you are looking at.

Second, understanding data structures opens the door to simulation. Put differently, you don’t need to collect data to begin learning about and visualizing networks. You can simulate synthetic data. Here is a brief example.

Bernoulli Graph Simulation

In our example matrix above there were 3 nodes and a matrix with 9 possible edges. This means that for \(N\) nodes, there are \(N^2\) possible edges, or \(N(N-1)\) if loops are not allowed. We can use this information to simulate a random network.

set.seed(777)
# How many nodes? 
N <- 7 

# How many edges? 
N_edges <- N^2

# Use binomial distribution to simulate edges (coin flip)
simM <- matrix(rbinom(N_edges, 
                   size=1,
                   prob=0.5),
            nrow = N)
# No loops
diag(simM) <- 0
simM
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]    0    0    1    1    1    0    1
## [2,]    0    0    0    1    1    1    0
## [3,]    0    0    0    0    1    0    0
## [4,]    1    1    1    0    1    1    1
## [5,]    1    1    0    1    0    1    1
## [6,]    0    1    0    0    0    0    1
## [7,]    0    1    0    1    0    0    0
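One way to sanity-check a simulated Bernoulli graph is its density, the share of possible edges that are actually present, which should land somewhere near prob. The sketch below repeats the simulation so it runs on its own; the names possible and density are illustrative.

```r
set.seed(777)
N <- 7
simM <- matrix(rbinom(N^2, size = 1, prob = 0.5), nrow = N)
diag(simM) <- 0

# Loops are excluded, so there are N * (N - 1) possible directed edges
possible <- N * (N - 1)
density <- sum(simM) / possible
density
```

With only 7 nodes the density can drift noticeably from 0.5; larger values of N should bring it closer.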

We can use a similar procedure to simulate an edgelist. We just need to create a third column to simulate the edges, then filter out the zeros.

set.seed(777)
N <- letters[1:5]
simE <- expand.grid(N,N)

simE$Edge <- rbinom(nrow(simE), 
                 size = 1, 
                 prob = 0.5)

simE <- simE[simE$Edge == 1 & simE$Var1 != simE$Var2, 1:2]
head(simE)
##    Var1 Var2
## 4     d    a
## 5     e    a
## 9     d    b
## 11    a    c
## 12    b    c
## 14    d    c
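If you plan to simulate many edgelists, the steps above can be wrapped in a small function. This is a sketch under the same assumptions (directed edges, no loops); sim_edgelist is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: simulate a directed Bernoulli edgelist without loops
sim_edgelist <- function(nodes, prob = 0.5) {
  pairs <- expand.grid(Sender = nodes, Receiver = nodes,
                       stringsAsFactors = FALSE)
  pairs <- pairs[pairs$Sender != pairs$Receiver, ]   # drop loops
  keep <- rbinom(nrow(pairs), size = 1, prob = prob) == 1
  pairs[keep, ]
}

set.seed(777)
sparse <- sim_edgelist(letters[1:5], prob = 0.1)
dense  <- sim_edgelist(letters[1:5], prob = 0.9)
nrow(sparse)
nrow(dense)
```

With 5 nodes there are 20 possible directed edges, so on average a prob of 0.1 keeps about 2 rows and a prob of 0.9 keeps about 18.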

Network objects in igraph

Now we will turn these data structures into igraph network objects. First, install and load the igraph package.

install.packages('igraph')
library(igraph)

Next, let’s create some attributes for our 3-node network. They don’t need to be fancy; just some sizes and colors.

att <- data.frame(
    name = c("i","j","k"), 
    size = c(20,27,34),
    color = c('tomato', 
              'cornflowerblue', 
              'darkorchid')
)
att
##   name size          color
## 1    i   20         tomato
## 2    j   27 cornflowerblue
## 3    k   34     darkorchid

Matrix \(\rightarrow\) network

To create a network object using a matrix, we use the graph_from_adjacency_matrix function (older igraph code calls it graph.adjacency, a name that is now deprecated). In graph theory, relational matrices are often referred to as adjacency matrices. Use the M matrix that we created before.

gM <- graph_from_adjacency_matrix(M)
gM
## IGRAPH 67cb420 DN-- 3 3 -- 
## + attr: name (v/c)
## + edges from 67cb420 (vertex names):
## [1] i->j i->k k->i

When we call the object, we receive a summary of the network. In the code DN--, D means the network is directed and N means the vertices are named; the two numbers that follow tell us there are 3 nodes and 3 edges. There is just one attribute, the name of the vertices. The notation (v/c) tells us that it is a vertex attribute (v) stored as character data (c). The edges are listed at the bottom.
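The pieces of that summary can also be pulled out one at a time. The sketch below rebuilds the matrix so it runs on its own and uses graph_from_adjacency_matrix, the current igraph name for graph.adjacency; the object name g is illustrative.

```r
library(igraph)

v <- c("i", "j", "k")
M <- matrix(0, nrow = 3, ncol = 3, dimnames = list(v, v))
M["i", "j"] <- 1
M["i", "k"] <- 1
M["k", "i"] <- 1

g <- graph_from_adjacency_matrix(M, mode = "directed")

is_directed(g)  # is the network directed?
vcount(g)       # number of vertices
ecount(g)       # number of edges
V(g)$name       # vertex names, taken from the matrix dimnames
as_edgelist(g)  # the edges as a two-column matrix
```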

Manually add vertex attributes to the network gM from our att data frame.

gM <- set_vertex_attr(gM, name = 'size', value = att$size)
gM <- set_vertex_attr(gM, name = 'color', value = att$color)
gM
## IGRAPH 67cb420 DN-- 3 3 -- 
## + attr: name (v/c), size (v/n), color (v/c)
## + edges from 67cb420 (vertex names):
## [1] i->j i->k k->i

Edgelist \(\rightarrow\) network

To create an object using an edgelist, we use the graph_from_data_frame function (older igraph code calls it graph.data.frame, a name that is now deprecated). Using edgelists is nice because you can add the vertex attribute table in directly.

gE <- graph_from_data_frame(E, vertices = att)
gE
## IGRAPH 67dcc3a DN-- 3 4 -- 
## + attr: name (v/c), size (v/n), color (v/c)
## + edges from 67dcc3a (vertex names):
## [1] i->k j->j k->i i->j

Once you have an igraph network object, you can extract vertex and edge attributes.

V(gE)$color
## [1] "tomato"         "cornflowerblue" "darkorchid"

We don’t have any edge attributes in this network, but if we did, we would use E(object)$attr syntax.
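As a small illustration of that syntax, the sketch below builds the same four edges and attaches two edge attributes. The weights and the "friend" label are made up for the example, and the object names el and g are illustrative.

```r
library(igraph)

el <- data.frame(from = c("i", "j", "k", "i"),
                 to   = c("k", "j", "i", "j"))
g <- graph_from_data_frame(el, directed = TRUE)

# Assign one value per edge, in the order the edges are listed
E(g)$weight <- c(1, 2, 3, 4)   # made-up weights for illustration
E(g)$weight

# The same idea with the functional interface
g <- set_edge_attr(g, name = "type", value = rep("friend", ecount(g)))
edge_attr_names(g)
```

Once an edge attribute exists, it can be passed to plotting arguments, for example edge.width = E(g)$weight.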

Visualize igraph objects

There are many aesthetic properties of a network graph. But you can generally think of them as vertex (or node) aesthetics, edge aesthetics, and network aesthetics. To learn more about all of the ways to specify these in igraph, run ?igraph.plotting.

Let’s visualize our object gE. Here is a generic plot.

plot(gE)

By default, igraph labels each node with the $name attribute, and, if they exist, it will choose the sizes and colors using the $color and $size attributes. Since we set size and color in our att data frame, we replaced the defaults. But we can easily override them.

plot(gE, 
     vertex.color = '#2a00fa', 
     vertex.size = 30)

Let’s also change the color of the label to make it easier to read, change the edge colors to black, and make the arrow heads a bit smaller.

plot(gE, 
     vertex.color = '#2a00fa',
     vertex.size = 30,
     vertex.label.color = 'white', 
     edge.color = 'black',
     edge.arrow.size = 0.681)

Finally, let’s set the layout to layout_in_circle (older igraph code calls it layout.circle, a name that is now deprecated) and add some curve to the edges.

plot(gE, 
     vertex.color = '#2a00fa',
     vertex.size = 30,
     vertex.label.color = 'white', 
     edge.color = 'black',
     edge.arrow.size = 0.681, 
     edge.curved = 0.1,
     layout = layout_in_circle)
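A layout does not have to be passed as a function. It can also be computed once and stored as a matrix of coordinates, which keeps the nodes in the same place across several plots. A minimal sketch, with el, g, and coords as illustrative names:

```r
library(igraph)

el <- data.frame(from = c("i", "j", "k", "i"),
                 to   = c("k", "j", "i", "j"))
g <- graph_from_data_frame(el, directed = TRUE)

# A layout is a matrix with one row per vertex and one column per axis
coords <- layout_in_circle(g)
dim(coords)

# Reuse the same coordinates so the nodes stay put across plots
plot(g, layout = coords, vertex.color = "tomato")
plot(g, layout = coords, vertex.color = "cornflowerblue")
```

This matters more for randomized layouts, where each call can otherwise place the nodes differently.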

Network objects in statnet

Creating network objects in statnet is pretty intuitive. First, install and load the statnet suite of packages.

install.packages('statnet', dependencies = TRUE)
library(statnet)

To create network objects, use the network function. When using an edgelist, tell the function you are using one in the matrix.type parameter.

netM <- network(M, vertex.attr = att)
netE <- network(E, vertex.attr = att, 
                matrix.type = "edgelist" )
netE
##  Network attributes:
##   vertices = 3 
##   directed = TRUE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = FALSE 
##   total edges= 3 
##     missing edges= 0 
##     non-missing edges= 3 
## 
##  Vertex attribute names: 
##     color name size vertex.names 
## 
## No edge attributes

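Like the igraph object, when we call the network object, we get a summary of some important network details. Notice that the summary reports 3 edges even though E has 4 rows: the network was created with loops = FALSE (the default), so the loop from j to j is left out.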

To extract variables from the network object, we use one of three operators:

  • %v% – extract vertex attributes
  • %e% – extract edge attributes
  • %n% – extract network attributes

For example, we can see the values for color by writing the network object, the appropriate operator, and the name of the attribute.

netE %v% 'color'
## [1] "tomato"         "cornflowerblue" "darkorchid"
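The same operators can be used on the left-hand side of an assignment to create or overwrite attributes. The sketch below loads only the network package (one of the packages in the statnet suite) and rebuilds the matrix so it runs on its own; the object name net and the attribute values are illustrative.

```r
library(network)  # installed as part of the statnet suite

v <- c("i", "j", "k")
M <- matrix(0, nrow = 3, ncol = 3, dimnames = list(v, v))
M["i", "j"] <- 1
M["i", "k"] <- 1
M["k", "i"] <- 1

net <- network(M, directed = TRUE)

# Assign a vertex attribute (one value per vertex), then read it back
net %v% "size" <- c(20, 27, 34)
net %v% "size"

# Assign a network-level attribute, then read it back
net %n% "title" <- "toy triad"
net %n% "title"

# List the vertex attributes now stored on the object
list.vertex.attributes(net)
```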

Visualize network objects

Rather than working with the same old triadic network as we did before, let’s load a network from the ergm.count package that is part of statnet.

data(zach)

This is a network known as Zachary’s karate club. Here is the description from the package:

Zachary (1977) reported observations of social relations in a university karate club, with membership that varied between 50 and 100, of whom 34 individuals: 32 ordinary club members and officers, the club president (“John A.”), and the part-time instructor (“Mr. Hi”); consistently interacted outside of the club. Over the course of the study, the club divided into two factions, and, ultimately, split into two clubs, one led by Hi and the other by John and the original club’s officers. The split was driven by a disagreement over whether Hi could unilaterally change the level of compensation for his services.

Call the zach object and see the description.

zach
##  Network attributes:
##   vertices = 34 
##   directed = FALSE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = FALSE 
##   total edges= 78 
##     missing edges= 0 
##     non-missing edges= 78 
## 
##  Vertex attribute names: 
##     club faction faction.id role vertex.names 
## 
##  Edge attribute names: 
##     contexts

Generate a generic plot of zach using the gplot function specific to the sna package.

par(mar=c(0,0,0,0)) # make margins smaller
gplot(zach, usearrows = FALSE)

There appear to be some individuals who have many more connections than any of the others. Let’s add an attribute to each node that shows the count of their connections. This is called degree centrality, and we can calculate it using the function degree.

# set gmode to 'graph' for undirected graphs 
# sna::degree avoids a clash with igraph::degree if igraph is also loaded
set.vertex.attribute(zach, 
                     attrname = 'degree', 
                     value = sna::degree(zach, gmode = 'graph'))

Now we have a vertex attribute that we can use to rescale the size of each node.
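One possible way to do that rescaling is sketched below. The mapping of degree onto point sizes between 0.5 and 3 is an assumption made here for readability, not a statnet default, and the names deg and size are illustrative. Degree is recomputed so the example runs on its own.

```r
library(statnet)     # attaches network and sna
library(ergm.count)  # provides the zach data

data(zach)

# Degree of each node; gmode = 'graph' because zach is undirected
deg <- sna::degree(zach, gmode = 'graph')

# Assumed rescaling: map degree onto point sizes between 0.5 and 3
size <- 0.5 + 2.5 * (deg - min(deg)) / (max(deg) - min(deg))

par(mar = c(0, 0, 0, 0))  # make margins smaller
gplot(zach, gmode = 'graph', usearrows = FALSE, vertex.cex = size)
```

The vertex.cex argument of gplot takes one expansion factor per node, so the best-connected individuals are drawn largest.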

References

Zachary, WW (1977). An Information Flow Model for Conflict and Fission in Small Groups. Journal of Anthropological Research, 33(4), 452-473.