Simulation of diffusion networks: rdiffnet

Author

Thomas W. Valente and George G. Vega Yon

Introduction

Before we start, a review of the concepts we will be using here

Exposure: Proportion/number of neighbors that has adopted an innovation at each point in time.
Threshold: The proportion/number of your neighbors who had adopted at or one time period before ego (the focal individual) adopted.
Infectiousness: How much \(i\)’s adoption affects her alters.
Susceptibility: How much \(i\)’s alters’ adoption affects her.
Structural equivalence: How similar are \(i\) and \(j\) in terms of position in the network.

Simulating diffusion networks

We will simulate a diffusion network with the following parameters:

Will have 1,000 vertices,
Will span 20 time periods,
The initial adopters (seeds) will be selected random,
Seeds will be a 10% of the network,
The graph (network) will be small-world,
Will use the WS algorithmwith \(p=.2\) (probability of rewire).
Threshold levels will be uniformly distributed between [0.3, 0.7]

To generate this diffusion network, we can use the rdiffnet function included in the package:

# Setting the seed for the RNG
set.seed(1213)

# Generating a random diffusion network
net <- rdiffnet(
  n              = 1e3,                         # 1.
  t              = 20,                          # 2.
  seed.nodes     = "random",                    # 3.
  seed.p.adopt   = .1,                          # 4.
  seed.graph     = "small-world",               # 5.
  rgraph.args    = list(p=.2),                  # 6.
  threshold.dist = function(x) runif(1, .3, .7) # 7.
  )

# The option -copy.first- is set to TRUE. In this case, the first graph will be treated as a baseline, and thus, networks after T=1 will be replaced with T-1.TRUE

The function rdiffnet generates random diffusion networks. Main features:
1. Simulating random graph or using your own,
2. Setting threshold levels per node,
3. Network rewiring throughout the simulation, and
4. Setting the seed nodes.
The simulation algorithm is as follows:
1. If required, a baseline graph is created,
2. Set of initial adopters and threshold distribution are established,
3. The set of t networks is created (if required), and
4. Simulation starts at t=2, assigning adopters based on exposures and thresholds:
  1. For each \(i \in N\), if its exposure at \(t-1\) is greater than its threshold, then adopts, otherwise continue without change.
  2. next \(i\)

Rumor spreading

library(netdiffuseR)

set.seed(09)
diffnet_rumor <- rdiffnet(
  n = 5e2,
  t = 5, 
  seed.graph = "small-world",
  rgraph.args = list(k = 4, p = .3),
  seed.nodes = "random",
  seed.p.adopt = .05,
  rewire = TRUE,
  threshold.dist = function(i) 1L,
  exposure.args = list(normalized = FALSE)
  )

# The option -copy.first- is set to TRUE. In this case, the first graph will be treated as a baseline, and thus, networks after T=1 will be replaced with T-1.TRUE

summary(diffnet_rumor)

# Diffusion network summary statistics
#  Name     :  A diffusion network 
#  Behavior : Random contagion
# -----------------------------------------------------------------------------
#  Period   Adopters   Cum Adopt. (%)   Hazard Rate   Density   Moran's I (sd)  
# -------- ---------- ---------------- ------------- --------- ---------------- 
#        1         25        25 (0.05)             -      0.01 -0.00 (0.00)     
#        2         78       103 (0.21)          0.16      0.01  0.01 (0.00) *** 
#        3        187       290 (0.58)          0.47      0.01  0.01 (0.00) *** 
#        4        183       473 (0.95)          0.87      0.01  0.01 (0.00) *** 
#        5         27       500 (1.00)          1.00      0.01               -  
# ----------------------------------------------------------------------------- 
#  Left censoring  : 0.05 (25) 
#  Right centoring : 0.00 (0) 
#  # of nodes      : 500
# 
#  Moran's I was computed on contemporaneous autocorrelation using 1/geodesic
#  values. Significane levels  *** <= .01, ** <= .05, * <= .1.

plot_diffnet(diffnet_rumor, slices = c(1, 3, 5))

# We want to use igraph to compute layout
igdf <- diffnet_to_igraph(diffnet_rumor, slices=c(1,2))[[1]]
pos <- igraph::layout_with_drl(igdf)

plot_diffnet2(diffnet_rumor, vertex.size = dgr(diffnet_rumor)[,1], layout=pos)

Difussion

set.seed(09)
diffnet_disease <- rdiffnet(
  seed.graph = diffnet_rumor$graph,
  seed.nodes = which(diffnet_rumor$toa == 1),
  rewire = FALSE,
  threshold.dist = function(i) rbeta(1, 3, 10),
  name = "Diffusion",
  behavior = "Some social behavior"
)

plot_adopters(diffnet_rumor, what = "cumadopt", include.legend = FALSE)
plot_adopters(diffnet_disease, bg="lightblue", add=TRUE, what = "cumadopt")
legend(
  "topleft",
  legend = c("Disease", "Rumor"),
  col = c("lightblue", "tomato"),
  bty = "n", pch=19
  )

Multi-diffusion models (TBD)

Mentor Matching

# Finding mentors
mentors <- mentor_matching(diffnet_rumor, 25, lead.ties.method = "random")

# Simulating diffusion with these mentors
set.seed(09)
diffnet_mentored <- rdiffnet(
  seed.graph = diffnet_disease,
  seed.nodes = which(mentors$`1`$isleader),
  rewire = FALSE,
  threshold.dist = diffnet_disease[["real_threshold"]],
  name = "Diffusion using Mentors"
)

summary(diffnet_mentored)

# Diffusion network summary statistics
#  Name     :  Diffusion using Mentors 
#  Behavior : Random contagion
# -----------------------------------------------------------------------------
#  Period   Adopters   Cum Adopt. (%)   Hazard Rate   Density   Moran's I (sd)  
# -------- ---------- ---------------- ------------- --------- ---------------- 
#        1         25        25 (0.05)             -      0.01 -0.00 (0.00)     
#        2         93       118 (0.24)          0.20      0.01  0.01 (0.00) *** 
#        3        155       273 (0.55)          0.41      0.01  0.01 (0.00) *** 
#        4        162       435 (0.87)          0.71      0.01  0.01 (0.00) *** 
#        5         62       497 (0.99)          0.95      0.01 -0.00 (0.00)     
# ----------------------------------------------------------------------------- 
#  Left censoring  : 0.05 (25) 
#  Right centoring : 0.01 (3) 
#  # of nodes      : 500
# 
#  Moran's I was computed on contemporaneous autocorrelation using 1/geodesic
#  values. Significane levels  *** <= .01, ** <= .05, * <= .1.

cumulative_adopt_count(diffnet_disease)

#          1      2          3           4           5
# num  25.00 66.000 148.000000 286.0000000 434.0000000
# prop  0.05  0.132   0.296000   0.5720000   0.8680000
# rate  0.00  1.640   1.242424   0.9324324   0.5174825

cumulative_adopt_count(diffnet_mentored)

#          1       2          3           4           5
# num  25.00 118.000 273.000000 435.0000000 497.0000000
# prop  0.05   0.236   0.546000   0.8700000   0.9940000
# rate  0.00   3.720   1.313559   0.5934066   0.1425287

Example by changing threshold

The following block of code runs multiple diffnet simulations. Before we proceed, we will generate a scale-free homophilic network:

# Simulating a scale-free homophilic network
set.seed(1231)
X <- rep(c(1,1,1,1,1,0,0,0,0,0), 50)
net <- rgraph_ba(t = 499, m=4, eta = X)

# Taking a look in igraph
ig  <- igraph::graph_from_adjacency_matrix(net)
plot(ig, vertex.color = c("azure", "tomato")[X+1], vertex.label = NA,
     vertex.size = sqrt(dgr(net)))

Besides of the usual parameters passed to rdiffnet, the rdiffnet_multiple function requires R (number of repetitions/simulations), and statistic (a function that returns the statistic of insterst). Optionally, the user may choose to specify the number of clusters to run it in parallel (multiple CPUs):

nsim <- 500L

ans_1and2 <- rdiffnet_multiple(
  # Num of sim
  R              = nsim,
  # Statistic
  statistic      = function(d) cumulative_adopt_count(d)["prop",], 
  seed.graph     = net,
  t              = 10,
  threshold.dist = sample(1:2, 500L, TRUE),
  seed.nodes     = "random",
  seed.p.adopt   = .1,
  rewire         = FALSE,
  exposure.args  = list(outgoing=FALSE, normalized=FALSE),
  # Running on 4 cores
  ncpus          = 4L
  ) |> t()

ans_2and3 <- rdiffnet_multiple(
  # Num of sim
  R              = nsim,
  # Statistic
  statistic      = function(d) cumulative_adopt_count(d)["prop",], 
  seed.graph     = net,
  t              = 10,
  threshold.dist = sample(2:3, 500, TRUE),
  seed.nodes     = "random",
  seed.p.adopt   = .1,
  rewire         = FALSE,
  exposure.args  = list(outgoing=FALSE, normalized=FALSE),
  # Running on 4 cores
  ncpus          = 4L
  ) |> t()

ans_1and3 <- rdiffnet_multiple(
  # Num of sim
  R              = nsim,
  # Statistic
  statistic      = function(d) cumulative_adopt_count(d)["prop",], 
  seed.graph     = net,
  t              = 10,
  threshold.dist = sample(1:3, 500, TRUE),
  seed.nodes     = "random",
  seed.p.adopt   = .1,
  rewire         = FALSE,
  exposure.args  = list(outgoing=FALSE, normalized=FALSE),
  # Running on 4 cores
  ncpus          = 4L
  ) |> t()

boxplot(ans_1and2, col="ivory", xlab = "Time", ylab = "Proportion of Adopters")
boxplot(ans_2and3, col="tomato", add=TRUE)
boxplot(ans_1and3, col = "steelblue", add=TRUE)
legend(
  "topleft",
  fill = c("ivory", "tomato", "steelblue"),
  legend = c("1/2", "2/3", "1/3"),
  title = "Threshold range",
  bty ="n"
)

Example simulating a thousand networks by changing threshold levels. The final prevalence, or hazard as a function of threshold levels.

Problems

Given the following types of networks: Small-world, Scale-free, Bernoulli, what set of \(n\) initiators maximizes diffusion? (solution script and solution plot)

Appendix

The following is example code that can be used to run multiple simulations like it is done using the rdiffnet_multiple function. We do not recommend this approach but it may be useful for some users:

# Now, simulating a bunch of diffusion processes
nsim <- 500L
ans_1and2 <- vector("list", nsim)
set.seed(223)
for (i in 1:nsim) {
  # We just want the cum adopt count
  ans_1and2[[i]] <- 
    cumulative_adopt_count(
      rdiffnet(
        seed.graph = net,
        t = 10,
        threshold.dist = sample(1:2, 500L, TRUE),
        seed.nodes = "random",
        seed.p.adopt = .10,
        exposure.args = list(outgoing = FALSE, normalized = FALSE),
        rewire = FALSE
        )
      )
  
  # Are we there yet?
  if (!(i %% 50))
    message("Simulation ", i," of ", nsim, " done.")
}
# Simulation 50 of 500 done.
# Simulation 100 of 500 done.
# Simulation 150 of 500 done.
# Simulation 200 of 500 done.
# Simulation 250 of 500 done.
# Simulation 300 of 500 done.
# Simulation 350 of 500 done.
# Simulation 400 of 500 done.
# Simulation 450 of 500 done.
# Simulation 500 of 500 done.

# Extracting prop
ans_1and2 <- do.call(rbind, lapply(ans_1and2, "[", i="prop", j=))

ans_2and3 <- vector("list", nsim)
set.seed(223)
for (i in 1:nsim) {
  # We just want the cum adopt count
  ans_2and3[[i]] <- 
    cumulative_adopt_count(
      rdiffnet(
        seed.graph = net,
        t = 10,
        threshold.dist = sample(2:3, 500L, TRUE),
        seed.nodes = "random",
        seed.p.adopt = .10,
        exposure.args = list(outgoing = FALSE, normalized = FALSE),
        rewire = FALSE
        )
      )
  
  # Are we there yet?
  if (!(i %% 50))
    message("Simulation ", i," of ", nsim, " done.")
}
# Simulation 50 of 500 done.
# Simulation 100 of 500 done.
# Simulation 150 of 500 done.
# Simulation 200 of 500 done.
# Simulation 250 of 500 done.
# Simulation 300 of 500 done.
# Simulation 350 of 500 done.
# Simulation 400 of 500 done.
# Simulation 450 of 500 done.
# Simulation 500 of 500 done.

ans_2and3 <- do.call(rbind, lapply(ans_2and3, "[", i="prop", j=))