Reading Data

Authors

Thomas W. Valente

George G. Vega Yon

Aníbal Olivera M.

Published

June 24, 2025

Modified

June 24, 2025

Basic Diffusion Network (as_diffnet)

To create diffnet objects we only need a network and times of adoption:

set.seed(9)

# Network
net <- rgraph_ws(500, 4, .2)

# Times of adoption
toa <- sample(c(NA, 2010:2014), 500, TRUE)

diffnet_static <- as_diffnet(net, toa)

# Warning in new_diffnet(graph, ...): -graph- is static and will be recycled (see
# ?new_diffnet).

summary(diffnet_static)

# Diffusion network summary statistics
#  Name     :  Diffusion Network 
#  Behavior : Unknown
# -----------------------------------------------------------------------------
#  Period   Adopters   Cum Adopt. (%)   Hazard Rate   Density   Moran's I (sd)  
# -------- ---------- ---------------- ------------- --------- ---------------- 
#     2010         94        94 (0.19)             -      0.01 -0.00 (0.00)     
#     2011         67       161 (0.32)          0.17      0.01 -0.00 (0.00)     
#     2012         71       232 (0.46)          0.21      0.01 -0.00 (0.00)     
#     2013         93       325 (0.65)          0.35      0.01 -0.00 (0.00)     
#     2014         82       407 (0.81)          0.47      0.01 -0.00 (0.00)     
# ----------------------------------------------------------------------------- 
#  Left censoring  : 0.19 (94) 
#  Right centoring : 0.19 (93) 
#  # of nodes      : 500
# 
#  Moran's I was computed on contemporaneous autocorrelation using 1/geodesic
#  values. Significane levels  *** <= .01, ** <= .05, * <= .1.

Dynamic survey (survey_to_diffnet)

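The package can also read dynamic survey data, i.e., data in which each respondent appears once per wave, with a time variable identifying the wave and a time of adoption variable. The function survey_to_diffnet converts such survey data into a diffnet object; the wave identifier is passed through the timevar argument.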

data("fakesurveyDyn")
fakesurveyDyn

#    id  toa group net1 net2 net3 age gender
# 1   1 1991     1   NA   NA   NA  30      M
# 2   2 1990     1    3    1   NA  35      F
# 3   3 1991     1   NA    2   NA  31      F
# 4   4 1990     1    6    5   NA  30      M
# 5   5 1991     1    4    4    3  40      F
# 6   1 1991     2    3    4    8  29      F
# 7   2 1990     2    3   NA   NA  35      M
# 8   5 1990     2   10    1   NA  50      M
# 9  10 1990     2    5    1   NA  19      F
# 10  1 1991     1   NA   NA   NA  31      M
# 11  2 1990     1    3    1   NA  36      F
# 12  3 1991     1   NA    2   NA  32      F
# 13  4 1990     1    6    5   NA  31      M
# 14  5 1991     1    4    4    3  41      F
# 15  1 1991     2    3    4    8  30      F
# 16  2 1990     2    1   NA   NA  36      M
# 17  5 1990     2   10    1   NA  51      M
# 18 10 1990     2    5    1   NA  20      F
#                                                   note time
# 1                           First wave: No nominations 1990
# 2                            First wave: Nothing weird 1990
# 3                   First wave: Only nominates in net2 1990
# 4   First wave: Nominates someone who wasn't interview 1990
# 5                    First wave: Nominates 4 two times 1990
# 6                 First wave: Only nominates outsiders 1990
# 7                                 First wave: Isolated 1990
# 8                            First wave: Nothing weird 1990
# 9                              First wave: Non-adopter 1990
# 10                         Second wave: No nominations 1991
# 11                          Second wave: Nothing weird 1991
# 12                 Second wave: Only nominates in net2 1991
# 13 Second wave: Nominates someone who wasn't interview 1991
# 14                  Second wave: Nominates 4 two times 1991
# 15               Second wave: Only nominates outsiders 1991
# 16                   Second wave: Now is not isolated! 1991
# 17                          Second wave: Nothing weird 1991
# 18                            Second wave: Non-adopter 1991

diffnet_dynsurvey <- survey_to_diffnet(
  dat      = fakesurveyDyn,
  idvar    = "id",
  netvars  = c("net1", "net2", "net3"),
  groupvar = "group",
  toavar   = "toa",
  timevar  = "time"
  )

plot_diffnet(diffnet_dynsurvey)

Datasets in netdiffuseR (surveys)

netdiffuseR includes three classic diffusion network datasets (as surveys):
- medInnovations: Doctors and the innovation of Tetracycline (1955).
- brfarmers: Brazilian farmers and the innovation of Hybrid Corn Seed (1966).
- kfamily: Korean women and Family Planning methods (1973).

Let’s take a look at kfamily:

data(kfamily)

# The data contains adoption information of 25 villages:
unique(kfamily$village)

#  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

# across 10 time steps (toa = 11 means 'no adoption')
sort(unique(kfamily$toa))

#  [1]  1  2  3  4  5  6  7  8  9 10 11

We can construct a diffnet object from these survey data:

kfamily_diffnet <- survey_to_diffnet(
  dat      = kfamily,
  idvar    = "id",
  netvars  = c(
    # Neighbors talk to about FP
    "net11", "net12", "net13", "net14", "net15", 
    # Closest neighbor most frequently met
    "net21", "net22", "net23", "net24", "net25", 
    # Advice on FP sought from 
    "net31", "net32", "net33", "net34", "net35"), 
  toavar   = "toa",
  groupvar = "village"
)

kfamily_diffnet

# Dynamic network of class -diffnet-
#  Name               : Diffusion Network
#  Behavior           : Unknown
#  # of nodes         : 1047 (1002, 1003, 1005, 1007, 1010, 1011, 1012, 1014, ...)
#  # of time periods  : 11 (1 - 11)
#  Type               : directed
#  Num of behaviors   : 1
#  Final prevalence   : 1.00
#  Static attributes  : village, recno1, studno1, area1, id1, nmage1, nmag... (430)
#  Dynamic attributes : -

summary(kfamily_diffnet)

# Diffusion network summary statistics
#  Name     :  Diffusion Network 
#  Behavior : Unknown
# -----------------------------------------------------------------------------
#  Period   Adopters   Cum Adopt. (%)   Hazard Rate   Density   Moran's I (sd)  
# -------- ---------- ---------------- ------------- --------- ---------------- 
#        1         69        69 (0.07)             -      0.00  0.03 (0.01) *** 
#        2         94       163 (0.16)          0.10      0.00  0.02 (0.01) *** 
#        3         81       244 (0.23)          0.09      0.00  0.03 (0.01) *** 
#        4         86       330 (0.32)          0.11      0.00  0.06 (0.01) *** 
#        5         65       395 (0.38)          0.09      0.00  0.07 (0.01) *** 
#        6         62       457 (0.44)          0.10      0.00  0.06 (0.01) *** 
#        7         53       510 (0.49)          0.09      0.00  0.08 (0.01) *** 
#        8         53       563 (0.54)          0.10      0.00  0.08 (0.01) *** 
#        9         73       636 (0.61)          0.15      0.00  0.08 (0.01) *** 
#       10         37       673 (0.64)          0.09      0.00  0.07 (0.01) *** 
#       11        374      1047 (1.00)          1.00      0.00               -  
# ----------------------------------------------------------------------------- 
#  Left censoring  : 0.07 (69) 
#  Right centoring : 0.00 (0) 
#  # of nodes      : 1047
# 
#  Moran's I was computed on contemporaneous autocorrelation using 1/geodesic
#  values. Significane levels  *** <= .01, ** <= .05, * <= .1.
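Because toa = 11 codes 'no adoption' in kfamily, the last row of the table and the cumulative share of 1.00 count non-adopters as if they had adopted in period 11. The arithmetic below uses only the counts printed above to recover the share that adopted within the 10 observed periods. The recoding step at the end runs on a toy vector and shows one possible way to mark non-adopters as NA; it is an assumption about how one might proceed, not something this tutorial does.

```r
# Adopter counts per period, copied from the summary table above
adopters <- c(69, 94, 81, 86, 65, 62, 53, 53, 73, 37, 374)
n <- sum(adopters)  # total number of nodes

# Share that adopted during periods 1 to 10
adopted_by_10 <- sum(adopters[1:10]) / n

# Share coded as toa = 11, i.e. never adopted
never_adopted <- adopters[11] / n

round(c(adopted_by_10 = adopted_by_10, never_adopted = never_adopted), 2)

# Toy example (hypothetical values): recode the 'no adoption' code to NA
toa_toy <- c(1, 5, 11, 10, 11)
toa_toy[toa_toy == 11] <- NA
toa_toy
```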

We can calculate direct exposure (cohesion) to an innovation using exposure(),

# Computing exposure
coh <- exposure(kfamily_diffnet)

# See results
head(round(coh, 2))

#      1    2    3    4    5    6    7    8    9   10 11
# 1002 0 0.17 0.50 0.67 0.67 0.67 0.67 0.67 0.83 0.83  1
# 1003 0 0.17 0.33 0.67 0.67 0.67 0.83 0.83 1.00 1.00  1
# 1005 0 0.43 0.71 1.00 1.00 1.00 1.00 1.00 1.00 1.00  1
# 1007 0 0.25 0.25 0.50 0.50 0.50 1.00 1.00 1.00 1.00  1
# 1010 0 0.60 0.60 0.80 0.80 0.80 0.80 0.80 1.00 1.00  1
# 1011 0 0.33 0.33 1.00 1.00 1.00 1.00 1.00 1.00 1.00  1

and also indirect influence (structural equivalence):

# Computing structural equivalence
se <- exposure(kfamily_diffnet, 
              alt.graph="se",     # select 'structural equivalence'
              groupvar="village", # separately by community
              valued=TRUE         # to account for weights
              )

# See results
head(round(se, 2))

#         1    2    3    4    5    6    7    8    9   10 11
# 1002 0.02 0.07 0.10 0.12 0.15 0.46 0.48 0.49 0.56 0.56  1
# 1003 0.01 0.03 0.04 0.05 0.06 0.41 0.43 0.43 0.48 0.48  1
# 1005 0.05 0.18 0.24 0.31 0.39 0.63 0.69 0.71 0.79 0.79  1
# 1007 0.01 0.03 0.03 0.05 0.06 0.52 0.53 0.53 0.70 0.70  1
# 1010 0.00 0.03 0.04 0.06 0.07 0.76 0.77 0.77 0.83 0.83  1
# 1011 0.03 0.14 0.20 0.26 0.32 0.60 0.65 0.65 0.74 0.74  1
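As a rough illustration of what direct (cohesion) exposure measures, the sketch below computes, for each node and period, the share of the nodes it nominates that have already adopted. It uses a hypothetical three-node graph in base R and assumes unweighted ties and contemporaneous adoption. exposure() supports more options (weights, lags, alternative graphs) and should be preferred in practice.

```r
# Hypothetical directed graph: A[i, j] = 1 means i nominates j
A <- matrix(c(0, 1, 1,
              1, 0, 0,
              1, 1, 0), nrow = 3, byrow = TRUE)

# Hypothetical times of adoption and observation periods
toa     <- c(1, 2, 3)
periods <- 1:3

# Cumulative adoption matrix: 1 if node i has adopted by period t
cumadopt <- outer(toa, periods, "<=") * 1

# Share of each node's nominees that have adopted, by period
expo <- (A %*% cumadopt) / rowSums(A)
dimnames(expo) <- list(paste0("node", 1:3), periods)
expo
```

A node with no outgoing ties would produce a division by zero here, so a fuller version needs to handle isolates explicitly.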

The diffnet object also contains attributes of the vertices, which can be retrieved using diffnet.attrs():

# Retrieving attributes as data.frame
kfamily_diffnet.df <- diffnet.attrs(kfamily_diffnet, as.df = TRUE)

# Subset to relevant variables
kfamily_relevant_vars <- kfamily_diffnet.df[, c("per", "toa", "village")]

# Select 10 random rows
kfamily_relevant_vars[sample(nrow(kfamily_relevant_vars), 10), ]

#       per toa village
# 11136  11  11       2
# 94      1   2      12
# 8204    8  11       6
# 9663   10  11       1
# 5412    6  11      13
# 2652    3  11      22
# 7093    7  11       4
# 4029    4  11       6
# 915     1   5       7
# 2723    3  11      23

Problems

Using the rda file read.rda, read in the edgelist net_edgelist and the adjacency matrix net_list as diffnet objects. In both cases you should use the data.frame X, which contains the time of adoption variable. (solution script)

Appendix

Diffusion Network Object (diffnet)

Most of the package’s functions accept different types of graphs:
- Static: matrix, dgCMatrix (from the Matrix pkg),
- Dynamic: list + dgCMatrix, array, diffnet

netdiffuseR has its own class of objects, diffnet, which is the one you get the most out of.

From netdiffuseR’s perspective, network data comes in three classes:
1. Raw R network data: Datasets with edgelist, attributes, survey data, etc.
2. Already R data: Networks already read into R using igraph, statnet, etc. (igraph_to_diffnet, network_to_diffnet, etc.)
3. Graph files: DL, UCINET, pajek, etc. (read_pajek, read_dl, read_ucinet, etc.)

In this presentation we will focus on 1.
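To make the static and dynamic graph types listed above concrete, the sketch below builds the same hypothetical four-node network as a single adjacency matrix, as a list with one matrix per period, and as a three-dimensional array. It uses base R matrices only to show the shapes involved; as noted in the list, the package's list form holds dgCMatrix objects from the Matrix package.

```r
n <- 4          # number of nodes (hypothetical)
n_periods <- 3  # number of time periods (hypothetical)

# Static graph: one n x n adjacency matrix with ties 1->2, 2->3, 3->4
static <- matrix(0, n, n)
static[cbind(c(1, 2, 3), c(2, 3, 4))] <- 1

# Dynamic graph as a list: one adjacency matrix per period
dynamic_list <- replicate(n_periods, static, simplify = FALSE)

# Dynamic graph as an array: n x n x n_periods
dynamic_array <- array(static, dim = c(n, n, n_periods))

# A dynamic graph can change over time, e.g. a new tie 4->1 in period 3
dynamic_list[[3]][4, 1] <- 1

length(dynamic_list)
dim(dynamic_array)
```

Repeating one matrix across periods is also what the warning about a static graph being recycled in the first example refers to.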

What is a (diffnet) object

A diffusion network, a.k.a. diffnet object, is a list that holds the following objects:

graph: A list with \(t\) dgCMatrix matrices of size \(n\times n\),
toa: An integer vector of length \(n\),
adopt: A matrix of size \(n\times t\),
cumadopt: A matrix of size \(n\times t\),
vertex.static.attrs: A data.frame of size \(n\times k\),
vertex.dyn.attrs: A list with \(t\) dataframes of size \(n\times k\),
graph.attrs: Currently ignored…, and
meta: A list with metadata about the object.

These are created using new_diffnet (or its wrappers).
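The relationship between toa, adopt, and cumadopt can be sketched in base R. The example assumes four hypothetical nodes observed over three periods, with NA marking a non-adopter; the package builds these matrices itself (see toa_mat()), so this is only meant to show what they contain.

```r
# Hypothetical times of adoption; NA means the node never adopts
toa     <- c(a = 2010, b = 2012, c = NA, d = 2011)
periods <- 2010:2012

# adopt: 1 in the period in which the node adopts, 0 otherwise
adopt <- outer(toa, periods, "==") * 1
adopt[is.na(adopt)] <- 0
dimnames(adopt) <- list(names(toa), periods)

# cumadopt: 1 from the adoption period onwards
cumadopt <- t(apply(adopt, 1, cumsum))
dimnames(cumadopt) <- dimnames(adopt)

adopt
cumadopt
```

Column sums of cumadopt give the cumulative number of adopters per period, which is what the Cum Adopt. column of summary() reports.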

Static survey (survey_to_diffnet)

netdiffuseR can also read survey (nomination) data:

data("fakesurvey")
fakesurvey

#   id toa group net1 net2 net3 age gender                                   note
# 1  1   1     1   NA   NA   NA  30      M                         No nominations
# 2  2   5     1    3    1   NA  35      F                          Nothing weird
# 3  3   5     1   NA    2   NA  31      F                 Only nominates in net2
# 4  4   3     1    6    5   NA  30      M Nominates someone who wasn't interview
# 5  5   2     1    4    4    3  40      F                  Nominates 4 two times
# 6  1   4     2    3    4    8  29      F               Only nominates outsiders
# 7  2   3     2    3   NA   NA  35      M                               Isolated
# 8  5   3     2   10    1   NA  50      M                          Nothing weird
# 9 10  NA     2    5    1   NA  19      F                            Non-adopter

In group one, id 4 nominates id 6, who does not appear in the data, and in group two, id 1 nominates 3, 4, and 8, individuals who also do not appear in the survey. By default, survey_to_diffnet drops such unsurveyed individuals:

d1 <- survey_to_diffnet(
  dat      = fakesurvey,                # Dataset
  idvar    = "id",                      # The name of the idvar
  netvars  = c("net1", "net2", "net3"), # Name of the nomination variables
  groupvar = "group",                   # Group variable (if any)
  toavar   = "toa"                      # Name of the time of adoption variable
  ); d1

# Dynamic network of class -diffnet-
#  Name               : Diffusion Network
#  Behavior           : Unknown
#  # of nodes         : 9 (101, 102, 103, 104, 105, 201, 202, 205, ...)
#  # of time periods  : 5 (1 - 5)
#  Type               : directed
#  Num of behaviors   : 1
#  Final prevalence   : 0.89
#  Static attributes  : group, net1, net2, net3, age, gender, note (7)
#  Dynamic attributes : -

To include the unsurveyed individuals as vertices, set the option no.unsurveyed = FALSE:

d2 <- survey_to_diffnet(
  dat      = fakesurvey,
  idvar    = "id",
  netvars  = c("net1", "net2", "net3"),
  groupvar = "group",
  toavar   = "toa",
  no.unsurveyed = FALSE
  ); d2

# Dynamic network of class -diffnet-
#  Name               : Diffusion Network
#  Behavior           : Unknown
#  # of nodes         : 13 (101, 102, 103, 104, 105, 106, 201, 202, ...)
#  # of time periods  : 5 (1 - 5)
#  Type               : directed
#  Num of behaviors   : 1
#  Final prevalence   : 0.62
#  Static attributes  : group, net1, net2, net3, age, gender, note (7)
#  Dynamic attributes : -

We can also check the difference between the two objects, which leaves only the unsurveyed vertices:

d2 - d1

# Dynamic network of class -diffnet-
#  Name               : Diffusion Network
#  Behavior           : Unknown
#  # of nodes         : 4 (106, 203, 204, 208)
#  # of time periods  : 5 (1 - 5)
#  Type               : directed
#  Num of behaviors   : 1
#  Final prevalence   : 0.00
#  Static attributes  : group, net1, net2, net3, age, gender, note (7)
#  Dynamic attributes : -

rownames(d2 - d1)

# [1] "106" "203" "204" "208"
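The four extra vertices can be recovered by hand from the fakesurvey table. The sketch below retypes the relevant columns and assumes vertex labels are formed as group * 100 + id, which reproduces the labels printed above; the package's actual recoding rule may differ for other data.

```r
# Columns retyped from the fakesurvey printout above
id    <- c(1, 2, 3, 4, 5, 1, 2, 5, 10)
group <- c(1, 1, 1, 1, 1, 2, 2, 2, 2)
net1  <- c(NA, 3, NA, 6, 4, 3, 3, 10, 5)
net2  <- c(NA, 1, 2, 5, 4, 4, NA, 1, 1)
net3  <- c(NA, NA, NA, NA, 3, 8, NA, NA, NA)

# Assumed labelling rule: group * 100 + id
make_label <- function(group, id) group * 100 + id

surveyed  <- make_label(group, id)
nominated <- c(make_label(group, net1),
               make_label(group, net2),
               make_label(group, net3))
nominated <- unique(nominated[!is.na(nominated)])

# Nominated individuals who were never interviewed
unsurveyed <- sort(setdiff(nominated, surveyed))
unsurveyed
```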

Other network formats

The package also supports working with other network formats.

Besides reading .net (Pajek) and ml (UCINET) files, netdiffuseR can convert to and from objects of class igraph, network, and networkDynamic.

data("medInnovationsDiffNet")
dn_ig  <- diffnet_to_igraph(medInnovationsDiffNet)
# dn_ig # For some issue with lazy eval, knitr won't print this

dn_net <- diffnet_to_network(medInnovationsDiffNet)
dn_net[[1]]

#  Network attributes:
#   vertices = 125 
#   directed = TRUE 
#   hyper = FALSE 
#   loops = FALSE 
#   multiple = FALSE 
#   bipartite = FALSE 
#   name = Medical Innovation 
#   behavior = Adoption of Tetracycline 
#   total edges= 294 
#     missing edges= 0 
#     non-missing edges= 294 
# 
#  Vertex attribute names: 
#     ado adopt attend belief catbak city club coll commun ctl date detail detail2 dichot drug expect free friends here home house info journ journ2 length meet most net1_1 net1_2 net1_3 net2_1 net2_2 net2_3 net3_1 net3_2 net3_3 nojourn nonpoor office origid paadico perc position presc proage proage2 proximty recall recon reltend science social sourinfo special study tend thresh toa vertex.names young 
# 
# No edge attributes

dn_ndy <- diffnet_to_networkDynamic(medInnovationsDiffNet)

# Argument base.net not specified, using first element of network.list instead
# Created net.obs.period to describe network
#  Network observation period info:
#   Number of observation spells: 1 
#   Maximal time range observed: 1 until 18 
#   Temporal mode: discrete 
#   Time unit: step 
#   Suggested time increment: 1

dn_ndy

# NetworkDynamic properties:
#   distinct change times: 18 
#   maximal time range: 1 until  18 
# 
# Includes optional net.obs.period attribute:
#  Network observation period info:
#   Number of observation spells: 1 
#   Maximal time range observed: 1 until 18 
#   Temporal mode: discrete 
#   Time unit: step 
#   Suggested time increment: 1 
# 
#  Network attributes:
#   vertices = 125 
#   directed = TRUE 
#   hyper = FALSE 
#   loops = FALSE 
#   multiple = FALSE 
#   bipartite = FALSE 
#   behavior = Adoption of Tetracycline 
#   name = Medical Innovation 
#   net.obs.period: (not shown)
#   total edges= 294 
#     missing edges= 0 
#     non-missing edges= 294 
# 
#  Vertex attribute names: 
#     active ado adopt attend belief catbak city club coll commun ctl date detail detail2 dichot drug expect free friends here home house info journ journ2 length meet most net1_1 net1_2 net1_3 net2_1 net2_2 net2_3 net3_1 net3_2 net3_3 nojourn nonpoor office origid paadico perc position presc proage proage2 proximty recall recon reltend science social sourinfo special study tend thresh toa vertex.names young 
# 
#  Edge attribute names: 
#     active

The first two examples create a list of objects, while the last one creates a single object. The networkDynamic object can be converted back into a diffnet:

networkDynamic_to_diffnet(dn_ndy, toavar = "toa")

# Dynamic network of class -diffnet-
#  Name               : Medical Innovation
#  Behavior           : Adoption of Tetracycline
#  # of nodes         : 125 (1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, ...)
#  # of time periods  : 18 (1 - 18)
#  Type               : directed
#  Num of behaviors   : 1
#  Final prevalence   : 1.00
#  Static attributes  : -
#  Dynamic attributes : ado, adopt, attend, belief, catbak, city, club, co... (59)