Reading Data

Authors

Thomas W. Valente

George G. Vega Yon

Aníbal Olivera M.

Published

June 24, 2025

Modified

June 24, 2025

Basic Diffusion Network (as_diffnet)

  • To create diffnet objects we only need a network and times of adoption:

    set.seed(9)
    
    # Network
    net <- rgraph_ws(500, 4, .2)
    
    # Times of adoption
    toa <- sample(c(NA, 2010:2014), 500, TRUE)
    
    diffnet_static <- as_diffnet(net, toa)
    # Warning in new_diffnet(graph, ...): -graph- is static and will be recycled (see
    # ?new_diffnet).
    summary(diffnet_static)
    # Diffusion network summary statistics
    #  Name     :  Diffusion Network 
    #  Behavior : Unknown
    # -----------------------------------------------------------------------------
    #  Period   Adopters   Cum Adopt. (%)   Hazard Rate   Density   Moran's I (sd)  
    # -------- ---------- ---------------- ------------- --------- ---------------- 
    #     2010         94        94 (0.19)             -      0.01 -0.00 (0.00)     
    #     2011         67       161 (0.32)          0.17      0.01 -0.00 (0.00)     
    #     2012         71       232 (0.46)          0.21      0.01 -0.00 (0.00)     
    #     2013         93       325 (0.65)          0.35      0.01 -0.00 (0.00)     
    #     2014         82       407 (0.81)          0.47      0.01 -0.00 (0.00)     
    # ----------------------------------------------------------------------------- 
    #  Left censoring  : 0.19 (94) 
    #  Right centoring : 0.19 (93) 
    #  # of nodes      : 500
    # 
    #  Moran's I was computed on contemporaneous autocorrelation using 1/geodesic
    #  values. Significane levels  *** <= .01, ** <= .05, * <= .1.
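
The cumulative adoption and hazard rate columns of the summary can be recovered by hand from the per-period adopter counts. The sketch below uses only base R and the counts printed above. It assumes that the hazard rate is the number of new adopters divided by the number of nodes that had not yet adopted at the start of the period; this matches the printed values but is not taken from the package source.

```r
# Adopter counts per period, copied from the summary() output above
adopters <- c(`2010` = 94, `2011` = 67, `2012` = 71, `2013` = 93, `2014` = 82)
n        <- 500  # number of nodes

# Cumulative adopters and their share of the population
cum_adopt <- cumsum(adopters)
cum_share <- cum_adopt / n

# Nodes still "at risk" at the start of each period
at_risk <- n - c(0, head(cum_adopt, -1))

# Assumed hazard rate: new adopters / nodes at risk (undefined in period 1)
hazard    <- adopters / at_risk
hazard[1] <- NA

round(cum_share, 2)
round(hazard, 2)

# Nodes that never adopt within the observed window (right censored)
n - cum_adopt[["2014"]]
```

The last value, 93, agrees with the right-censoring count reported in the summary.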

Dynamic survey (survey_to_diffnet)

The package can also read dynamic survey data, i.e., data in which each respondent appears once per wave, identified by a time variable, alongside a time of adoption variable. The function survey_to_diffnet converts such survey data into a diffnet object.

data("fakesurveyDyn")
fakesurveyDyn
#    id  toa group net1 net2 net3 age gender
# 1   1 1991     1   NA   NA   NA  30      M
# 2   2 1990     1    3    1   NA  35      F
# 3   3 1991     1   NA    2   NA  31      F
# 4   4 1990     1    6    5   NA  30      M
# 5   5 1991     1    4    4    3  40      F
# 6   1 1991     2    3    4    8  29      F
# 7   2 1990     2    3   NA   NA  35      M
# 8   5 1990     2   10    1   NA  50      M
# 9  10 1990     2    5    1   NA  19      F
# 10  1 1991     1   NA   NA   NA  31      M
# 11  2 1990     1    3    1   NA  36      F
# 12  3 1991     1   NA    2   NA  32      F
# 13  4 1990     1    6    5   NA  31      M
# 14  5 1991     1    4    4    3  41      F
# 15  1 1991     2    3    4    8  30      F
# 16  2 1990     2    1   NA   NA  36      M
# 17  5 1990     2   10    1   NA  51      M
# 18 10 1990     2    5    1   NA  20      F
#                                                   note time
# 1                           First wave: No nominations 1990
# 2                            First wave: Nothing weird 1990
# 3                   First wave: Only nominates in net2 1990
# 4   First wave: Nominates someone who wasn't interview 1990
# 5                    First wave: Nominates 4 two times 1990
# 6                 First wave: Only nominates outsiders 1990
# 7                                 First wave: Isolated 1990
# 8                            First wave: Nothing weird 1990
# 9                              First wave: Non-adopter 1990
# 10                         Second wave: No nominations 1991
# 11                          Second wave: Nothing weird 1991
# 12                 Second wave: Only nominates in net2 1991
# 13 Second wave: Nominates someone who wasn't interview 1991
# 14                  Second wave: Nominates 4 two times 1991
# 15               Second wave: Only nominates outsiders 1991
# 16                   Second wave: Now is not isolated! 1991
# 17                          Second wave: Nothing weird 1991
# 18                            Second wave: Non-adopter 1991
diffnet_dynsurvey <- survey_to_diffnet(
  dat      = fakesurveyDyn,
  idvar    = "id",
  netvars  = c("net1", "net2", "net3"),
  groupvar = "group",
  toavar   = "toa",
  timevar  = "time"
  )

plot_diffnet(diffnet_dynsurvey)

Datasets in netdiffuseR (surveys)

  • netdiffuseR has the three classic Diffusion Network Datasets (as surveys):

    • medInnovations: Doctors and the innovation of Tetracycline (1955).
    • brfarmers: Brazilian farmers and the innovation of Hybrid Corn Seed (1966).
    • kfamily: Korean women and Family Planning methods (1973).

Let’s have a look at kfamily:

data(kfamily)

# The data contains adoption information of 25 villages:
unique(kfamily$village)
#  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
# across 10 time steps (toa = 11 means 'no adoption')
sort(unique(kfamily$toa))
#  [1]  1  2  3  4  5  6  7  8  9 10 11

We can construct a diffnet object from these survey data:

kfamily_diffnet <- survey_to_diffnet(
  dat      = kfamily,
  idvar    = "id",
  netvars  = c(
    # Neighbors talk to about FP
    "net11", "net12", "net13", "net14", "net15", 
    # Closest neighbor most frequently met
    "net21", "net22", "net23", "net24", "net25", 
    # Advice on FP sought from 
    "net31", "net32", "net33", "net34", "net35"), 
  toavar   = "toa",
  groupvar = "village"
)

kfamily_diffnet
# Dynamic network of class -diffnet-
#  Name               : Diffusion Network
#  Behavior           : Unknown
#  # of nodes         : 1047 (1002, 1003, 1005, 1007, 1010, 1011, 1012, 1014, ...)
#  # of time periods  : 11 (1 - 11)
#  Type               : directed
#  Num of behaviors   : 1
#  Final prevalence   : 1.00
#  Static attributes  : village, recno1, studno1, area1, id1, nmage1, nmag... (430)
#  Dynamic attributes : -
summary(kfamily_diffnet)
# Diffusion network summary statistics
#  Name     :  Diffusion Network 
#  Behavior : Unknown
# -----------------------------------------------------------------------------
#  Period   Adopters   Cum Adopt. (%)   Hazard Rate   Density   Moran's I (sd)  
# -------- ---------- ---------------- ------------- --------- ---------------- 
#        1         69        69 (0.07)             -      0.00  0.03 (0.01) *** 
#        2         94       163 (0.16)          0.10      0.00  0.02 (0.01) *** 
#        3         81       244 (0.23)          0.09      0.00  0.03 (0.01) *** 
#        4         86       330 (0.32)          0.11      0.00  0.06 (0.01) *** 
#        5         65       395 (0.38)          0.09      0.00  0.07 (0.01) *** 
#        6         62       457 (0.44)          0.10      0.00  0.06 (0.01) *** 
#        7         53       510 (0.49)          0.09      0.00  0.08 (0.01) *** 
#        8         53       563 (0.54)          0.10      0.00  0.08 (0.01) *** 
#        9         73       636 (0.61)          0.15      0.00  0.08 (0.01) *** 
#       10         37       673 (0.64)          0.09      0.00  0.07 (0.01) *** 
#       11        374      1047 (1.00)          1.00      0.00               -  
# ----------------------------------------------------------------------------- 
#  Left censoring  : 0.07 (69) 
#  Right centoring : 0.00 (0) 
#  # of nodes      : 1047
# 
#  Moran's I was computed on contemporaneous autocorrelation using 1/geodesic
#  values. Significane levels  *** <= .01, ** <= .05, * <= .1.

We can calculate direct exposure (cohesion) to an innovation using exposure(),

# Computing exposure
coh <- exposure(kfamily_diffnet)

# See results
head(round(coh, 2))
#      1    2    3    4    5    6    7    8    9   10 11
# 1002 0 0.17 0.50 0.67 0.67 0.67 0.67 0.67 0.83 0.83  1
# 1003 0 0.17 0.33 0.67 0.67 0.67 0.83 0.83 1.00 1.00  1
# 1005 0 0.43 0.71 1.00 1.00 1.00 1.00 1.00 1.00 1.00  1
# 1007 0 0.25 0.25 0.50 0.50 0.50 1.00 1.00 1.00 1.00  1
# 1010 0 0.60 0.60 0.80 0.80 0.80 0.80 0.80 1.00 1.00  1
# 1011 0 0.33 0.33 1.00 1.00 1.00 1.00 1.00 1.00 1.00  1

and also indirect influence (structural equivalence):

# Computing structural equivalence
se <- exposure(kfamily_diffnet, 
              alt.graph="se",     # select 'structural equivalence'
              groupvar="village", # separately by community
              valued=TRUE         # to account for weights
              )

# See results
head(round(se, 2))
#         1    2    3    4    5    6    7    8    9   10 11
# 1002 0.02 0.07 0.10 0.12 0.15 0.46 0.48 0.49 0.56 0.56  1
# 1003 0.01 0.03 0.04 0.05 0.06 0.41 0.43 0.43 0.48 0.48  1
# 1005 0.05 0.18 0.24 0.31 0.39 0.63 0.69 0.71 0.79 0.79  1
# 1007 0.01 0.03 0.03 0.05 0.06 0.52 0.53 0.53 0.70 0.70  1
# 1010 0.00 0.03 0.04 0.06 0.07 0.76 0.77 0.77 0.83 0.83  1
# 1011 0.03 0.14 0.20 0.26 0.32 0.60 0.65 0.65 0.74 0.74  1
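
As a rough illustration of what cohesion exposure measures, the following base-R sketch computes, for a made-up four-node network, the share of each node's nominated alters who have adopted by each period. The toy network, the times of adoption, and the choice to report zero exposure for nodes without alters are all assumptions made for illustration; the sketch is not the implementation used by exposure(), which supports further options such as valued ties and alternative graphs.

```r
ids <- as.character(1:4)

# Toy directed network: adj[i, j] = 1 means that i nominates j
adj <- matrix(0, 4, 4, dimnames = list(ids, ids))
adj["1", c("2", "3")]      <- 1
adj["2", "3"]              <- 1
adj["3", c("1", "2", "4")] <- 1
# Node 4 nominates no one

# Hypothetical times of adoption (NA = never adopted)
toa     <- c(3, 1, 2, NA)
periods <- 1:3

# Cumulative adoption matrix (n x t): 1 if adopted at or before period t
cumadopt <- sapply(periods, function(t) as.integer(!is.na(toa) & toa <= t))
dimnames(cumadopt) <- list(ids, as.character(periods))

# Exposure: adopting alters divided by total alters, per node and period
expo <- (adj %*% cumadopt) / rowSums(adj)
expo[is.nan(expo)] <- 0  # convention chosen here for nodes without alters

round(expo, 2)
```

Node 1, for example, nominates nodes 2 and 3. Only node 2 has adopted in period 1, so node 1's exposure is 0.5; from period 2 onward both alters have adopted and the exposure is 1.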

The diffnet object also contains attributes of the vertices, which can be retrieved using diffnet.attrs():

# Retrieving attributes as data.frame
kfamily_diffnet.df <- diffnet.attrs(kfamily_diffnet, as.df = TRUE)

# Subset to relevant variables
kfamily_relevant_vars <- kfamily_diffnet.df[, c("per", "toa", "village")]

# Select 10 random rows
kfamily_relevant_vars[sample(nrow(kfamily_relevant_vars), 10), ]
#       per toa village
# 11136  11  11       2
# 94      1   2      12
# 8204    8  11       6
# 9663   10  11       1
# 5412    6  11      13
# 2652    3  11      22
# 7093    7  11       4
# 4029    4  11       6
# 915     1   5       7
# 2723    3  11      23

Problems

  1. Using the rda file read.rda, read in the edgelist net_edgelist and the adjacency matrix net_list as diffnet objects. In both cases you should use the data.frame X, which contains the time of adoption variable. (solution script)

Appendix

Diffusion Network Object (diffnet)

  • Most of the package’s functions accept different types of graphs:

    • Static: matrix, dgCMatrix (from the Matrix pkg).
    • Dynamic: list + dgCMatrix, array, diffnet.
  • netdiffuseR has its own class of objects, diffnet, with which you get the most out of the package.

  • From netdiffuseR’s perspective, network data comes in three classes:

    1. Raw R network data: Datasets with edgelist, attributes, survey data, etc.
    2. Already in R: data already read into R using igraph, statnet, etc. (igraph_to_diffnet, network_to_diffnet, etc.)
    3. Graph files: DL, UCINET, pajek, etc. (read_pajek, read_dl, read_ucinet, etc.)
  • In this presentation we will focus on 1.

What is a (diffnet) object

A diffusion network, a.k.a. diffnet object, is a list that holds the following objects:

  • graph: A list with \(t\) dgCMatrix matrices of size \(n\times n\),
  • toa: An integer vector of length \(n\),
  • adopt: A matrix of size \(n\times t\),
  • cumadopt: A matrix of size \(n\times t\),
  • vertex.static.attrs: A data.frame of size \(n\times k\),
  • vertex.dyn.attrs: A list with \(t\) data.frames of size \(n\times k\),
  • graph.attrs: Currently ignored…, and
  • meta: A list with metadata about the object.

These are created using new_diffnet (or its wrappers).
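
The components listed above can be inspected directly on any diffnet object. The following is a minimal sketch that rebuilds a small random diffnet with the same functions used earlier in this document (rgraph_ws and as_diffnet) and then looks at its elements; the network size of 50 is an arbitrary choice for illustration.

```r
library(netdiffuseR)

set.seed(9)

# Small random network and random times of adoption (NA = non-adopter)
net <- rgraph_ws(50, 4, .2)
toa <- sample(c(NA, 2010:2014), 50, TRUE)

# as_diffnet warns that the static graph is recycled; silenced here
dn <- suppressWarnings(as_diffnet(net, toa))

# Top-level elements of the list
names(dn)

# One adjacency matrix per time period
length(dn$graph)

# Adoption matrices have one row per node and one column per period
dim(dn$adopt)
dim(dn$cumadopt)
```

Accessing the elements with `$` is convenient for exploration; the package's own functions are the safer route when modifying an object.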

Static survey (survey_to_diffnet)

  • netdiffuseR can also read survey (nomination) data:

    data("fakesurvey")
    fakesurvey
    #   id toa group net1 net2 net3 age gender                                   note
    # 1  1   1     1   NA   NA   NA  30      M                         No nominations
    # 2  2   5     1    3    1   NA  35      F                          Nothing weird
    # 3  3   5     1   NA    2   NA  31      F                 Only nominates in net2
    # 4  4   3     1    6    5   NA  30      M Nominates someone who wasn't interview
    # 5  5   2     1    4    4    3  40      F                  Nominates 4 two times
    # 6  1   4     2    3    4    8  29      F               Only nominates outsiders
    # 7  2   3     2    3   NA   NA  35      M                               Isolated
    # 8  5   3     2   10    1   NA  50      M                          Nothing weird
    # 9 10  NA     2    5    1   NA  19      F                            Non-adopter
  • In group one, id 4 nominates id 6, who does not show up in the data, and in group two, id 1 nominates 3, 4, and 8, individuals who do not show up in the survey either. By default, these unsurveyed individuals are dropped:

    d1 <- survey_to_diffnet(
      dat      = fakesurvey,                # Dataset
      idvar    = "id",                      # The name of the idvar
      netvars  = c("net1", "net2", "net3"), # Name of the nomination variables
      groupvar = "group",                   # Group variable (if any)
      toavar   = "toa"                      # Name of the time of adoption variable
      ); d1
    # Dynamic network of class -diffnet-
    #  Name               : Diffusion Network
    #  Behavior           : Unknown
    #  # of nodes         : 9 (101, 102, 103, 104, 105, 201, 202, 205, ...)
    #  # of time periods  : 5 (1 - 5)
    #  Type               : directed
    #  Num of behaviors   : 1
    #  Final prevalence   : 0.89
    #  Static attributes  : group, net1, net2, net3, age, gender, note (7)
    #  Dynamic attributes : -
  • If you want to include those unsurveyed individuals, set the option no.unsurveyed = FALSE:

    d2 <- survey_to_diffnet(
      dat      = fakesurvey,
      idvar    = "id",
      netvars  = c("net1", "net2", "net3"),
      groupvar = "group",
      toavar   = "toa",
      no.unsurveyed = FALSE
      ); d2
    # Dynamic network of class -diffnet-
    #  Name               : Diffusion Network
    #  Behavior           : Unknown
    #  # of nodes         : 13 (101, 102, 103, 104, 105, 106, 201, 202, ...)
    #  # of time periods  : 5 (1 - 5)
    #  Type               : directed
    #  Num of behaviors   : 1
    #  Final prevalence   : 0.62
    #  Static attributes  : group, net1, net2, net3, age, gender, note (7)
    #  Dynamic attributes : -
  • We can also check the difference between the two objects:

    d2 - d1
    # Dynamic network of class -diffnet-
    #  Name               : Diffusion Network
    #  Behavior           : Unknown
    #  # of nodes         : 4 (106, 203, 204, 208)
    #  # of time periods  : 5 (1 - 5)
    #  Type               : directed
    #  Num of behaviors   : 1
    #  Final prevalence   : 0.00
    #  Static attributes  : group, net1, net2, net3, age, gender, note (7)
    #  Dynamic attributes : -
    rownames(d2 - d1)
    # [1] "106" "203" "204" "208"
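
The four extra nodes can also be identified by hand. The sketch below retypes the id, group, and nomination columns of fakesurvey from the listing above and flags every nominated individual who was not interviewed. The labelling rule group * 100 + id is an assumption inferred from the node names printed above (101, 102, ..., 205), not taken from the package documentation.

```r
# Relevant columns of fakesurvey, copied from the printed data
survey <- data.frame(
  id    = c(1, 2, 3, 4, 5, 1, 2, 5, 10),
  group = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  net1  = c(NA, 3, NA, 6, 4, 3, 3, 10, 5),
  net2  = c(NA, 1, 2, 5, 4, 4, NA, 1, 1),
  net3  = c(NA, NA, NA, NA, 3, 8, NA, NA, NA)
)
netvars <- c("net1", "net2", "net3")

# Hypothetical helper: unique label per (group, id) pair (assumed rule)
make_label <- function(group, id) group * 100 + id

# Labels of interviewed individuals
surveyed <- make_label(survey$group, survey$id)

# Labels of everyone who was nominated, within the nominator's group
nominated <- unlist(lapply(netvars, function(v) {
  make_label(survey$group, survey[[v]])
}))
nominated <- sort(unique(nominated[!is.na(nominated)]))

# Nominated but never interviewed
unsurveyed <- setdiff(nominated, surveyed)
unsurveyed
```

The result, 106, 203, 204, and 208, matches the row names of d2 - d1.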

Other network formats

  • The package also supports working with other network formats.

  • Besides reading .net (Pajek) and ml (UCINET) files, netdiffuseR can convert to and from the classes igraph, network, and networkDynamic.

    data("medInnovationsDiffNet")
    dn_ig  <- diffnet_to_igraph(medInnovationsDiffNet)
    # dn_ig # For some issue with lazy eval, knitr won't print this
    
    dn_net <- diffnet_to_network(medInnovationsDiffNet)
    dn_net[[1]]
    #  Network attributes:
    #   vertices = 125 
    #   directed = TRUE 
    #   hyper = FALSE 
    #   loops = FALSE 
    #   multiple = FALSE 
    #   bipartite = FALSE 
    #   name = Medical Innovation 
    #   behavior = Adoption of Tetracycline 
    #   total edges= 294 
    #     missing edges= 0 
    #     non-missing edges= 294 
    # 
    #  Vertex attribute names: 
    #     ado adopt attend belief catbak city club coll commun ctl date detail detail2 dichot drug expect free friends here home house info journ journ2 length meet most net1_1 net1_2 net1_3 net2_1 net2_2 net2_3 net3_1 net3_2 net3_3 nojourn nonpoor office origid paadico perc position presc proage proage2 proximty recall recon reltend science social sourinfo special study tend thresh toa vertex.names young 
    # 
    # No edge attributes
    dn_ndy <- diffnet_to_networkDynamic(medInnovationsDiffNet)
    # Argument base.net not specified, using first element of network.list instead
    # Created net.obs.period to describe network
    #  Network observation period info:
    #   Number of observation spells: 1 
    #   Maximal time range observed: 1 until 18 
    #   Temporal mode: discrete 
    #   Time unit: step 
    #   Suggested time increment: 1
    dn_ndy
    # NetworkDynamic properties:
    #   distinct change times: 18 
    #   maximal time range: 1 until  18 
    # 
    # Includes optional net.obs.period attribute:
    #  Network observation period info:
    #   Number of observation spells: 1 
    #   Maximal time range observed: 1 until 18 
    #   Temporal mode: discrete 
    #   Time unit: step 
    #   Suggested time increment: 1 
    # 
    #  Network attributes:
    #   vertices = 125 
    #   directed = TRUE 
    #   hyper = FALSE 
    #   loops = FALSE 
    #   multiple = FALSE 
    #   bipartite = FALSE 
    #   behavior = Adoption of Tetracycline 
    #   name = Medical Innovation 
    #   net.obs.period: (not shown)
    #   total edges= 294 
    #     missing edges= 0 
    #     non-missing edges= 294 
    # 
    #  Vertex attribute names: 
    #     active ado adopt attend belief catbak city club coll commun ctl date detail detail2 dichot drug expect free friends here home house info journ journ2 length meet most net1_1 net1_2 net1_3 net2_1 net2_2 net2_3 net3_1 net3_2 net3_3 nojourn nonpoor office origid paadico perc position presc proage proage2 proximty recall recon reltend science social sourinfo special study tend thresh toa vertex.names young 
    # 
    #  Edge attribute names: 
    #     active

    The first two examples create a list of objects, while the latter creates a single object, which can be converted back to a diffnet:

    networkDynamic_to_diffnet(dn_ndy, toavar = "toa")
    # Dynamic network of class -diffnet-
    #  Name               : Medical Innovation
    #  Behavior           : Adoption of Tetracycline
    #  # of nodes         : 125 (1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, ...)
    #  # of time periods  : 18 (1 - 18)
    #  Type               : directed
    #  Num of behaviors   : 1
    #  Final prevalence   : 1.00
    #  Static attributes  : -
    #  Dynamic attributes : ado, adopt, attend, belief, catbak, city, club, co... (59)