Visualization

The RDS Tools package supports visualization of respondents’ recruitment networks and of the geographic distribution of recruitment waves, starting from the seeds. Users can generate network plots to examine recruitment chains overall or colored by a categorical participant characteristic, as well as geographic maps that display participant locations and the spread of recruitment over time or across regions. These visualizations help in understanding the structure of recruitment chains and the geographic reach of RDS studies.

RDSnetgraph - Recruitment Network Visualization

The RDSnetgraph function creates network visualizations showing recruitment relationships between participants. This function supports multiple layout algorithms and customization options for coloring nodes by demographic variables.

Usage

RDSnetgraph(data, seed_ids, waves, layout='Spring', variable=None, category_colors=None,
            vertex_size=6, vertex_size_seed=10, seed_color='#E41A1C', nonseed_color='#377EB8',
            edge_width=1.0, title=None, figsize=(12, 10), show_plot=True, save_path=None)

Arguments

data

pandas.DataFrame. The output from RDSdata function containing preprocessed RDS data with recruitment relationships.

seed_ids

list of str or list of int. List of seed IDs to include in the network visualization. These should match the IDs in the ‘S_ID’ column of the data.

waves

list of int. List of wave numbers to display in the network. Wave 0 represents seeds. Use list(range(0, n)) to include waves 0 through n-1.

layout

str, optional. Specifies the network layout algorithm to use. Default is ‘Spring’. Available options:

  • Spring: Force-directed layout (default, uses igraph Fruchterman-Reingold algorithm). Good for general network visualization.

  • Tree: Hierarchical tree layout (requires pygraphviz, uses NetworkX). Best for visualizing recruitment chains as a tree structure.

  • Circular: Circular arrangement of nodes. Good for networks with cyclical patterns.

  • Kamada-Kawai: Force-directed layout with uniform edge lengths. Provides more uniform spacing than Spring.

  • Grid: Grid-based layout. Useful for ordered data.

  • Star: Star-shaped layout. Centers one node with others radiating outward.

  • Random: Random positioning of nodes. Useful for comparison or testing.
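
The layout names above fall into two groups: those drawn with igraph and the Tree layout drawn with NetworkX. A minimal sketch of how a name could be validated and routed is shown here. The function resolve_layout and the lookup table are hypothetical and are not part of RDSTools; the short igraph layout identifiers are standard python-igraph names, but whether the package uses them internally, and whether it treats names as case-sensitive, is an assumption.

```python
# Hypothetical lookup table (not part of RDSTools): documented layout names
# mapped to python-igraph layout identifiers.
IGRAPH_LAYOUTS = {
    "Spring": "fr",          # Fruchterman-Reingold
    "Circular": "circle",
    "Kamada-Kawai": "kk",
    "Grid": "grid",
    "Star": "star",
    "Random": "random",
}


def resolve_layout(name):
    """Return (backend, layout_key) for a documented layout name.

    This sketch is case-sensitive; the real function may be more lenient.
    """
    if name == "Tree":
        # Documented as using NetworkX with pygraphviz rather than igraph.
        return ("networkx", "tree")
    if name in IGRAPH_LAYOUTS:
        return ("igraph", IGRAPH_LAYOUTS[name])
    valid = sorted(list(IGRAPH_LAYOUTS) + ["Tree"])
    raise ValueError(f"Unknown layout {name!r}; expected one of {valid}")
```

Checking the name before plotting gives a clear error message early, instead of a failure deep inside the drawing code.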

variable

str, optional. Name of a categorical variable in the data to color nodes by. When specified, nodes will be colored according to their category values. Variables with 10+ categories trigger a warning as they may be hard to interpret. Default is None (uses seed/nonseed coloring).

category_colors

list of str, optional. Custom colors for categories when using the variable parameter. Must provide exactly one color per category in the variable. Colors are assigned to categories in sorted alphabetical/numerical order. Accepts hex codes (e.g., ‘#FF0000’) or named colors (e.g., ‘red’). If not provided, uses the default 20-color palette. Default is None.

vertex_size

int or float, optional. Size of non-seed vertices (nodes) in the network graph. Default is 6.

vertex_size_seed

int or float, optional. Size of seed vertices (nodes) in the network graph. Seeds are typically displayed larger to distinguish them. Default is 10.

seed_color

str, optional. Color for seed nodes when not using the variable parameter for grouping. Accepts hex codes or named colors. Default is ‘#E41A1C’ (red).

nonseed_color

str, optional. Color for non-seed nodes when not using the variable parameter for grouping. Accepts hex codes or named colors. Default is ‘#377EB8’ (blue).

edge_width

float, optional. Width of edges (lines) connecting nodes in the recruitment network. Default is 1.0.

title

str, optional. Title for the network graph. If not provided, a default title is automatically generated showing the seeds and waves included.

figsize

tuple of (int, int), optional. Figure size in inches as (width, height). Default is (12, 10).

show_plot

bool, optional. If True, displays the plot in the current environment. If False, the plot is not shown but can still be saved using save_path. Default is True.

save_path

str, optional. File path to save the network graph image. Supports common image formats (.png, .pdf, .svg, .jpg). If None, the graph is not saved to file. Default is None.

Returns

Graph object

Returns either an igraph.Graph object (for Spring, Circular, Kamada-Kawai, Grid, Star, Random layouts) or a networkx.DiGraph object (for Tree layout). The returned graph object contains all nodes, edges, and attributes and can be further manipulated or analyzed.
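
Because the return type depends on the layout, code that post-processes the graph may need to handle both libraries. The sketch below shows one way to do that by checking for each library's counting methods (vcount/ecount on igraph.Graph, number_of_nodes/number_of_edges on NetworkX graphs). The helper name graph_size is hypothetical and is not provided by RDSTools.

```python
def graph_size(graph):
    """Return (node_count, edge_count) for an igraph or NetworkX graph."""
    if hasattr(graph, "vcount"):
        # igraph.Graph exposes vcount() and ecount()
        return graph.vcount(), graph.ecount()
    if hasattr(graph, "number_of_nodes"):
        # networkx graphs expose number_of_nodes() and number_of_edges()
        return graph.number_of_nodes(), graph.number_of_edges()
    raise TypeError(f"Unsupported graph type: {type(graph).__name__}")
```

For example, graph_size(G) can be called on the object returned by RDSnetgraph regardless of which layout was requested.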

Examples

from RDSTools import RDSnetgraph

# rds_data is assumed to be the DataFrame returned by the RDSdata function

# Basic network graph with Spring layout
G = RDSnetgraph(
    data=rds_data,
    seed_ids=['1', '2'],
    waves=[0, 1, 2, 3],
    layout='Spring'
)

# Tree layout showing hierarchical recruitment structure
G = RDSnetgraph(
    data=rds_data,
    seed_ids=['1', '2'],
    waves=[0, 1, 2],
    layout='Tree'
)

# Color nodes by demographic variable
G = RDSnetgraph(
    data=rds_data,
    seed_ids=['1', '2'],
    waves=[0, 1, 2],
    layout='Spring',
    variable='Sex',
    title='Recruitment by Sex',
    vertex_size_seed=10,
    vertex_size=6,
    figsize=(14, 12)
)

# Use custom colors for categories
# Provide exactly one color per category; colors are assigned in sorted category order
custom_colors = ['#FF6B6B', '#4ECDC4', '#45B7D1']  # For 3 categories

G = RDSnetgraph(
    data=rds_data,
    seed_ids=['1', '2'],
    waves=[0, 1, 2],
    variable='Race',  # Assuming Race has 3 categories
    category_colors=custom_colors,
    title='Recruitment by Race (Custom Colors)'
)

# Customize colors when not grouping by variable
G = RDSnetgraph(
    data=rds_data,
    seed_ids=['1', '2'],
    waves=[0, 1, 2],
    seed_color='purple',
    nonseed_color='orange',
    edge_width=2.0
)

# Save network graph without displaying
G = RDSnetgraph(
    data=rds_data,
    seed_ids=['1'],
    waves=[0, 1, 2, 3, 4],
    layout='Tree',
    save_path='recruitment_tree.png',
    show_plot=False
)

Color Customization for Network Graphs

When using the variable parameter to color nodes by categories:

Default Color Palette:
  • A 20-color palette is automatically applied

  • Colors 1-8: Strong, distinct colors (Set1 palette)

  • Colors 9-16: Muted, professional colors (Dark2 palette)

  • Colors 17-20: Soft, light colors (Pastel1 palette)

  • For more than 20 categories, colors recycle (with a warning)
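
The palette structure described above can be sketched as follows. The hex values are the standard ColorBrewer Set1, Dark2, and Pastel1 colors; whether the package uses exactly these values, and the helper name default_colors, are assumptions made for illustration.

```python
import warnings

# First 8 colors of ColorBrewer Set1, all 8 of Dark2, first 4 of Pastel1.
# Assumption: the package's palette may differ in its exact values.
SET1 = ["#E41A1C", "#377EB8", "#4DAF4A", "#984EA3",
        "#FF7F00", "#FFFF33", "#A65628", "#F781BF"]
DARK2 = ["#1B9E77", "#D95F02", "#7570B3", "#E7298A",
         "#66A61E", "#E6AB02", "#A6761D", "#666666"]
PASTEL1 = ["#FBB4AE", "#B3CDE3", "#CCEBC5", "#DECBE4"]
PALETTE = SET1 + DARK2 + PASTEL1  # 20 colors in total


def default_colors(n_categories):
    """Return n_categories colors, recycling the palette when it runs out."""
    if n_categories > len(PALETTE):
        warnings.warn(
            f"{n_categories} categories exceed the {len(PALETTE)}-color "
            "palette; colors will repeat."
        )
    return [PALETTE[i % len(PALETTE)] for i in range(n_categories)]
```

With this scheme the 21st category receives the same color as the 1st, which is why custom colors are preferable for variables with many categories.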

Custom Colors:
  • Use category_colors parameter with a list of color codes

  • Must provide exactly one color per category

  • Colors apply in sorted alphabetical/numerical order of categories

  • Accepts hex codes (e.g., ‘#FF0000’) or named colors (e.g., ‘red’)

Important Notes:
  • Variables with 10+ categories trigger a warning (may be hard to interpret)

  • Variables with more than 20 categories recycle colors; supplying custom colors is strongly recommended

  • Categories are always sorted before color assignment for consistency

  • To check category order: sorted(data[variable].dropna().unique())
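
The sorted-order rule can be checked before plotting. The sketch below pairs custom colors with categories in sorted order and rejects a list of the wrong length. It works on a plain list of values rather than a DataFrame column, and the helper name map_category_colors is hypothetical, not part of RDSTools.

```python
def map_category_colors(values, category_colors):
    """Pair each category (in sorted order) with the color at the same index."""
    # Drop missing values first, mirroring dropna() on a pandas column.
    categories = sorted({v for v in values if v is not None})
    if len(category_colors) != len(categories):
        raise ValueError(
            f"Expected {len(categories)} colors for categories {categories}, "
            f"got {len(category_colors)}"
        )
    return dict(zip(categories, category_colors))


race = ["White", "Black", None, "Hispanic", "Black"]
mapping = map_category_colors(race, ["#FF6B6B", "#4ECDC4", "#45B7D1"])
# mapping pairs 'Black' with the first color, 'Hispanic' with the second,
# and 'White' with the third, regardless of the order they appear in the data.
```

Printing the mapping before calling RDSnetgraph is a quick way to confirm that each category receives the intended color.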

RDSmap - Geographic Distribution Mapping

When longitude and latitude are available, users can plot the geographic distribution of recruitment, either overall or for selected waves. The RDSmap function lets users choose which seeds and which waves appear on the map. If waves are not specified, all available waves are automatically included.

Usage

RDSmap(data, lat, long, seed_ids, waves=None, seed_color="red", seed_radius=7,
       recruit_color="blue", recruit_radius=7, line_color="black", line_weight=2,
       line_dashArray=None, output_file='participant_map.html', zoom_start=5,
       open_browser=False)

Arguments

data

pandas.DataFrame. The output from RDSdata function containing preprocessed RDS data with latitude and longitude coordinates per respondent.

lat

str. Column name for latitude coordinates in the data. Values should be numeric, in the range [-90, 90].

long

str. Column name for longitude coordinates in the data. Values should be numeric, in the range [-180, 180].
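
The documented ranges for lat and long can be checked before mapping. The following is a minimal sketch of such a check for a single coordinate pair; the function name is_valid_coordinate is hypothetical, and how the package itself treats out-of-range or missing values is not specified here.

```python
import math


def is_valid_coordinate(lat, lon):
    """Return True if lat is in [-90, 90] and lon is in [-180, 180]."""
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        return False  # None or non-numeric text
    if math.isnan(lat) or math.isnan(lon):
        return False  # missing values stored as NaN
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
```

Applying this check to each row makes it easy to count how many respondents would be left off the map because of unusable coordinates.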

seed_ids

list of str or list of int. List of seed IDs to display on the map. Use get_available_seeds() to see available seeds in the dataset.

waves

list of int, optional. List of wave numbers to display on the map. If not specified, all available waves are automatically included. Wave 0 represents seeds. Use get_available_waves() to see available waves, or specify explicitly like list(range(0, 4)) or [0, 1, 2, 3] for waves 0-3. Default is None (all waves).

seed_color

str, optional. Color of seed circle markers on the map. Accepts standard CSS color names (e.g., ‘red’, ‘blue’, ‘green’) or hex codes (e.g., ‘#FF0000’). Default is ‘red’.

seed_radius

int, optional. Radius (size) of seed circle markers in pixels. Larger values create bigger circles. Default is 7.

recruit_color

str, optional. Color of recruit circle markers on the map. Accepts standard CSS color names or hex codes. Default is ‘blue’.

recruit_radius

int, optional. Radius (size) of recruit circle markers in pixels. Default is 7.

line_color

str, optional. Color of lines connecting recruiters to recruits, showing recruitment relationships. Accepts standard CSS color names or hex codes. Default is ‘black’.

line_weight

int, optional. Thickness (width) of lines connecting recruiters to recruits in pixels. Default is 2.

line_dashArray

str, optional. Dash pattern for lines connecting recruiters to recruits. Format is a string of comma-separated numbers representing dash and gap lengths (e.g., ‘5,6’ creates dashed lines with 5-pixel dashes and 6-pixel gaps). If None, solid lines are used. Default is None.
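
To illustrate the dash pattern format, the sketch below splits a pattern string into its dash and gap lengths and rejects non-positive values. The helper parse_dash_array is hypothetical; the package may pass the string through to the mapping library without any validation.

```python
def parse_dash_array(pattern):
    """Split a 'dash,gap,...' string into a list of positive numbers."""
    if pattern is None:
        return []  # None means a solid line
    parts = [float(p) for p in pattern.split(",")]
    if any(p <= 0 for p in parts):
        raise ValueError(f"Dash and gap lengths must be positive: {pattern!r}")
    return parts
```

For the pattern '5,6' this yields a dash length of 5 followed by a gap length of 6.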

output_file

str, optional. Name of the HTML file to save the interactive map. The file is saved in the current working directory. Default is ‘participant_map.html’.

zoom_start

int, optional. Initial zoom level for the map. Lower values show more area (zoomed out), higher values show less area (zoomed in). Typical range is 1-18. Default is 5.

open_browser

bool, optional. If True, automatically opens the generated map in the default web browser after creation. If False, the map is saved but not opened. Default is False.

Returns

folium.Map

A Folium map object containing the interactive visualization. The map shows:

  • Seed locations as circle markers (red by default)

  • Non-seed locations as circle markers (blue by default)

  • Lines connecting recruiters to recruits for consecutive waves

  • Interactive popups with participant details (ID, seed ID)

  • A legend showing seed and recruit marker types

Raises

ValueError

If seed_ids or waves lists are empty, if coordinate columns are not found in the data, or if no valid coordinates are found.
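
The documented error conditions can be mirrored in a pre-flight check that runs before the map is built. This is a sketch only: check_map_args is a hypothetical helper, its messages are illustrative, and the package's own validation may differ in order and wording.

```python
def check_map_args(columns, seed_ids, waves, lat, long):
    """Raise ValueError for the argument problems listed under Raises.

    columns is the collection of column names available in the data.
    """
    if not seed_ids:
        raise ValueError("seed_ids must not be empty")
    # waves=None is allowed and means "all available waves".
    if waves is not None and len(waves) == 0:
        raise ValueError("waves must not be empty when provided")
    missing = [c for c in (lat, long) if c not in columns]
    if missing:
        raise ValueError(f"Coordinate column(s) not found: {missing}")
```

With a pandas DataFrame, the first argument would be data.columns.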

Examples

from RDSTools import RDSmap, get_available_seeds, get_available_waves, print_map_info

# Check available data
print_map_info(rds_data, lat='Latitude', long='Longitude')

# Get available seeds and waves
seeds = get_available_seeds(rds_data)
waves = get_available_waves(rds_data)

print(f"Available seeds: {seeds}")
print(f"Available waves: {waves}")

# Simplest map - uses all available waves automatically
m = RDSmap(
    data=rds_data,
    seed_ids=['1', '2'],
    lat='Latitude',
    long='Longitude',
    output_file='recruitment_map.html'
)

# Create map with specific waves
m = RDSmap(
    data=rds_data,
    seed_ids=['1', '2'],
    waves=[0, 1, 2, 3],
    lat='Latitude',
    long='Longitude',
    output_file='recruitment_map.html'
)

# Custom map with specific zoom and auto-open
m = RDSmap(
    data=rds_data,
    seed_ids=['1', '2', '3'],
    waves=[0, 1, 2, 3, 4],
    lat='Latitude',
    long='Longitude',
    output_file='custom_map.html',
    zoom_start=10,
    open_browser=True
)

# Customize marker and line styling (all values shown are the defaults except line_dashArray)
m = RDSmap(
    data=rds_data,
    seed_ids=['1', '2'],
    waves=list(range(0, 4)),
    lat='Latitude',
    long='Longitude',
    seed_color='red',
    seed_radius=7,
    recruit_color='blue',
    recruit_radius=7,
    line_color='black',
    line_weight=2,
    line_dashArray='5,6',  # Dashed lines
    output_file='styled_map.html'
)

# Use helper functions to explore data
available_seeds = get_available_seeds(rds_data)
available_waves = get_available_waves(rds_data)

m = RDSmap(
    data=rds_data,
    seed_ids=available_seeds[:2],    # First 2 seeds
    waves=available_waves[:4],       # First 4 waves
    lat='Latitude',
    long='Longitude'
)

Helper Functions

get_available_seeds(data)

Get list of available seed IDs from RDS data.

Parameters:
  • data (pandas.DataFrame): RDS data processed by RDSdata function

Returns:
  • list of str: Sorted list of unique seed IDs

get_available_waves(data)

Get list of available wave numbers from RDS data.

Parameters:
  • data (pandas.DataFrame): RDS data processed by RDSdata function

Returns:
  • list of int: Sorted list of unique wave numbers
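
As a rough illustration of what the two helpers above return, the sketch below computes sorted unique seed IDs and wave numbers from a list of records. It is not the package's implementation: the function names are hypothetical, 'S_ID' is the seed ID column mentioned for RDSnetgraph, and the 'WAVE' column name is an assumption.

```python
def available_seeds(records):
    """Sorted unique seed IDs, returned as strings."""
    return sorted({str(r["S_ID"]) for r in records})


def available_waves(records):
    """Sorted unique wave numbers, returned as integers."""
    return sorted({int(r["WAVE"]) for r in records})


records = [
    {"S_ID": "2", "WAVE": 0},
    {"S_ID": "1", "WAVE": 0},
    {"S_ID": "1", "WAVE": 1},
    {"S_ID": "10", "WAVE": 2},
]
seeds = available_seeds(records)
waves = available_waves(records)
```

Because seed IDs are strings in this sketch, they sort lexicographically, so '10' is placed before '2'. Keep this in mind when slicing the list to pick the first few seeds.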

print_map_info(data, lat='Latitude', long='Longitude')

Print summary information about the RDS data for mapping, including available seeds, waves, and coordinate coverage.

Parameters:
  • data (pandas.DataFrame): RDS data processed by RDSdata function

  • lat (str): Name of latitude column to check

  • long (str): Name of longitude column to check