Examples
========

Complete Workflow Example
-------------------------

Here's a complete example showing how to analyze RDS data from start to finish::

    import pandas as pd
    from RDSTools import (
        load_toy_data, RDSdata, RDSmean, RDStable, RDSlm,
        RDSnetgraph, RDSmap, get_available_seeds, print_map_info
    )

    # 1. Load and examine your data
    # Option A: Use the included example dataset
    data = load_toy_data()

    # Option B: Load your own data instead
    # data = pd.read_csv("rds_survey.csv")
    print(data.columns)
    print(f"Total participants: {len(data)}")

    # 2. Process the RDS structure
    rds_data = RDSdata(
        data=data,
        unique_id="ID",
        redeemed_coupon="RecruitCoupon",
        issued_coupons=["Coupon_1", "Coupon_2", "Coupon_3"],
        degree="NetworkSize",
        zero_degree="median",
        NA_degree="hotdeck"
    )

    # Check the processed data
    print(f"Seeds: {rds_data['SEED'].sum()}")
    print(f"Max wave: {rds_data['WAVE'].max()}")

    # 3. Calculate means with parallel processing
    mean_age = RDSmean(
        x='Age',
        data=rds_data,
        weight='WEIGHT',
        var_est='tree_uni1',
        resample_n=1000,
        n_cores=4
    )
    print(mean_age)

    # 4. Generate frequency tables
    sex_table = RDStable(
        x='Sex',
        data=rds_data,
        weight='WEIGHT',
        var_est='tree_uni1',
        resample_n=1000
    )
    print(sex_table)

    # Two-way table
    cross_table = RDStable(
        x='Sex',
        y='Race',
        data=rds_data,
        weight='WEIGHT',
        var_est='tree_uni1',
        resample_n=1000,
        margins=1  # row proportions
    )
    print(cross_table)

    # 5. Fit regression models
    income_model = RDSlm(
        data=rds_data,
        formula='Income ~ Age + C(Sex) + C(Race)',
        weight='WEIGHT',
        var_est='tree_uni1',
        resample_n=2000,
        n_cores=6
    )
    print(income_model)
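
Step 2 above links each respondent to a recruiter through coupon codes and derives seed and wave information from those links. The following standard-library sketch illustrates that idea on made-up records. It is not the ``RDSdata`` implementation, and the record layout, ``coupon_owner``, and ``wave_of`` are hypothetical names chosen for illustration::

    # Toy records: each respondent redeemed one coupon (None for seeds)
    # and was issued coupons to hand out.
    respondents = [
        {"ID": "1", "RecruitCoupon": None, "issued": ["c1", "c2"]},
        {"ID": "2", "RecruitCoupon": "c1", "issued": ["c3"]},
        {"ID": "3", "RecruitCoupon": "c2", "issued": []},
        {"ID": "4", "RecruitCoupon": "c3", "issued": []},
    ]

    # Map each issued coupon back to the respondent who handed it out
    coupon_owner = {c: r["ID"] for r in respondents for c in r["issued"]}

    # A respondent's recruiter is the owner of the coupon they redeemed;
    # respondents whose coupon has no known owner are treated as seeds here
    recruiter = {r["ID"]: coupon_owner.get(r["RecruitCoupon"]) for r in respondents}

    def wave_of(rid):
        """Number of recruitment steps between a respondent and their seed."""
        steps = 0
        while recruiter[rid] is not None:
            rid = recruiter[rid]
            steps += 1
        return steps

    waves = {rid: wave_of(rid) for rid in recruiter}
    seeds = [rid for rid, rec in recruiter.items() if rec is None]
    print(f"Seeds: {seeds}")
    print(f"Waves: {waves}")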

Descriptive Statistics Examples
-------------------------------

Unweighted mean with naive variance::

    result = RDSmean(
        x='Age',
        data=rds_data,
        var_est=None  # naive method
    )

Weighted mean with bootstrap variance::

    result = RDSmean(
        x='Age',
        data=rds_data,
        weight='WEIGHT',
        var_est='chain1',
        resample_n=1000
    )
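
A weighted mean is the weight-scaled sum of the values divided by the sum of the weights. The sketch below shows only that arithmetic on made-up numbers. It does not reproduce how RDSTools derives the ``WEIGHT`` column or the bootstrap variance::

    ages = [22, 35, 41, 58]
    weights = [0.5, 1.0, 2.0, 0.5]  # made-up weights for illustration

    # Each value contributes in proportion to its weight
    weighted_mean = sum(w * x for w, x in zip(weights, ages)) / sum(weights)
    unweighted_mean = sum(ages) / len(ages)

    print(f"Weighted mean: {weighted_mean}")
    print(f"Unweighted mean: {unweighted_mean}")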

Return bootstrap means for custom analysis::

    result, bootstrap_means, node_counts = RDSmean(
        x='Age',
        data=rds_data,
        var_est='tree_uni1',
        resample_n=1000,
        return_bootstrap_means=True,
        return_node_counts=True
    )

    # Analyze bootstrap distribution
    import numpy as np
    print(f"Bootstrap mean: {np.mean(bootstrap_means)}")
    print(f"Bootstrap SE: {np.std(bootstrap_means, ddof=1)}")
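
Beyond the mean and standard deviation, a percentile confidence interval is a common way to summarize a bootstrap distribution. The sketch below is a minimal standard-library illustration. The ``bootstrap_means`` list is simulated stand-in data rather than ``RDSmean`` output, and ``percentile_interval`` is a hypothetical helper, not part of RDSTools::

    import random
    import statistics

    def percentile_interval(values, level=0.95):
        """Return (lower, upper) percentile bounds of a bootstrap distribution."""
        ordered = sorted(values)
        alpha = (1.0 - level) / 2.0
        lower_idx = int(alpha * (len(ordered) - 1))
        upper_idx = int(round((1.0 - alpha) * (len(ordered) - 1)))
        return ordered[lower_idx], ordered[upper_idx]

    random.seed(42)
    # Simulated stand-in for the bootstrap means
    bootstrap_means = [random.gauss(35.0, 1.5) for _ in range(1000)]

    boot_se = statistics.stdev(bootstrap_means)  # sample SD (n - 1 denominator)
    ci_low, ci_high = percentile_interval(bootstrap_means)
    print(f"Bootstrap SE: {boot_se:.2f}")
    print(f"95% percentile CI: ({ci_low:.2f}, {ci_high:.2f})")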

Table Examples
--------------

One-way tables, unweighted and weighted::

    # Simple one-way table
    table = RDStable(
        x='Sex',
        data=rds_data
    )

    # Weighted one-way table with bootstrap
    table = RDStable(
        x='Race',
        data=rds_data,
        weight='WEIGHT',
        var_est='tree_uni1',
        resample_n=500
    )

Two-way tables with different proportions::

    # Cell proportions (default)
    table_cell = RDStable(
        x='Sex',
        y='Race',
        data=rds_data,
        margins=3
    )

    # Row proportions
    table_row = RDStable(
        x='Sex',
        y='Race',
        data=rds_data,
        margins=1
    )

    # Column proportions
    table_col = RDStable(
        x='Sex',
        y='Race',
        data=rds_data,
        margins=2
    )
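
The three ``margins`` settings differ only in the denominator used for each cell. As a plain-Python illustration on made-up, unweighted records (independent of RDSTools), the following computes cell, row, and column proportions by hand. The code-to-meaning mapping (1 = row, 2 = column, 3 = cell) is taken from the comments above::

    from collections import Counter

    # Made-up (sex, race) records for illustration only
    records = [("F", "A"), ("F", "B"), ("F", "B"), ("M", "A"), ("M", "A")]

    counts = Counter(records)
    total = sum(counts.values())
    row_totals = Counter(sex for sex, _ in records)
    col_totals = Counter(race for _, race in records)

    # margins=3: divide by the grand total (all cells sum to 1)
    cell_prop = {k: n / total for k, n in counts.items()}
    # margins=1: divide by the row total (each row sums to 1)
    row_prop = {k: n / row_totals[k[0]] for k, n in counts.items()}
    # margins=2: divide by the column total (each column sums to 1)
    col_prop = {k: n / col_totals[k[1]] for k, n in counts.items()}

    key = ("F", "B")
    print(cell_prop[key], row_prop[key], col_prop[key])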

Regression Examples
-------------------

Simple linear regression::

    model = RDSlm(
        data=rds_data,
        formula='Income ~ Age'
    )

Multiple linear regression with categorical predictors::

    model = RDSlm(
        data=rds_data,
        formula='Income ~ Age + C(Sex) + C(Education) + C(Race)',
        weight='WEIGHT',
        var_est='tree_uni1',
        resample_n=1000,
        n_cores=4
    )

Logistic regression::

    # Binary outcome (0/1)
    model = RDSlm(
        data=rds_data,
        formula='Employed ~ Age + C(Sex) + C(Education)',
        var_est='chain1',
        resample_n=500
    )

Return bootstrap estimates::

    model, boot_estimates, node_counts = RDSlm(
        data=rds_data,
        formula='Income ~ Age + C(Sex)',
        var_est='tree_uni1',
        resample_n=1000,
        return_bootstrap_estimates=True,
        return_node_counts=True
    )

Network Visualization Examples
------------------------------

Basic network graph with different layouts::

    # Spring layout (default)
    G = RDSnetgraph(
        data=rds_data,
        seed_ids=['1', '2'],
        waves=[0, 1, 2, 3],
        layout='Spring'
    )

    # Tree layout (hierarchical)
    G = RDSnetgraph(
        data=rds_data,
        seed_ids=['1'],
        waves=[0, 1, 2, 3, 4],
        layout='Tree',
        save_path='recruitment_tree.png'
    )

    # Circular layout
    G = RDSnetgraph(
        data=rds_data,
        seed_ids=['1', '2'],
        waves=[0, 1, 2],
        layout='Circular',
        figsize=(12, 12)
    )

Color nodes by demographic variables::

    # Color by Sex
    G = RDSnetgraph(
        data=rds_data,
        seed_ids=['1', '2', '3'],
        waves=[0, 1, 2],
        layout='Kamada-Kawai',
        variable='Sex',
        node_size=50,
        figsize=(16, 14)
    )

    # Color by Race
    G = RDSnetgraph(
        data=rds_data,
        seed_ids=['1'],
        waves=[0, 1, 2, 3],
        layout='Spring',
        variable='Race',
        node_size=40
    )

Use custom colors for categories::

    # First, check what categories exist (they'll be sorted)
    print(sorted(rds_data['Race'].dropna().unique()))
    # Output: ['Asian', 'Black', 'Hispanic', 'White']

    # Provide colors in the same sorted order
    custom_colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#95E1D3']

    G = RDSnetgraph(
        data=rds_data,
        seed_ids=['1', '2'],
        waves=[0, 1, 2],
        variable='Race',
        category_colors=custom_colors,
        title='Recruitment by Race (Custom Colors)'
    )

    # Using named colors instead of hex codes
    G = RDSnetgraph(
        data=rds_data,
        seed_ids=['1', '2', '3'],
        waves=[0, 1, 2, 3],
        variable='Sex',
        category_colors=['purple', 'orange'],  # For 2 categories
        layout='Tree',
        save_path='network_custom.png'
    )
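
Because the colors are matched to categories by sorted order, it can help to write the pairing out and confirm the counts agree before plotting. A minimal sketch, in which ``pair_colors`` is a hypothetical helper (not an RDSTools function) and the category values are made-up examples::

    def pair_colors(categories, colors):
        """Pair each distinct category with a color, in sorted category order."""
        ordered = sorted(set(categories))
        if len(ordered) != len(colors):
            raise ValueError(
                f"{len(ordered)} categories but {len(colors)} colors"
            )
        return dict(zip(ordered, colors))

    race_values = ["White", "Asian", "Black", "Hispanic", "Asian"]
    custom_colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#95E1D3']

    color_map = pair_colors(race_values, custom_colors)
    for category, color in color_map.items():
        print(f"{category}: {color}")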


Geographic Mapping Examples
---------------------------

Check available mapping data::

    # Print comprehensive map information
    print_map_info(rds_data, lat='Latitude', long='Longitude')

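    # Get available seeds and waves
    # (get_available_waves is assumed to be importable from RDSTools,
    # like get_available_seeds; it is not in the import list shown earlier)
    from RDSTools import get_available_waves
    seeds = get_available_seeds(rds_data)
    waves = get_available_waves(rds_data)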
    print(f"Seeds: {seeds}")
    print(f"Waves: {waves}")

Basic map::

    m = RDSmap(
        data=rds_data,
        seed_ids=['1', '2'],
        waves=[0, 1, 2, 3],
        output_file='my_rds_map.html'
    )

Map with custom coordinates and settings::

    m = RDSmap(
        data=rds_data,
        seed_ids=['1', '2', '3'],
        waves=[0, 1, 2, 3, 4],
        lat='lat',
        long='long',
        output_file='geographic_map.html',
        zoom_start=10,
        open_browser=True
    )
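
Rows with missing or out-of-range coordinates cannot be placed on a map, so it can be worth checking them first. A minimal sketch on made-up records follows. ``has_valid_coords`` is a hypothetical helper, and whether ``RDSmap`` performs a similar check itself is not stated here::

    def has_valid_coords(lat, long):
        """True if both values are numbers within valid latitude/longitude ranges."""
        try:
            lat, long = float(lat), float(long)
        except (TypeError, ValueError):
            return False
        # NaN fails both comparisons, so it is rejected here as well
        return -90.0 <= lat <= 90.0 and -180.0 <= long <= 180.0

    rows = [
        {"ID": "1", "lat": 42.28, "long": -83.74},
        {"ID": "2", "lat": None, "long": -83.10},   # missing latitude
        {"ID": "3", "lat": 142.0, "long": -83.00},  # latitude out of range
    ]

    mappable = [r["ID"] for r in rows if has_valid_coords(r["lat"], r["long"])]
    print(f"Mappable respondents: {mappable}")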

Bootstrap Examples
------------------

Standalone bootstrap resampling::

    from RDSTools import RDSboot

    # Standard bootstrap
    boot_results = RDSboot(
        data=rds_data,
        respondent_id_col='ID',
        seed_id_col='S_ID',
        seed_col='SEED',
        recruiter_id_col='R_ID',
        type='tree_uni1',
        resample_n=1000
    )

    # Check first resample
    sample_1 = boot_results[boot_results['RESAMPLE.N'] == 1]
    merged = pd.merge(sample_1, rds_data,
                     left_on='RESPONDENT_ID', right_on='ID')
    print(f"Bootstrap sample size: {len(merged)}")

Parallel bootstrap for large datasets::

    from RDSTools import RDSBootOptimizedParallel

    boot_results = RDSBootOptimizedParallel(
        data=rds_data,
        respondent_id_col='ID',
        seed_id_col='S_ID',
        seed_col='SEED',
        recruiter_id_col='R_ID',
        type='tree_uni1',
        resample_n=10000,
        n_cores=8
    )

Performance Comparison
----------------------

With multiple cores, the parallel bootstrap ran up to 10 times faster than the standard bootstrap in the timings below:

.. list-table:: Performance Comparison (252 observations)
   :header-rows: 1

   * - Cores
     - Bootstrap Samples
     - Standard Time
     - Parallel Time
     - Speedup
   * - 1
     - 1000
     - 120s
     - 120s
     - 1.0x
   * - 4
     - 1000
     - 120s
     - 18s
     - 6.7x
   * - 8
     - 1000
     - 120s
     - 12s
     - 10.0x
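
The speedup column is the standard time divided by the parallel time. The short sketch below recomputes it from the timings in the table and adds a per-core efficiency (speedup divided by core count), which is a derived figure not reported in the table::

    # Timings copied from the table: cores -> (standard seconds, parallel seconds)
    timings = {1: (120, 120), 4: (120, 18), 8: (120, 12)}

    results = {}
    for cores, (standard_s, parallel_s) in timings.items():
        speedup = standard_s / parallel_s
        efficiency = speedup / cores  # speedup per core
        results[cores] = (speedup, efficiency)
        print(f"{cores} cores: speedup {speedup:.1f}x, efficiency {efficiency:.2f}")

An efficiency above 1 means the measured gain is larger than the core count alone would account for.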

Complete Analysis Pipeline
---------------------------

Here's a complete pipeline from data loading to final results::

    import pandas as pd
    from RDSTools import (
        load_toy_data, RDSdata, RDSmean, RDStable, RDSlm,
        RDSnetgraph, RDSmap, get_available_seeds
    )

    # Load data
    data = pd.read_csv("survey.csv")

    # Process RDS structure
    rds_data = RDSdata(
        data=data,
        unique_id="ID",
        redeemed_coupon="CouponR",
        issued_coupons=["Coupon1", "Coupon2", "Coupon3"],
        degree="Degree"
    )

    # Descriptive statistics
    age_mean = RDSmean(
        x='Age',
        data=rds_data,
        weight='WEIGHT',
        var_est='tree_uni1',
        resample_n=1000,
        n_cores=4
    )

    # Frequency tables
    sex_table = RDStable(
        x='Sex',
        data=rds_data,
        weight='WEIGHT',
        var_est='tree_uni1',
        resample_n=1000
    )

    race_sex_table = RDStable(
        x='Sex',
        y='Race',
        data=rds_data,
        weight='WEIGHT',
        var_est='tree_uni1',
        resample_n=1000,
        margins=1
    )

    # Regression analysis
    model = RDSlm(
        data=rds_data,
        formula='Income ~ Age + C(Sex) + C(Race) + C(Education)',
        weight='WEIGHT',
        var_est='tree_uni1',
        resample_n=2000,
        n_cores=4
    )

    # Visualizations
    seeds = get_available_seeds(rds_data)

    # Network graph
    G = RDSnetgraph(
        data=rds_data,
        seed_ids=seeds[:2],
        waves=[0, 1, 2, 3],
        layout='Spring',
        variable='Sex',
        save_path='network.png'
    )

    # Geographic map
    m = RDSmap(
        data=rds_data,
        seed_ids=seeds[:2],
        waves=[0, 1, 2, 3],
        output_file='map.html',
        open_browser=True
    )

    # Print results
    print("Age Mean:")
    print(age_mean)
    print("\nSex Distribution:")
    print(sex_table)
    print("\nRegression Model:")
    print(model)