RDS Tools Documentation
RDS Tools is a Python package for Respondent-Driven Sampling (RDS) analysis and bootstrap resampling with parallel processing capabilities.
Contents
User Guide
- Installation
- Quick Start
- Data Processing
- Estimation
- RDSmean - Descriptive Statistics
- RDStable - Contingency Tables
- RDSlm - Linear and Logistic Regression
- Sampling Variance
- RDSboot - Standard Bootstrap
- RDSBootOptimizedParallel - Parallel Bootstrap
- Working with Results
- Performance Considerations
- Visualization
- RDSnetgraph - Recruitment Network Visualization
- RDSmap - Geographic Distribution Mapping
- Performance Enhancement
- Examples
Installation
cd RDSTools
pip install .
For development:
pip install -e .
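To confirm that the install succeeded, a quick check from Python can help. The sketch below uses only the standard library. The helper name `is_installed` is hypothetical and not part of RDSTools; the import name `RDSTools` is taken from the Quick Start imports.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


# "RDSTools" is the import name used in the Quick Start section.
if is_installed("RDSTools"):
    print("RDSTools is importable")
else:
    print("RDSTools not found; re-run `pip install .` from the RDSTools directory")
```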
Quick Start
from RDSTools import load_toy_data
from RDSTools import RDSdata, RDSmean, RDStable, RDSlm
# Load the included example dataset
toy_data = load_toy_data()
# Or load your own data:
# import pandas as pd
# data = pd.read_csv("survey_data.csv")
# Process RDS data
rds_data = RDSdata(
    data=toy_data,
    unique_id="ID",
    redeemed_coupon="CouponR",
    issued_coupons=["Coupon1", "Coupon2", "Coupon3"],
    degree="Degree"
)
# Calculate means with bootstrap variance
result = RDSmean(
    x='Age',
    data=rds_data,
    weight='WEIGHT',
    var_est='tree_uni1',
    resample_n=1000,
    n_cores=4  # parallel processing
)
print(result)
# Create frequency tables
table = RDStable(
    x='Sex',
    data=rds_data,
    weight='WEIGHT',
    var_est='tree_uni1',
    resample_n=1000
)
print(table)
# Fit regression models
model = RDSlm(
    data=rds_data,
    formula='Income ~ Age + C(Sex) + C(Race)',
    weight='WEIGHT',
    var_est='tree_uni1',
    resample_n=1000,
    n_cores=4
)
print(model)
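The `RDSdata` call above ties each respondent to a recruiter through the redeemed and issued coupon columns. As a rough illustration of that idea only (not the package's implementation), the sketch below links recruits to recruiters, identifies seeds, and assigns waves using plain Python. The rows are made up for illustration, the helper names `link_recruiters` and `assign_waves` are hypothetical, and numbering seeds as wave 0 is an assumption.

```python
from typing import Dict, List, Optional

# Illustrative rows only (not the toy dataset): each respondent has an ID,
# the coupon they redeemed (None for seeds), and the coupons they handed out.
rows = [
    {"ID": "A", "CouponR": None, "Coupons": ["c1", "c2"]},
    {"ID": "B", "CouponR": "c1", "Coupons": ["c3"]},
    {"ID": "C", "CouponR": "c2", "Coupons": []},
    {"ID": "D", "CouponR": "c3", "Coupons": []},
]


def link_recruiters(rows: List[dict]) -> Dict[str, Optional[str]]:
    """Map each respondent ID to their recruiter's ID (None if no match)."""
    coupon_owner = {c: r["ID"] for r in rows for c in r["Coupons"]}
    return {r["ID"]: coupon_owner.get(r["CouponR"]) for r in rows}


def assign_waves(recruiter: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Seeds get wave 0 (an assumption); recruits are one wave after their recruiter."""
    waves: Dict[str, int] = {}

    def wave_of(rid: str) -> int:
        if rid not in waves:
            parent = recruiter[rid]
            waves[rid] = 0 if parent is None else wave_of(parent) + 1
        return waves[rid]

    for rid in recruiter:
        wave_of(rid)
    return waves


recruiter = link_recruiters(rows)
waves = assign_waves(recruiter)
# Respondents without a matching recruiter are treated as seeds here.
seeds = [rid for rid, parent in recruiter.items() if parent is None]
print(seeds)  # ['A']
print(waves)  # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```

In this sketch a respondent whose redeemed coupon matches no issued coupon is also treated as a seed; how the package handles that case is not stated here.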
Key Features
- Data Processing
  - RDSdata() - Process RDS survey data and create network structure
  - Automatic wave detection and seed identification
  - Flexible degree imputation methods (mean, median, hotdeck, drop)
- Estimation
  - RDSmean() - Calculate means with RDS-adjusted standard errors
  - RDStable() - Generate one-way and two-way frequency tables
  - RDSlm() - Linear and logistic regression models
  - Weighted and unweighted analyses
  - Naive and bootstrap variance estimation
- Bootstrap Variance
  - RDSboot() - Six bootstrap resampling methods
  - RDSBootOptimizedParallel() - Parallel bootstrap for performance
  - Chain, tree unidirectional, and tree bidirectional methods
- Visualization
  - RDSnetgraph() - Network visualizations with multiple layouts
  - RDSmap() - Interactive geographic maps
  - Helper functions: get_available_seeds(), get_available_waves(), print_map_info()
- Performance
  - Parallel processing support with the n_cores parameter
  - Up to 10x speedup with 8 cores
  - Optimized algorithms for large datasets