Using the SequenceOptimizer ============================== GOOSE's ``SequenceOptimizer`` is a flexible tool for designing protein sequences that match user-defined target values. It uses stochastic optimization with adaptive scaling to explore sequence space and minimize the difference between calculated and target property values. You can simultaneously optimize toward arbitrary numbers of properties with individual weights, tolerances, and constraint types. **IMPORTANT NOTE PLEASE READ**: GOOSE is an IDR design tool. **HOWEVER**, when using SequenceOptimizer, you can design anything you want. Thus, sequences are not guaranteed to be predicted to be disordered unless you specify the ``FractionDisorder`` property. Key Features of the New SequenceOptimizer ----------------------------------------- The ``SequenceOptimizer`` has been completely rewritten to provide: * **Adaptive Property Scaling**: Automatically adjusts optimization focus based on property convergence patterns and error magnitudes. This makes it easier to optimize toward properties with highly variable scales or difficult optimization landscapes. * **Diverse Initial Sequences**: If you are generating a completely new sequence, you can specify the number of starting sequences to screen before optimization begins. * **Flexible Constraint Types**: Support for exact matching, minimum thresholds, and maximum constraints for each specified property * **Per-Property Tolerances**: Set individual error tolerances for each property, allowing fine-grained control * **Advanced Convergence Detection**: Multiple convergence criteria including error tolerance, trend analysis, and stagnation detection * **Performance Optimization**: Comprehensive caching, bounded cache size control, and batch property evaluation support for faster optimization * **Arbitrary Number of Properties**: Optimize toward multiple instances of the same property. This was not previously supported. * **Easier Property Value Setting**: For many of the properties, you can now set the target value using a sequence of interest rather than a numeric value. * **Match to arbitrary interaction matrices**: You can now optimize sequences to match arbitrary interaction matrices. * **Linear Profiles for Values**: You can now set ``can_be_linear_profile=True`` for some properties and provide a sequence or list of target values. The optimizer will then attempt to match the profile along the values. * **Hard Composition Constraints**: ``aa_fraction_ranges`` can enforce hard per-residue or grouped composition bounds during candidate generation, which is often faster and more reliable than treating composition as another soft optimization objective. * **Population Diversity Controls**: ``elite_pool_size`` and ``parent_selection`` can retain multiple strong parent sequences instead of mutating only the current best sequence. * **Reproducibility Controls**: ``seed`` makes repeated runs deterministic for the optimizer's own random choices, and ``max_cache_size`` bounds cache growth. Critical Differences between SequenceOptimizer and Create Functionality ----------------------------------------------------------------------- The ``SequenceOptimizer`` represents a fundamentally different approach to sequence generation compared to the ``create`` module: * **Flexibility vs. Speed**: ``SequenceOptimizer`` prioritizes extreme flexibility and handles complex multi-property optimization scenarios that would be difficult ``create``. However, for simple, well-defined property targets, ``create`` functions are typically faster. * **Approximate vs. Exact Solutions**: ``SequenceOptimizer`` returns the best possible sequence within the optimization constraints and may not achieve exact target values. In contrast, ``create`` functions either generate sequences that exactly meet specifications or fail completely. * **Extensibility**: Adding new properties to ``SequenceOptimizer`` requires only implementing a simple property class. Adding new functionality to ``create`` requires significant backend overhead. * **Multi-Property Optimization**: ``SequenceOptimizer`` excels at balancing multiple competing properties simultaneously, while ``create`` functions typically handle individual properties or simple property combinations. .. contents:: Table of Contents :local: :depth: 2 Quick Start Example ------------------- Design a sequence of length 50 with a target hydrophobicity: .. code-block:: python import goose from sparrow import Protein # Initialize optimizer with basic parameters optimizer = goose.SequenceOptimizer( target_length=50, max_iterations=1000, verbose=True ) # Add hydrophobicity property with a tolerance optimizer.add_property( goose.Hydrophobicity, target_value=0.5, weight=1.0, tolerance=0.05 # Allow 5% deviation ) # Run optimization optimized_sequence = optimizer.run() # Analyze results final_protein = Protein(optimized_sequence) print(f"Optimized Sequence: {optimized_sequence}") print(f"Final Hydrophobicity: {final_protein.hydrophobicity:.3f}") print(f"Target Hydrophobicity: 0.5 ± 0.05") **Explanation:** - ``SequenceOptimizer(target_length=50, max_iterations=1000, verbose=True)``: Creates optimizer with sequence length, iteration limit, and progress reporting. - ``add_property(..., tolerance=0.05)``: Adds hydrophobicity optimization with 5% error tolerance. - ``run()``: Executes optimization with adaptive scaling and convergence detection. **Advanced Quick Start with Multiple Properties:** .. code-block:: python import goose optimizer = goose.SequenceOptimizer(target_length=100, verbose=True) # Exact hydrophobicity target optimizer.add_property( goose.Hydrophobicity, target_value=2.4, weight=1.0, ) # Minimum disorder requirement optimizer.add_property( goose.FractionDisorder, target_value=0.8, weight=2.0, # Higher weight = more important constraint_type='minimum', disorder_cutoff=0.5 ) # Maximum FCR constraint optimizer.add_property( goose.FCR, target_value=0.3, weight=1.5, constraint_type='maximum' ) optimized_sequence = optimizer.run() Property Classes Overview ------------------------- All property classes support three constraint types and individual tolerances: * **exact**: Minimize absolute difference from target (default) * **minimum**: Penalize only when below target value * **maximum**: Penalize only when above target value .. note:: ``AminoAcidFractions`` remains a soft optimization objective: the optimizer tries to improve it, but intermediate candidates may violate the target fractions. If you need hard composition bounds throughout optimization, use ``aa_fraction_ranges`` on ``SequenceOptimizer`` instead. To specify constraint type, use the ``constraint_type`` argument when adding a property: .. code-block:: python # Exact target (default) optimizer.add_property(goose.Hydrophobicity, target_value=0.5, constraint_type='exact') # Minimum requirement optimizer.add_property(goose.FractionDisorder, target_value=0.8, constraint_type='minimum') # Maximum constraint optimizer.add_property(goose.FCR, target_value=0.3, constraint_type='maximum') **Basic Properties** +-------------------------------+-----------------------------------------------+------------------------------------------------+ | Property Class | Description | Key Arguments | +===============================+===============================================+================================================+ | Hydrophobicity | Average hydrophobicity (0-9.0 scale) | target_value, weight, constraint_type | +-------------------------------+-----------------------------------------------+------------------------------------------------+ | FCR | Fraction of Charged Residues (0-1) | target_value, weight, constraint_type | +-------------------------------+-----------------------------------------------+------------------------------------------------+ | NCPR | Net Charge Per Residue (-1 to 1) | target_value, weight, constraint_type | +-------------------------------+-----------------------------------------------+------------------------------------------------+ | Kappa | Charge patterning parameter (0-1) | target_value, weight, constraint_type | +-------------------------------+-----------------------------------------------+------------------------------------------------+ | SCD | Sequence Charge Decoration | target_value, weight, constraint_type | +-------------------------------+-----------------------------------------------+------------------------------------------------+ | SHD | Sequence Hydropathy Decoration | target_value, weight, constraint_type | +-------------------------------+-----------------------------------------------+------------------------------------------------+ | Complexity | Wootton-Federhen complexity | target_value, weight, constraint_type | +-------------------------------+-----------------------------------------------+------------------------------------------------+ | ComputeIWD | Inverse Weighted Distance | residues (tuple), target_value, weight, | | | | constraint_type | +-------------------------------+-----------------------------------------------+------------------------------------------------+ | AminoAcidFractions | Target amino acid composition | target_fractions (dict), weight, | | | | constraint_type | +-------------------------------+-----------------------------------------------+------------------------------------------------+ | MatchingResidues | Number of matching residues to target | target_sequence, target_value, weight, | | | | constraint_type | +-------------------------------+-----------------------------------------------+------------------------------------------------+ **Ensemble Properties** +-------------------------------+-----------------------------------------------+------------------------------------------------+ | Property Class | Description | Key Arguments | +===============================+===============================================+================================================+ | RadiusOfGyration | Predicted radius of gyration (A) | target_value, weight, constraint_type | +-------------------------------+-----------------------------------------------+------------------------------------------------+ | EndToEndDistance | Predicted end-to-end distance (A) | target_value, weight, constraint_type | +-------------------------------+-----------------------------------------------+------------------------------------------------+ **Disorder** +-------------------------------+-----------------------------------------------+------------------------------------------------+ | Property Class | Description | Key Arguments | +===============================+===============================================+================================================+ | FractionDisorder | Fraction of disordered residues (0-1) | target_value, weight, constraint_type, | | | | disorder_cutoff | +-------------------------------+-----------------------------------------------+------------------------------------------------+ | MatchSequenceDisorder | Match disorder profile of target sequence | target_sequence, weight, constraint_type, | | | | exact_match, target_value | +-------------------------------+-----------------------------------------------+------------------------------------------------+ **Interaction Properties (Epsilon-based)** +------------------------+----------------------------------------+-------------------------------------------+ | Property Class | Description | Key Arguments | +========================+========================================+===========================================+ || MeanSelfEpsilon || Self-interaction potential || target_value, weight, | || || || preloaded_model, constraint_type, model | +------------------------+----------------------------------------+-------------------------------------------+ || MeanEpsilonWithTarget || Mean interaction with target sequence || target_value, target_sequence, weight, | || || || constraint_type, model, preloaded_model | +------------------------+----------------------------------------+-------------------------------------------+ || ChemicalFingerprint || Match chemical fingerprint to target || target_sequence, target_value, weight, | || || || constraint_type, model, preloaded_model, | || || || window_size | +------------------------+----------------------------------------+-------------------------------------------+ **Matrix-based Interaction Properties** +-------------------------------+-----------------------------------------------+------------------------------------------------+ | Property Class | Description | Key Arguments | +===============================+===============================================+================================================+ | MatchSelfIntermap | Match self-interaction matrix | sequence, weight, constraint_type, model, | | | | preloaded_model, inverse, window_size, | | | | allow_matrix_resizing | +-------------------------------+-----------------------------------------------+------------------------------------------------+ | MatchIntermap | Match interaction matrix with target | sequence, target_sequence, weight, | | | | constraint_type, model, preloaded_model, | | | | window_size, allow_matrix_resizing | +-------------------------------+-----------------------------------------------+------------------------------------------------+ | ModifyAttractiveValues | Modify attractive interactions | sequence, target_sequence, multiplier, | | | | weight, constraint_type, model, | | | | preloaded_model, window_size | +-------------------------------+-----------------------------------------------+------------------------------------------------+ | ModifyRepulsiveValues | Modify repulsive interactions | interacting_sequence, | | | | target_interacting_sequence, multiplier, | | | | weight, constraint_type, model, | | | | preloaded_model, window_size | +-------------------------------+-----------------------------------------------+------------------------------------------------+ | ModifyMatrixValues | Modify both attractive and repulsive | interacting_sequence, | | | | target_interacting_sequence, | | | | repulsive_multiplier, attractive_multiplier, | | | | weight, constraint_type, model, | | | | preloaded_model, window_size | +-------------------------------+-----------------------------------------------+------------------------------------------------+ **Folded Domain Surface Properties** +-------------------------------+-----------------------------------------------+------------------------------------------------+ | Property Class | Description | Key Arguments | +===============================+===============================================+================================================+ | FDMeanSurfaceEpsilon | Mean surface epsilon for folded domains | target_value, weight, constraint_type, model, | | | | preloaded_model, path_to_pdb, probe_radius, | | | | surface_thresh, sasa_mode, fd_start, fd_end, | | | | preloaded_fd | +-------------------------------+-----------------------------------------------+------------------------------------------------+ | FDSurfaceEpsilon | Surface epsilon interactions | repulsive_target, attractive_target, weight, | | | | constraint_type, model, preloaded_model, | | | | path_to_pdb, probe_radius, surface_thresh, | | | | sasa_mode, fd_start, fd_end, preloaded_fd | +-------------------------------+-----------------------------------------------+------------------------------------------------+ | FDSurfacePatchInteractions | Surface patch interaction analysis | target_value, weight, constraint_type, model, | | | | preloaded_model, path_to_pdb, probe_radius, | | | | surface_thresh, sasa_mode, fd_start, fd_end, | | | | preloaded_fd, patch_residues | +-------------------------------+-----------------------------------------------+------------------------------------------------+ Optimizer Initialization and Basic Parameters ------------------------------------------------- The ``SequenceOptimizer`` provides extensive control over the optimization process through initialization parameters. You can see additional parameters to change in the Advanced Optimizer Configuration section below. **Basic Parameters:** .. code-block:: python optimizer = goose.SequenceOptimizer( target_length=100, # Required: target sequence length max_iterations=1000, # Maximum optimization iterations verbose=True # Enable progress reporting ) **Mutation and Diversity Parameters:** .. code-block:: python optimizer = goose.SequenceOptimizer( target_length=100, # Candidate generation num_candidates=5, # Candidate sequences per iteration num_starting_candidates=100, # Number of initial sequences to screen min_mutations=1, # Minimum mutations per candidate max_mutations=15, # Maximum mutations per candidate mutation_ratio=10, # Length divisor for mutation calculation elite_pool_size=3, # Keep top-K parent sequences parent_selection='weighted', # weighted, uniform, or best # Shuffling for diversity enable_shuffling=True, # Enable sequence shuffling shuffle_frequency=50, # Shuffle every N iterations global_shuffle_probability=0.4, # Probability of global vs local shuffle shuffle_window_size=15 # Window size for local shuffling ) **Setting Initial Sequences:** .. code-block:: python # Start from a specific sequence initial_seq = "MGSWAEFKQRLAAIKTRLQALGSQAGKKDAE" * 3 # length = 96 optimizer = goose.SequenceOptimizer(target_length=len(initial_seq), verbose=True) optimizer.set_initial_sequence(initial_seq) # The optimizer will automatically calculate normalization factors # based on the initial sequence for adaptive scaling If ``aa_fraction_ranges`` is set, any sequence passed to ``set_initial_sequence()`` must already satisfy those hard composition bounds. Hard Composition Constraints During Optimization ------------------------------------------------ If you want composition to act as a hard constraint on every generated candidate, use ``aa_fraction_ranges`` on the optimizer itself rather than the ``AminoAcidFractions`` property. .. code-block:: python optimizer = goose.SequenceOptimizer( target_length=100, aa_fraction_ranges={ 'A': (0.05, 0.15), # Alanine fraction between 5% and 15% ('W', 'F', 'Y'): (0.05, 0.15), # Aromatic fraction between 5% and 15% 'DE': (0.10, 0.30), # D + E acidic fraction between 10% and 30% }, verbose=True, ) optimizer.add_property(goose.FCR, target_value=0.2, tolerance=0.01) optimized_sequence = optimizer.run() Supported key formats for ``aa_fraction_ranges`` are: - single-letter strings, for per-residue constraints: ``'A': (0.05, 0.15)`` - multi-letter strings, for grouped constraints: ``'WFY': (0.05, 0.15)`` - tuples, lists, sets, or frozensets of residue letters: ``('W', 'F', 'Y'): (0.05, 0.15)`` Behavior notes: - These are hard constraints on generated initial candidates, mutation proposals, and emergency-diversity seeds. - Shuffling is always safe because it preserves composition. - Per-residue entries constrain individual amino acids; grouped entries constrain the sum of those amino acids. - Constraints are checked as integer counts derived from ``target_length``. Multiple Properties, Weights, and Tolerances -------------------------------------------- The optimizer excels at balancing multiple competing properties simultaneously. Each property can have individual weights, tolerances, and constraint types: .. code-block:: python import goose from sparrow import Protein # Create optimizer with advanced parameters optimizer = goose.SequenceOptimizer( target_length=100, max_iterations=2000, verbose=True ) # Critical property - must be close to target optimizer.add_property( goose.FractionDisorder, target_value=0.85, weight=3.0, # High importance tolerance=0.02, # Very strict tolerance (2%) constraint_type='minimum' # Must be at least 85% disordered ) # Important but flexible property optimizer.add_property( goose.FCR, target_value=0.4, weight=2.0, # Medium-high importance tolerance=0.05, # 5% tolerance ) # Secondary property - more flexible optimizer.add_property( goose.NCPR, target_value=-0.1, weight=1.0, # Lower importance tolerance=0.1 # 10% tolerance - quite flexible ) # Compositional constraint optimizer.add_property( goose.AminoAcidFractions, target_fractions={'G': 0.15, 'P': 0.10, 'S': 0.12}, weight=1.5, tolerance=0.03 # 3% tolerance on each amino acid ) # Run optimization optimized_sequence = optimizer.run() # Analyze results final_protein = Protein(optimized_sequence) print(f"Optimized Sequence: {optimized_sequence}") print(f"Final FCR: {final_protein.FCR:.3f} (target: 0.4 ± 0.05)") print(f"Final NCPR: {final_protein.NCPR:.3f} (target: -0.1 ± 0.1)") fracs=final_protein.amino_acid_fractions print(f"Final fractions: G = {fracs['G']:.3f}, P = {fracs['P']:.3f}, S = {fracs['S']:.3f},") Custom Properties ----------------- Creating custom properties is straightforward by subclassing ``CustomProperty``. The new system supports all constraint types and tolerances automatically: .. code-block:: python import goose from goose.backend.optimizer_properties import CustomProperty, ConstraintType import sparrow class AlanineCount(CustomProperty): """Count the number of alanine residues in the sequence.""" def __init__(self, target_value: float, weight: float = 1.0, constraint_type: ConstraintType = ConstraintType.EXACT): super().__init__( target_value=target_value, weight=weight, constraint_type=constraint_type, ) def calculate_raw_value(self, protein: 'sparrow.Protein') -> float: """Calculate the raw property value (before constraint application).""" return float(protein.sequence.count('A')) class MotifCount(CustomProperty): """Count occurrences of a specific motif in the sequence.""" def __init__(self, motif: str, target_value: float, weight: float = 1.0, constraint_type: ConstraintType = ConstraintType.EXACT): super().__init__( target_value=target_value, weight=weight, constraint_type=constraint_type, ) self.motif = motif def get_init_args(self) -> dict: """Override to include motif parameter for serialization.""" return { "motif": self.motif, "target_value": self.target_value, "weight": self.weight, "constraint_type": self.constraint_type.value } def calculate_raw_value(self, protein: 'sparrow.Protein') -> float: sequence = protein.sequence count = 0 start = 0 while True: pos = sequence.find(self.motif, start) if pos == -1: break count += 1 start = pos + 1 return float(count) **Using Custom Properties:** .. code-block:: python # Create optimizer optimizer = goose.SequenceOptimizer(target_length=100, verbose=True) # Add custom properties with different constraint types optimizer.add_property( AlanineCount, target_value=12.0, weight=1.0, constraint_type='exact', tolerance=1.0 # Allow ±1 alanine ) optimizer.add_property( MotifCount, motif="GPG", target_value=3.0, # Want exactly 3 GPG motifs weight=2.0, constraint_type='exact', tolerance=0.0 # Must be exact ) # Standard properties optimizer.add_property( goose.FractionDisorder, target_value=0.8, weight=3.0, constraint_type='minimum', ) # Run optimization optimized_sequence = optimizer.run() # Analyze results final_protein = sparrow.Protein(optimized_sequence) print(f"Optimized Sequence: {optimized_sequence}") print(f"Alanine count: {optimized_sequence.count('A')}") print(f"GPG motifs: {optimized_sequence.count('GPG')}") **Implementing Batch Calculation for Performance (Optional):** For properties that benefit from batch processing (e.g., using external APIs or vectorized operations), you can enable batch calculation by setting the ``calculate_in_batch`` class attribute and implementing ``calculate_raw_value_batch()``: .. code-block:: python import numpy as np from goose.backend.optimizer_properties import CustomProperty import sparrow class VectorizedHydrophobicity(CustomProperty): """Example property with batch calculation support.""" calculate_in_batch = True # Enable batch processing def __init__(self, target_value: float, weight: float = 1.0): super().__init__(target_value=target_value, weight=weight) def calculate_raw_value(self, protein: 'sparrow.Protein') -> float: """Single sequence calculation (fallback).""" return protein.hydrophobicity def calculate_raw_value_batch(self, proteins: list) -> list: """ Batch calculation for multiple proteins (more efficient). Parameters ---------- proteins : list of sparrow.Protein List of protein instances to calculate Returns ------- list of float List of calculated property values """ # Example: Use vectorized operations for efficiency. This is not actually faster return [p.hydrophobicity for p in proteins] .. note:: **When to Use Batch Calculation:** - When calling external APIs that support batch processing (e.g., metapredict or other predictors that support batches) - When using vectorized NumPy operations across multiple sequences - When property calculation has expensive setup costs that can be amortized **Performance Impact:** - ``FractionDisorder`` uses batch calculation for ~2-5× speedup with metapredict - Not all properties benefit from batch calculation - Single-sequence calculation is used as fallback when batch is unavailable .. note:: **Best Practices for Custom Properties** - Always implement ``calculate_raw_value()`` instead of ``calculate()`` - Pass base-class arguments to ``super().__init__()`` by keyword for clarity and forward compatibility - Use ``get_init_args()`` if your property has additional parameters - The base class automatically handles constraint types and tolerances - Optionally implement batch calculation for performance with ``calculate_in_batch = True`` - Batch calculation is automatically used when available if ``calculate_in_batch`` is True; fallback is single-sequence mode Advanced Optimizer Configuration --------------------------------- Below are additional parameters to customize the optimization process. You can set these during initialization or modify them later using dedicated methods. The default parameter values are chosen to provide robust performance across a wide range of scenarios. However, you can adjust them to better suit your specific optimization needs. **Convergence and Tolerance Controls:** .. code-block:: python optimizer = goose.SequenceOptimizer( target_length=100, # Error tolerance stopping error_tolerance=1e-6, # Stop when total error below this value enable_error_tolerance=True, # Enable error tolerance early stopping # Convergence detection convergence_tolerance=1e-4, # Convergence criterion for early stopping convergence_window=20, # Number of recent iterations to check enable_early_convergence=False, # Enable early stopping on convergence convergence_patience=20, # Wait iterations after convergence # Stagnation detection stagnation_threshold=25, # Iterations before considering stagnant stagnation_improvement_threshold=0.005 # Minimum improvement to avoid stagnation ) **Adaptive Scaling Parameters:** .. code-block:: python optimizer = goose.SequenceOptimizer( target_length=100, # Adaptive scaling control enable_adaptive_scaling=True, # Enable adaptive property scaling max_distance_factor=3.0, # Maximum scaling based on distance distance_offset=0.2, # Offset for distance calculation boost_factor=2.0, # Factor to boost underperforming properties scale_momentum=0.5, # Momentum for scale smoothing (0-1) scale_learning_rate=0.5, # Learning rate for scale updates (0-1) min_scale=0.1, # Minimum allowed property scale max_scale=8.0, # Maximum allowed property scale # Thresholds for adaptive behavior low_contribution_threshold=0.15, # Threshold for low-contributing properties high_error_threshold=0.05, # Threshold for high-error properties stagnation_multiplier=1.0 # Multiplier for stagnation response ) **Stagnation Recovery and Multi-Objective Controls:** .. code-block:: python optimizer = goose.SequenceOptimizer( target_length=100, enforce_raw_monotonicity=False, # Allow weighted trade-offs by default raw_monotonicity_slack=0.05, # If monotonicity is enabled, allow 5% slack scale_freeze_window=25, # Freeze adaptive scaling after recovery max_norm_boost=100.0, # Cap on recovery-only normalization boosts norm_boost_decay=0.95, # Decay recovery boosts after progress resumes recompute_norm_on_emergency=True, improvement_trend_threshold=-0.001, stagnation_boost_factor=2.0, ) **Population, Reproducibility, and Cache Parameters:** .. code-block:: python optimizer = goose.SequenceOptimizer( target_length=100, # History tracking improvement_history_size=20, # Recent improvements per property error_history_size=50, # Recent error values to store # Trend analysis min_trend_samples=5, # Minimum samples for trend calculation # Population diversity and reproducibility elite_pool_size=3, # Keep multiple strong parent sequences parent_selection='weighted', # weighted, uniform, or best seed=1, # Reproducible optimizer random state max_cache_size=1000, # Bound evaluation-cache growth # Progress reporting update_interval=10, # Update progress every N iterations debugging=False ) **Dynamic Configuration Methods:** You can modify convergence and error tolerance settings after initialization: .. code-block:: python # Configure convergence detection optimizer.configure_convergence( tolerance=1e-5, # New convergence tolerance window=30, # New convergence window enable_early_stopping=True, # Enable early stopping patience=15 # New patience value ) # Configure error tolerance optimizer.configure_error_tolerance( tolerance=1e-7, # New error tolerance enable=True # Enable/disable error tolerance stopping ) # Get convergence information convergence_info = optimizer.get_convergence_info() print(f"Convergence status: {convergence_info}") # Inspect cache behavior cache_stats = optimizer.get_cache_statistics() print(f"Cache hit rate: {cache_stats['hit_rate']:.1%}") Troubleshooting and Optimization Tips ------------------------------------- **Optimization Not Converging** *Symptoms*: Error plateaus at high values, properties far from targets *Solutions*: - **Increase iterations**: ``max_iterations=5000`` or higher for complex problems - **Enable adaptive scaling**: ``enable_adaptive_scaling=True`` (default) - **Increase diversity**: ``shuffle_frequency=25``, ``num_candidates=10`` - **Use hard composition bounds when appropriate**: ``aa_fraction_ranges`` removes composition-violating candidates before they are evaluated - **Check target compatibility**: Ensure properties don't fundamentally conflict - **Use tolerances**: Set reasonable ``tolerance`` values for each property - **Verify constraint types**: Make sure you're using appropriate constraints **Slow Optimization Performance** *Symptoms*: Optimization takes too long, high memory usage *Solutions*: - **Reduce candidates**: ``num_candidates=3`` for faster iterations (default is 5) - **Disable expensive features**: ``enable_adaptive_scaling=False``, ``enable_shuffling=False`` - **Use stricter early stopping**: ``error_tolerance=1e-4``, ``enable_early_convergence=True`` - **Optimize caching**: Check cache hit rate with ``get_cache_statistics()`` - **Bound cache growth**: Lower ``max_cache_size`` when exploring many large candidate sets - **Pre-load models**: Use ``preloaded_model`` for epsilon properties **Property Conflicts and Balancing** *Symptoms*: Some properties optimize while others get worse *Solutions*: - **Adjust weights**: Higher weight = higher priority - **Use appropriate constraint types**: MINIMUM/MAXIMUM instead of EXACT when possible - **Set generous tolerances**: Allow some flexibility in less critical properties - **Use hard composition bounds for composition requirements**: Prefer ``aa_fraction_ranges`` over ``AminoAcidFractions`` when composition must stay inside a fixed range throughout optimization - **Check physical compatibility**: Some combinations may be impossible - **Monitor individual properties**: Enable ``verbose=True`` to track individual progress .. code-block:: python # Balanced multi-property optimization optimizer.add_property(goose.FractionDisorder, target_value=0.8, weight=3.0, constraint_type='minimum', tolerance=0.05) optimizer.add_property(goose.FCR, target_value=0.3, weight=1.0, constraint_type='exact', tolerance=0.1) optimizer.add_property(goose.Hydrophobicity, target_value=0.4, weight=0.5, constraint_type='exact', tolerance=0.2) **Memory Issues with Large Sequences** *Symptoms*: Out of memory errors, excessive RAM usage *Solutions*: - **Reduce history sizes**: ``improvement_history_size=5``, ``error_history_size=10`` - **Clear cache periodically**: Call ``optimizer._clear_evaluation_cache()`` if needed - **Lower ``max_cache_size``**: Useful when exploring many unique large sequences - **Use fewer candidates**: ``num_candidates=3`` for large sequences .. code-block:: python # Memory-efficient settings for large sequences optimizer = goose.SequenceOptimizer( target_length=1000, improvement_history_size=5, error_history_size=10, num_candidates=3, debugging=False ) **Stagnation Issues** *Symptoms*: Error doesn't improve for many iterations *Solutions*: - **Enable shuffling**: ``enable_shuffling=True`` with frequent shuffling - **Adjust stagnation detection**: Lower ``stagnation_threshold=15`` - **Increase mutation diversity**: Higher ``max_mutations=20`` - **Check for impossible targets**: Some property combinations may be unachievable Examples and Demo Notebooks ---------------------------- GOOSE includes comprehensive demo notebooks showcasing advanced ``SequenceOptimizer`` usage in the /demos directory. These include: - Basic optimization: see sequence_optimization.ipynb for basic usage. - Custom properties: see custom_optimizer_peroperties.ipynb for creating and implementing custom user-defined properties - Design by interaction: see generate_sequences_by_interaction.ipynb for designing sequences to interact with a target sequence using epsilon-based properties. - Design by linear profiles: see linear_profiles.ipynb for designing sequences to match linear profiles of properties like NCPR. - Design by interaction matrices: see epsilon_matrix_variants.ipynb for designing sequences to match or modify interaction matrices. **Demo Location:** Check the ``demos`` directory for Jupyter notebooks with detailed examples and explanations. API Reference ------------- **Core Classes:** - ``goose.SequenceOptimizer``: Main optimization engine - ``goose.backend.optimizer_properties.ProteinProperty``: Base class for properties - ``goose.backend.optimizer_properties.CustomProperty``: Base class for custom properties users can define **Key Methods:** - ``SequenceOptimizer.add_property()``: Add properties to optimize - ``SequenceOptimizer.set_initial_sequence()``: Set starting sequence - ``SequenceOptimizer.run()``: Execute optimization See Also -------- For complete API documentation, see ``goose/optimize.py`` and ``goose/backend/optimizer_properties.py``. For implementation examples and advanced usage patterns, explore the demo notebooks in ``demos/``.