Creating Sequence Variants

First import create from goose (if you haven’t done so for sequence generation).

from goose import create

Once create has been imported, you can start making sequence variants!

Apart from simply generating sequences, GOOSE can help you make different types of sequence variants. The primary input for sequence variant generation is your sequence of interest, and you specify the type of variant you want to create.

Overview of the variant() function

GOOSE provides a unified interface for generating sequence variants through the variant() function. This function takes your input sequence, a variant type, and any additional parameters needed for that specific variant type.

variant_sequence = create.variant(sequence, variant_type, **kwargs)

Disorder cutoffs when creating sequence variants:

When making sequence variants, GOOSE by default uses the predicted disorder values of your input sequence as the disorder thresholds for the returned sequence. To change this, set strict_disorder=True, which makes GOOSE apply the disorder cutoff value across the entire sequence instead.

Types of sequence variants

The variant() function supports multiple variant types, each with specific parameters and behaviors:

Shuffling methods:

  • 'shuffle_specific_regions' - Shuffle only specified regions

  • 'shuffle_except_specific_regions' - Shuffle all except specified regions

  • 'shuffle_specific_residues' - Shuffle only specific residue types

  • 'shuffle_except_specific_residues' - Shuffle all except specific residue types

  • 'weighted_shuffle_specific_residues' - Weighted shuffle of specific residues

  • 'targeted_reposition_specific_residues' - Reposition specific residues

Residue asymmetry methods:

  • 'change_residue_asymmetry' - Change residue asymmetry patterns

Property methods:

  • 'constant_properties' - Generate variant with constant properties (NCPR, FCR, hydropathy, and kappa)

  • 'constant_residues_and_properties' - Keep specified residues and properties constant. The sequence generated will have the same properties as the input sequence, but with specified residues kept constant.

  • 'constant_properties_and_class' - Generate variant with constant properties and the same number of amino acids in each amino acid class

  • 'constant_properties_and_class_by_order' - Generate variant with constant properties and the same number and order of amino acids by class

Property modification methods:

  • 'change_hydropathy_constant_class' - Change hydropathy while keeping class constant

  • 'change_fcr_minimize_class_changes' - Change FCR while minimizing changes to amino acid classes. Prioritizes keeping aromatics constant, then H, C, and P, then aliphatics, then polar residues.

  • 'change_ncpr_constant_class' - Change NCPR while keeping class constant

  • 'change_kappa' - Change kappa value. Sequence composition stays constant.

  • 'change_properties_minimize_differences' - Change properties while minimizing differences. This function is a little bit slower because it tries to change the fewest residues possible to achieve the desired properties.

  • 'change_any_properties' - Change any combination of properties. Similar to change_properties_minimize_differences, but changes are not necessarily minimized.

  • 'change_dimensions' - Change sequence dimensions (Rg/Re). This allows changes in the sequence including the amino acids by class.
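
Because the variant type is passed as a string, a typo is an easy mistake to make. The following is a minimal sketch, independent of GOOSE, that checks a name against the list above and suggests close matches. The helper suggest_variant_type is hypothetical and is not part of the GOOSE API.

```python
import difflib

# Variant type names copied from the list above.
VARIANT_TYPES = [
    'shuffle_specific_regions',
    'shuffle_except_specific_regions',
    'shuffle_specific_residues',
    'shuffle_except_specific_residues',
    'weighted_shuffle_specific_residues',
    'targeted_reposition_specific_residues',
    'change_residue_asymmetry',
    'constant_properties',
    'constant_residues_and_properties',
    'constant_properties_and_class',
    'constant_properties_and_class_by_order',
    'change_hydropathy_constant_class',
    'change_fcr_minimize_class_changes',
    'change_ncpr_constant_class',
    'change_kappa',
    'change_properties_minimize_differences',
    'change_any_properties',
    'change_dimensions',
]


def suggest_variant_type(name, n=3):
    """Hypothetical helper: return [name] if supported, else close matches."""
    if name in VARIANT_TYPES:
        return [name]
    # difflib ranks candidates by string similarity, best match first.
    return difflib.get_close_matches(name, VARIANT_TYPES, n=n)


print(suggest_variant_type('change_kapa'))
```

An empty list means no listed name is similar enough to the one supplied.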

Common parameters

Most variant types support these common parameters:

  • num_attempts (int): Number of attempts to generate variant (default: 100)

  • strict_disorder (bool): Whether to use strict disorder checking (default: False)

  • disorder_cutoff (float): Disorder cutoff threshold (default: from parameters)

  • metapredict_version (int): MetaPredict version to use (default: 3)

  • hydropathy_tolerance (float): Hydropathy tolerance (default: from parameters) (only if hydropathy is a factor)

  • kappa_tolerance (float): Kappa tolerance (default: from parameters) (only if kappa is a factor)

For some variants, you can specify amino acids by class. The classes are categorized as follows:

  • aromatic: ‘F’, ‘W’, ‘Y’

  • polar: ‘Q’, ‘N’, ‘S’, ‘T’

  • positive: ‘K’, ‘R’

  • negative: ‘D’, ‘E’

  • hydrophobic: ‘I’, ‘V’, ‘L’, ‘A’, ‘M’

  • cystine: ‘C’

  • proline: ‘P’

  • glycine: ‘G’

  • histidine: ‘H’

The residues in the single-member classes (C, P, G, and H) are special cases: in any function that accounts for the class of a residue, they are not interchangeable with any other residues.
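
To check by hand whether two sequences agree by class, the table above can be written as a lookup. This is a minimal sketch that does not call GOOSE. The names RESIDUE_CLASSES, class_order, and class_counts are illustrative assumptions, and the two sequences are the input and output of the 'constant_properties_and_class_by_order' example later in this section.

```python
from collections import Counter

# Residue classes as listed above (class names copied from this document).
RESIDUE_CLASSES = {
    'aromatic': 'FWY',
    'polar': 'QNST',
    'positive': 'KR',
    'negative': 'DE',
    'hydrophobic': 'IVLAM',
    'cystine': 'C',
    'proline': 'P',
    'glycine': 'G',
    'histidine': 'H',
}

# Reverse lookup: one-letter residue code -> class name.
CLASS_OF = {aa: name for name, aas in RESIDUE_CLASSES.items() for aa in aas}


def class_order(seq):
    """Return the class of every residue, in sequence order."""
    return [CLASS_OF[aa] for aa in seq]


def class_counts(seq):
    """Return how many residues fall in each class."""
    return Counter(class_order(seq))


original = 'QGENNENPQDQGSREGPQNNAWAQNNQDAQTSP'
variant = 'QGDNQDNPNEQGQRDGPNTSAYAQQNNELQNNP'

# Same class at every position implies the same class counts as well.
print(class_order(original) == class_order(variant))
print(class_counts(original) == class_counts(variant))
```

Comparing class_counts alone corresponds to the looser 'constant_properties_and_class' constraint, while comparing class_order corresponds to the by-order variant.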

Shuffling variants

Shuffle specific regions

The 'shuffle_specific_regions' variant type shuffles only specified regions of the sequence.

Parameters:

  • shuffle_regions (list): List of tuples specifying (start, end) positions to shuffle

Example:

test = 'QQQEEENNNDDDQQQEEENNNDDD'
variant_seq = create.variant(test, 'shuffle_specific_regions',
                            shuffle_regions=[(2, 9), (14, 22)])
print(variant_seq)
# Output: 'QQEEQENNNDDDQQNQNENEDEDD'

Note: Region specifications use 0-based indexing where (start, end) includes positions from start to end-1, following Python slice conventions.
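
The slice convention in the note can be illustrated without GOOSE. The sketch below is a plain-Python stand-in, not GOOSE's implementation: the function shuffle_regions and its seed argument are assumptions made for illustration, and no disorder checking is performed.

```python
import random


def shuffle_regions(seq, regions, seed=None):
    """Shuffle residues inside each (start, end) region, end exclusive."""
    rng = random.Random(seed)
    residues = list(seq)
    for start, end in regions:
        segment = residues[start:end]   # same bounds as a Python slice
        rng.shuffle(segment)
        residues[start:end] = segment
    return ''.join(residues)


test = 'QQQEEENNNDDDQQQEEENNNDDD'
variant = shuffle_regions(test, [(2, 9), (14, 22)], seed=0)

# Residues outside the regions stay exactly where they were.
print(variant[:2] == test[:2])
print(variant[9:14] == test[9:14])
print(variant[22:] == test[22:])
```

With (2, 9), positions 2 through 8 are shuffled and position 9 is left alone, which matches the documented example output above.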

Shuffle except specific regions

The 'shuffle_except_specific_regions' variant type shuffles all regions except those specified.

Parameters:

  • excluded_regions (list): List of tuples specifying (start, end) positions to exclude from shuffling

Example:

test = 'QQQEEENNNDDDQQQEEENNNDDD'
variant_seq = create.variant(test, 'shuffle_except_specific_regions',
                            excluded_regions=[(0, 5), (18, 24)])
print(variant_seq)
# Output: 'QQQEENQEDENQDENDEQNNNDDD'

Shuffle specific residues

The 'shuffle_specific_residues' variant type shuffles only specific residue types.

Parameters:

  • target_residues (list): List of residue types to shuffle

Example:

test = 'QQQEEENNNDDDQQQEEENNNDDD'
variant_seq = create.variant(test, 'shuffle_specific_residues',
                            target_residues=['N', 'D'])
print(variant_seq)
# Output varies between runs: only the N and D residues trade positions
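
As a rough picture of what shuffling only specific residue types means, the sketch below permutes the target residues among the positions they already occupy. It is a plain-Python illustration under that assumption, not GOOSE's implementation, and shuffle_target_residues is a hypothetical name.

```python
import random


def shuffle_target_residues(seq, target_residues, seed=None):
    """Permute target residues among their own positions only."""
    rng = random.Random(seed)
    positions = [i for i, aa in enumerate(seq) if aa in target_residues]
    picked = [seq[i] for i in positions]
    rng.shuffle(picked)
    out = list(seq)
    for i, aa in zip(positions, picked):
        out[i] = aa   # non-target positions are never written to
    return ''.join(out)


test = 'QQQEEENNNDDDQQQEEENNNDDD'
variant = shuffle_target_residues(test, ['N', 'D'], seed=1)

# Every Q and E is still at its original position.
print(all(a == b for a, b in zip(test, variant) if a in 'QE'))
```

Passing the complement of the excluded residues as target_residues gives the behaviour described for 'shuffle_except_specific_residues'.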

Shuffle except specific residues

The 'shuffle_except_specific_residues' variant type shuffles all residues except those specified.

Parameters:

  • excluded_residues (list): List of residue types to exclude from shuffling

Example:

test = 'QQQEEENNNDDDQQQEEENNNDDD'
variant_seq = create.variant(test, 'shuffle_except_specific_residues',
                            excluded_residues=['N', 'D'])
print(variant_seq)
# Output varies between runs: every N and D stays in place while the other residues are shuffled

Weighted shuffle specific residues

The 'weighted_shuffle_specific_residues' variant type performs weighted shuffling of specific residues.

Parameters:

  • target_residues (list): List of residue types to shuffle

  • shuffle_weight (float): Weight for shuffling operations (0.0 to 1.0)

Example:

test = 'QQQEEENNNDDDQQQEEENNNDDD'
variant_seq = create.variant(test, 'weighted_shuffle_specific_residues',
                            target_residues=['Q', 'E'],
                            shuffle_weight=0.5)
print(variant_seq)
# Output varies between runs: only the Q and E residues are shuffled

Targeted reposition specific residues

The 'targeted_reposition_specific_residues' variant type repositions specific residues within the sequence.

Parameters:

  • target_residues (list): List of residue types to reposition

Example:

test = 'QQQEEENNNDDDQQQEEENNNDDD'
variant_seq = create.variant(test, 'targeted_reposition_specific_residues',
                            target_residues=['E'])
print(variant_seq)
# Output varies between runs: only the E residues are repositioned

Property-based variants

Constant properties

The 'constant_properties' variant type generates a variant where only the sequence properties are constrained.

Parameters:

  • exclude_residues (list, optional): List of residue types to exclude from the variant

Example:

test = 'QEQNGVDQQETTPRQDYPGNQQPNQQAEGQQMQ'
variant_seq = create.variant(test, 'constant_properties')
print(variant_seq)
# Output varies between runs: a new sequence whose properties match those of the input

Constant residues and properties

The 'constant_residues_and_properties' variant type keeps specified residues constant while maintaining properties.

Parameters:

  • constant_residues (list): List of residue types to keep constant

Example:

test = 'QEQNGVDQQETTPRQDYPGNQQPNQQAEGQQMQ'
variant_seq = create.variant(test, 'constant_residues_and_properties',
                            constant_residues=['T', 'Q'])
print(variant_seq)
# Output: 'QDQSMNDQQETTGKQDNAGGQQHPQQPDAQQSQ'

Constant properties and class

The 'constant_properties_and_class' variant type generates a variant with the same properties and amino acid class distribution.

Example:

test = 'QEQNGVDQQETTPRQDYPGNQQPNQQAEGQQMQ'
variant_seq = create.variant(test, 'constant_properties_and_class')
print(variant_seq)
# Output: 'QENQGADQQDQNPRNEWPGNNNPNQTADGNSAT'

Constant properties and class by order

The 'constant_properties_and_class_by_order' variant type generates a variant with the same properties and maintains the order of amino acid classes.

Example:

test = 'QGENNENPQDQGSREGPQNNAWAQNNQDAQTSP'
variant_seq = create.variant(test, 'constant_properties_and_class_by_order')
print(variant_seq)
# Output: 'QGDNQDNPNEQGQRDGPNTSAYAQQNNELQNNP'

Property modification variants

Change hydropathy constant class

The 'change_hydropathy_constant_class' variant type changes hydropathy while keeping amino acid classes constant.

Parameters:

  • target_hydropathy (float): Target hydropathy value

Example:

test = 'GNGGNRAENRTERKGEQTHKSNHNDGARHTDRRRSHDKNAASRE'
variant_seq = create.variant(test, 'change_hydropathy_constant_class',
                            target_hydropathy=2.7)
print(variant_seq)
# Output: 'GTGGTKIETKTEKKGETTHKTTHTDGLKHTDRKKTHDKSVMTKE'

Note: Due to class constraints, there are limits to how much you can increase or decrease the hydropathy of any specific sequence. GOOSE will raise an error if you exceed these limits.
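
The limits mentioned in the note can be estimated before calling GOOSE. The sketch below assumes that hydropathy is the mean Kyte-Doolittle value shifted by +4.5 onto a 0 to 9 scale. That convention is an assumption here, although it reproduces the 2.7 target for the example output above. The function hydropathy_bounds is hypothetical and ignores every other constraint GOOSE applies (for example disorder), so the true reachable range may be narrower.

```python
# Kyte-Doolittle hydropathy values (standard published scale).
KYTE_DOOLITTLE = {
    'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
    'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
    'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
    'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2,
}

# Residue classes as listed in this document.
CLASSES = ['FWY', 'QNST', 'KR', 'DE', 'IVLAM', 'C', 'P', 'G', 'H']


def shifted(aa):
    """Kyte-Doolittle value moved onto a 0-9 scale (assumed convention)."""
    return KYTE_DOOLITTLE[aa] + 4.5


def mean_hydropathy(seq):
    return sum(shifted(aa) for aa in seq) / len(seq)


def hydropathy_bounds(seq):
    """Lowest and highest mean hydropathy reachable by within-class swaps."""
    low = high = 0.0
    for aa in seq:
        members = next(c for c in CLASSES if aa in c)
        values = [shifted(m) for m in members]
        low += min(values)
        high += max(values)
    return low / len(seq), high / len(seq)


test = 'GNGGNRAENRTERKGEQTHKSNHNDGARHTDRRRSHDKNAASRE'
variant = 'GTGGTKIETKTEKKGETTHKTTHTDGLKHTDRKKTHDKSVMTKE'
print(round(mean_hydropathy(test), 2), round(mean_hydropathy(variant), 2))
print(hydropathy_bounds(test))
```

A target outside the printed bounds cannot be reached by class-preserving substitutions, which is the situation in which GOOSE is documented to raise an error.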

Change FCR minimize class changes

The 'change_fcr_minimize_class_changes' variant type adjusts FCR while minimizing changes to amino acid classes.

Parameters:

  • target_FCR (float): Target FCR value

Example:

test = 'TTGGATSQAGGATHAQSHANSGTQSTSSPQTQGVNTTSANGQHGQATNQS'
variant_seq = create.variant(test, 'change_fcr_minimize_class_changes',
                            target_FCR=0.2)
print(variant_seq)
# Output: 'TTGGMTSDAGGATHMKSHANSKGTKSTSSPKTEGINTTTIDGDHGKMTDKT'

Change NCPR constant class

The 'change_ncpr_constant_class' variant type adjusts NCPR while keeping amino acid classes constant.

Parameters:

  • target_NCPR (float): Target NCPR value

Example:

test = 'GNGGNRAENRTERKGEQTHKSNHNDGARHTDRRRSHDKNAASRE'
variant_seq = create.variant(test, 'change_ncpr_constant_class',
                            target_NCPR=0.0)
print(variant_seq)
# Output: 'GNGGNRAENRTEEKGEQTHKSNHNDGARHTDDRRSHDKNAASRE'
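
To confirm that a returned sequence hits a charge target, FCR and NCPR can be recomputed by hand. The sketch below uses the usual definitions (K and R counted as positive, D and E as negative, H treated as uncharged), which is an assumption about how GOOSE counts charge. The helper names fcr and ncpr are illustrative.

```python
POSITIVE = 'KR'
NEGATIVE = 'DE'


def fcr(seq):
    """Fraction of charged residues: (positive + negative) / length."""
    charged = sum(aa in POSITIVE or aa in NEGATIVE for aa in seq)
    return charged / len(seq)


def ncpr(seq):
    """Net charge per residue: (positive - negative) / length."""
    positive = sum(aa in POSITIVE for aa in seq)
    negative = sum(aa in NEGATIVE for aa in seq)
    return (positive - negative) / len(seq)


test = 'GNGGNRAENRTERKGEQTHKSNHNDGARHTDRRRSHDKNAASRE'
variant = 'GNGGNRAENRTEEKGEQTHKSNHNDGARHTDDRRSHDKNAASRE'

print(round(ncpr(test), 3), round(ncpr(variant), 3))
print(round(fcr(test), 3), round(fcr(variant), 3))
```

For this example the variant's NCPR comes out at the requested 0.0 while its FCR is unchanged, because two positive residues were replaced by negative ones.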

Change kappa

The 'change_kappa' variant type alters charge asymmetry by changing the kappa value.

Parameters:

  • target_kappa (float): Target kappa value (0.0 to 1.0)

Example:

test = 'QNEKRDQNEKRDQNEKRDQNEKRDQNEKRDQN'
variant_seq = create.variant(test, 'change_kappa', target_kappa=0.9)
print(variant_seq)
# Output: 'KQRKRKRKRKRNQNQNQNQNEDEDQNEDEDED'

Note: To maintain performance, GOOSE allows the kappa of the returned sequence to deviate from your target kappa value by up to 0.03. Higher kappa values increase charge asymmetry; lower values reduce it.
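
Two quick checks help when reading kappa variants: that the composition really is unchanged, and that opposite charges have become more (or less) segregated. The sketch below does not compute kappa. charge_sign_switches is a crude, hypothetical proxy that counts how often the sign flips between consecutive charged residues, so fewer switches means more segregation.

```python
from collections import Counter


def same_composition(seq_a, seq_b):
    """True when both sequences contain the same residues in the same counts."""
    return Counter(seq_a) == Counter(seq_b)


def charge_sign_switches(seq):
    """Count sign changes between consecutive charged residues (not kappa)."""
    signs = [1 if aa in 'KR' else -1 for aa in seq if aa in 'KRDE']
    return sum(a != b for a, b in zip(signs, signs[1:]))


test = 'QNEKRDQNEKRDQNEKRDQNEKRDQNEKRDQN'
variant = 'KQRKRKRKRKRNQNQNQNQNEDEDQNEDEDED'

print(same_composition(test, variant))
print(charge_sign_switches(test), charge_sign_switches(variant))
```

For the example above, the high-kappa variant keeps the composition and collapses the charges into one positive block followed by one negative block.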

Change any properties

The 'change_any_properties' variant type adjusts multiple properties simultaneously.

Parameters:

  • target_FCR (float): Target FCR value

  • target_NCPR (float): Target NCPR value

  • target_kappa (float): Target kappa value

  • target_hydropathy (float): Target hydropathy value

Example:

test = 'GNGGNRAENRTERKGEQTHKSNHNDGARHTDRRRSHDKNAASRE'
variant_seq = create.variant(test, 'change_any_properties',
                            target_hydropathy=2.5,
                            target_FCR=0.23,
                            target_NCPR=0.0,
                            target_kappa=0.1)
print(variant_seq)
# Output: 'GNGGQNAEQRNTKEGNESHTSTHTGDRAHQKSNNHQTNLERVSN'

Change properties minimize differences

The 'change_properties_minimize_differences' variant type changes properties while minimizing differences from the original.

Parameters (all optional):

  • target_hydropathy (float): Target hydropathy value

  • target_FCR (float): Target FCR value

  • target_NCPR (float): Target NCPR value

  • target_kappa (float): Target kappa value

Example:

test = 'GNGGNRAENRTERKGEQTHKSNHNDGARHTDRRRSHDKNAASRE'
variant_seq = create.variant(test, 'change_properties_minimize_differences',
                            target_kappa=0.3,
                            target_hydropathy=2.6)
print(variant_seq)
# Output: 'KTGGTKRGSKTARKGKSTHTTKHDEGVRTHDRRLSHEENADSTE'

Asymmetry variants

Change residue asymmetry

The 'change_residue_asymmetry' variant type changes the asymmetry of specific residues without changing sequence composition.

Parameters:

  • target_residues (list): List of residue types or classes to modify

  • num_changes (int, optional): Number of changes to make

  • increase_or_decrease (str, optional): Whether to ‘increase’ or ‘decrease’ asymmetry

Example - decreasing polar residue asymmetry:

test = 'NSQSSQDSQDKSQGSQNQQEQSDSSEQTKQEEDGQTSSDSREQSQSHSQQ'
variant_seq = create.variant(test, 'change_residue_asymmetry',
                            target_residues=['polar'],
                            increase_or_decrease='decrease',
                            num_changes=5)
print(variant_seq)
# Output: 'NSQDSSDQSQKSQGSQENQDQEKQSESSEQDGTQDQTSRSSEQSQSHSQQ'

Example - increasing asymmetry with custom residue list:

test = 'RGNNLAGIVLGAAGAMNGRTEGRKGEQTHGKSGNDDRGHTGDRSHGNKNRGE'
variant_seq = create.variant(test, 'change_residue_asymmetry',
                            target_residues=['G', 'T'],
                            increase_or_decrease='increase',
                            num_changes=20)
print(variant_seq)
# Output: 'GGGGGTGGTGGGTGGGRNNLAIVLAAAMNRERKEQHKSNDDRHDRSHNKNRE'
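
One simple way to see the effect of an asymmetry change is to split the sequence into equal segments and look at the fraction of target residues in each. This is only an illustrative measure and not the definition GOOSE uses internally. segment_fractions and its n_segments argument are hypothetical.

```python
def segment_fractions(seq, residues, n_segments=4):
    """Fraction of target residues in each of n_segments consecutive segments."""
    if len(seq) < n_segments:
        raise ValueError('sequence is shorter than the number of segments')
    size = len(seq) / n_segments
    fractions = []
    for k in range(n_segments):
        segment = seq[round(k * size):round((k + 1) * size)]
        hits = sum(aa in residues for aa in segment)
        fractions.append(hits / len(segment))
    return fractions


test = 'RGNNLAGIVLGAAGAMNGRTEGRKGEQTHGKSGNDDRGHTGDRSHGNKNRGE'
variant = 'GGGGGTGGTGGGTGGGRNNLAIVLAAAMNRERKEQHKSNDDRHDRSHNKNRE'

print([round(f, 2) for f in segment_fractions(test, 'GT')])
print([round(f, 2) for f in segment_fractions(variant, 'GT')])
```

In the input, G and T are spread fairly evenly across the four segments. In the variant they are concentrated at the start of the sequence, which is what increasing asymmetry with num_changes=20 produced in the example above.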

Dimensional variants

Change dimensions

The 'change_dimensions' variant type adjusts sequence dimensions (Rg or Re). The amino acid composition, including the number of amino acids in each class, may change.

Parameters:

  • increase_or_decrease (str): Whether to ‘increase’ or ‘decrease’ the dimension

  • rg_or_re (str): Whether to optimize ‘rg’ or ‘re’

  • num_dim_attempts (int, optional): Number of dimensional optimization attempts

  • allowed_error (float, optional): Allowed error for dimensional constraints

  • reduce_pos_charged (bool, optional): Whether to reduce positive charges

  • exclude_aas (list, optional): Amino acids to exclude from generation

Example - increasing Re:

test = 'FYFLGQGQQYYYYQQKQFFQFYYQQFFGFYGSNFQGGNYFGGYQQNQYFG'
variant_seq = create.variant(test, 'change_dimensions',
                            increase_or_decrease='increase',
                            rg_or_re='re')
print(variant_seq)

Example - decreasing Rg:

test = 'FYFLGQGQQYYYYQQKQFFQFYYQQFFGFYGSNFQGGNYFGGYQQNQYFG'
variant_seq = create.variant(test, 'change_dimensions',
                            increase_or_decrease='decrease',
                            rg_or_re='rg')
print(variant_seq)

Error handling and troubleshooting

The variant() function provides comprehensive error handling:

Common errors:

  1. Invalid variant type: Ensure the variant_type is one of the supported types listed above.

  2. Missing required parameters: Each variant type has specific required parameters.

  3. Invalid parameter values: Check that parameter values are within valid ranges.

  4. Variant generation failure: If generation fails, try increasing num_attempts or adjusting target values.

Example error handling:

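from goose import create, goose_exceptions

sequence = 'QNEKRDQNEKRDQNEKRDQNEKRDQNEKRDQN'
try:
    variant_seq = create.variant(sequence, 'change_kappa', target_kappa=0.5)
except goose_exceptions.GooseInputError as e:
    # Problem with the input, such as an invalid variant type or parameter
    print(f"Input error: {e}")
except goose_exceptions.GooseFail as e:
    # No valid variant was generated
    print(f"Generation failed: {e}")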

Tips for successful variant generation:

  • Start with moderate changes to properties

  • Use higher num_attempts for difficult targets

  • Check that your sequence has the necessary residue types for the variant

  • For kappa variants, ensure your sequence has both positive and negative charges

  • For class-based variants, remember that some property changes may not be possible due to class constraints
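
The advice to raise num_attempts for difficult targets can be wrapped in a small retry loop. The sketch below is generic Python. generate_with_retries, its attempt schedule, and the stand-in FakeGenerationFailure exception are all hypothetical, so that the example runs without GOOSE installed. With GOOSE, make_variant would wrap create.variant(...) and retry_on would be the GooseFail exception shown above.

```python
def generate_with_retries(make_variant, attempt_schedule=(100, 500, 2000),
                          retry_on=(Exception,)):
    """Call make_variant(num_attempts) with growing budgets until it succeeds."""
    if not attempt_schedule:
        raise ValueError('attempt_schedule must not be empty')
    last_error = None
    for num_attempts in attempt_schedule:
        try:
            return make_variant(num_attempts)
        except retry_on as error:
            last_error = error   # remember the failure and try a larger budget
    raise last_error


class FakeGenerationFailure(Exception):
    """Stand-in for a generation failure; not a GOOSE exception."""


def flaky_generator(num_attempts):
    # Pretend the target is only reachable with a budget of 500 or more.
    if num_attempts < 500:
        raise FakeGenerationFailure(f'failed after {num_attempts} attempts')
    return 'QNEKRDQNEKRD'


print(generate_with_retries(flaky_generator, retry_on=(FakeGenerationFailure,)))
```

Only generation failures are worth retrying this way. Input errors will fail identically on every attempt, so they should be left to propagate.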

Function selection guide

Choose variant type based on your needs:

  • Shuffling sequences: Use shuffling variants to rearrange existing residues

  • Maintaining properties: Use constant property variants to keep sequence characteristics

  • Changing specific properties: Use property modification variants for targeted changes

  • Adjusting dimensions: Use dimensional variants to change IDR dimensions

  • Changing asymmetry: Use asymmetry variants to modify residue distribution patterns

Performance considerations:

  • Shuffling variants are generally fastest

  • Property modification variants may require more attempts

  • Dimensional variants can be computationally intensive

  • Kappa variants work best with values between 0.1 and 0.9

Backward compatibility notes

The unified variant() function replaces many individual functions from previous versions:

  • constant_class_var() → variant(seq, 'constant_properties_and_class')

  • constant_properties_var() → variant(seq, 'constant_properties')

  • region_shuffle_var() → variant(seq, 'shuffle_specific_regions')

  • targeted_shuffle_var() → variant(seq, 'shuffle_specific_residues')

  • excluded_shuffle_var() → variant(seq, 'shuffle_except_specific_residues')

  • kappa_var() → variant(seq, 'change_kappa')

  • hydro_class_var() → variant(seq, 'change_hydropathy_constant_class')

  • fcr_class_var() → variant(seq, 'change_fcr_minimize_class_changes')

  • ncpr_class_var() → variant(seq, 'change_ncpr_constant_class')

  • all_props_class_var() → variant(seq, 'change_any_properties')

  • re_var() / rg_var() → variant(seq, 'change_dimensions')

  • weighted_shuffle_var() → variant(seq, 'weighted_shuffle_specific_residues')

  • asymmetry_var() → variant(seq, 'change_residue_asymmetry')

The new interface provides more consistent parameter names and improved error handling while maintaining all the functionality of the original functions.
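
When migrating older scripts, the replacements listed above can be kept as a lookup table. This is a minimal sketch built only from that list. LEGACY_TO_VARIANT_TYPE and variant_type_for are hypothetical names, and the table says nothing about how individual parameter names changed.

```python
# Old function name -> variant type string, copied from the list above.
LEGACY_TO_VARIANT_TYPE = {
    'constant_class_var': 'constant_properties_and_class',
    'constant_properties_var': 'constant_properties',
    'region_shuffle_var': 'shuffle_specific_regions',
    'targeted_shuffle_var': 'shuffle_specific_residues',
    'excluded_shuffle_var': 'shuffle_except_specific_residues',
    'kappa_var': 'change_kappa',
    'hydro_class_var': 'change_hydropathy_constant_class',
    'fcr_class_var': 'change_fcr_minimize_class_changes',
    'ncpr_class_var': 'change_ncpr_constant_class',
    'all_props_class_var': 'change_any_properties',
    're_var': 'change_dimensions',
    'rg_var': 'change_dimensions',
    'weighted_shuffle_var': 'weighted_shuffle_specific_residues',
    'asymmetry_var': 'change_residue_asymmetry',
}


def variant_type_for(legacy_name):
    """Return the variant type that replaces an old function name."""
    key = legacy_name.strip()
    if key.endswith('()'):
        key = key[:-2]          # accept names written as 'kappa_var()'
    if key not in LEGACY_TO_VARIANT_TYPE:
        raise KeyError(f'no known replacement for {legacy_name!r}')
    return LEGACY_TO_VARIANT_TYPE[key]


print(variant_type_for('kappa_var()'))
```

Both re_var and rg_var map to 'change_dimensions', so a migrated call also needs the rg_or_re argument described earlier to say which dimension to change.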