langspace.probe.disentanglement package

Module contents

class langspace.probe.disentanglement.DisentanglementProbe(model: LangVAE, data: Iterable[Sentence], sample_size: int, metrics: List[DisentanglementMetric], gen_factors: dict, annotations: Dict[str, List[str]] = None, batch_size: int = 100)

Bases: LatentSpaceProbe

A probe for disentanglement metrics on the latent space of a language VAE.

beta_vae_metric(batch_size=64, sample_number=50)
static categorical_crossentropy_loss(y_pred, y_true)
disentanglement_completeness_informativeness(sample_number=10000)
static entropy(p: Tensor)
factor_vae_metric(batch_size=64, sample_number=1000)
group_sampling(generative_factor, value, batch_size) → Tensor
modularity_explicitness(num_bins=20, sample_number=10000)
mutual_information_estimation(num_bins, sample_number, normalize=False)
mutual_information_gap(num_bins=20, sample_number=10000)
report() → DataFrame

Generate a report from the probe.

Returns:

The generated report.

Return type:

DataFrame

separated_attribute_predictability(sample_number=10000)
stratified_sampling(generative_factor, sample_number)

class langspace.probe.disentanglement.GenerativeDataset

Bases: object

A base dataset class for capturing the generative factors and corresponding representations from a collection of sentences or samples.

generative_factors

A list to hold the names of generative factors.

Type:

List[Any]

value_space

For each generative factor, its associated value range or the unique set of factor values observed.

Type:

List[List[Any]]

sample_space

For each generative factor and each value in its value_space, this holds the list of sentence indices (or sample indices) corresponding to that value.

Type:

List[List[List[int]]]

representation_space

A list to store extracted latent representations of sentences, organized based on the sample_space.

Type:

List[Any]

get_representation_space(representations)

Populate the representation_space based on the sample_space and provided latent representations.

For each generative factor group in sample_space, the method iterates over every unique value and extracts the corresponding representation (row) from the given representations (e.g., a 2D tensor or array). The result is stored in the representation_space, preserving the structure of the sample_space.

Parameters:
  • representations (Tensor or np.ndarray) – A 2D container of latent representations where each row corresponds to a sentence or sample.
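
The mapping from sample_space to representation_space described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name build_representation_space is hypothetical, and plain lists stand in for the Tensor or np.ndarray input (any container that supports row indexing would behave the same way here).

```python
def build_representation_space(sample_space, representations):
    """Hypothetical stand-in for get_representation_space.

    Mirrors the nesting of sample_space, replacing every sample index
    with the matching row of `representations`.
    """
    return [
        [[representations[i] for i in indices] for indices in value_groups]
        for value_groups in sample_space
    ]


# Two generative factors: the first has two observed values, the second has one.
sample_space = [[[0, 2], [1]], [[0, 1, 2]]]

# Three samples with two latent dimensions each (illustrative values).
representations = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

representation_space = build_representation_space(sample_space, representations)
```

The result keeps the factor → value → samples nesting of sample_space, so representation_space[f][v] holds the latent rows of every sample whose factor f takes its v-th value.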

class langspace.probe.disentanglement.SRLFactorDataset(data, gen_factors)

Bases: GenerativeDataset

A GenerativeDataset for organizing sentences based on Semantic Role Labeling (SRL) generative factors.

This dataset processes a collection of sentence data along with corresponding semantic role annotations to extract and organize generative factors. It groups sentences by unique role patterns for each generative factor and records both the unique patterns (value_space) and the corresponding sentence indices (sample_space).

Parameters:
  • data (Iterable) –

    A collection of sentence data where each element is a tuple. The first element is the sentence, and the second element is a list of semantic role labels. Example:

    [
        ("The cat chased the mouse.", ["arg0", "v", "arg1"]),
        ("Dogs bark loudly.", ["arg0", "v"]),
        …
    ]

  • gen_factors (Dict[str, List[Any]]) –

    A dictionary mapping generative factor names to lists of expected role values. For example:

    {"agent": ["arg0"], "patient": ["arg1"]}

generative_factors

List of generative factor keys extracted from gen_factors.

Type:

List[str]

value_space

For each generative factor, contains the unique role patterns encountered in the data.

Type:

List[List[Any]]

sample_space

For each generative factor and each unique role pattern, stores the indices of sentences that match that pattern.

Type:

List[List[List[int]]]

structure

A list capturing, for each sentence, the generative factor structure derived from its semantic role labels.

Type:

List[Any]
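
To make the relationship between value_space and sample_space concrete, here is a rough sketch of the grouping step. It is not the class's actual code: the function name group_by_role_pattern is hypothetical, and the definition of a "role pattern" (the tuple of a factor's expected roles that occur in a sentence's labels) is an assumption made for illustration only.

```python
from collections import defaultdict


def group_by_role_pattern(data, gen_factors):
    """Hypothetical grouping of sentences by role pattern, per generative factor."""
    generative_factors = list(gen_factors)
    value_space, sample_space = [], []
    for factor in generative_factors:
        expected = gen_factors[factor]
        groups = defaultdict(list)  # role pattern -> sentence indices
        for idx, (_sentence, roles) in enumerate(data):
            # Assumed pattern: the expected roles that appear in this sentence.
            pattern = tuple(role for role in roles if role in expected)
            groups[pattern].append(idx)
        value_space.append(list(groups))            # unique patterns, in first-seen order
        sample_space.append(list(groups.values()))  # indices aligned with value_space
    return generative_factors, value_space, sample_space


data = [
    ("The cat chased the mouse.", ["arg0", "v", "arg1"]),
    ("Dogs bark loudly.", ["arg0", "v"]),
]
gen_factors = {"agent": ["arg0"], "patient": ["arg1"]}

factors, value_space, sample_space = group_by_role_pattern(data, gen_factors)
```

Under this assumed definition, both sentences share the same pattern for "agent", while "patient" splits them into one group containing arg1 and one group with an empty pattern. value_space[f][v] and sample_space[f][v] stay aligned, which is the property that get_representation_space relies on.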