langspace.probe.disentanglement package
Module contents
- class langspace.probe.disentanglement.DisentanglementProbe(model: LangVAE, data: Iterable[Sentence], sample_size: int, metrics: List[DisentanglementMetric], gen_factors: dict, annotations: Dict[str, List[str]] = None, batch_size: int = 100)
Bases:
LatentSpaceProbe
A probe for disentanglement metrics on the latent space of a language VAE.
- class langspace.probe.disentanglement.GenerativeDataset
Bases:
object
A base dataset class for capturing the generative factors and corresponding representations from a collection of sentences or samples.
- generative_factors
A list to hold the names of generative factors.
- Type:
List[Any]
- value_space
For each generative factor, its associated value range or the unique set of factor values observed.
- Type:
List[List[Any]]
- sample_space
For each generative factor and each value in its value_space, this holds the list of sentence indices (or sample indices) corresponding to that value.
- Type:
List[List[List[int]]]
- representation_space
A list to store extracted latent representations of sentences, organized based on the sample_space.
- Type:
List[Any]
- get_representation_space(representations)
Populate the representation_space based on the sample_space and provided latent representations.
For each generative factor group in sample_space, the method iterates over every unique value and extracts the corresponding representation (row) from the given representations (e.g., a 2D tensor or array). The result is stored in the representation_space, preserving the structure of the sample_space.
- Parameters:
representations (Tensor or np.ndarray) – A 2D container of latent representations where each row corresponds to a sentence or sample.
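The lookup described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name build_representation_space is hypothetical, and plain nested lists stand in for the 2D tensor or array (the same row indexing also works on a NumPy array or a tensor).

```python
from typing import Any, List, Sequence


def build_representation_space(
    sample_space: List[List[List[int]]],
    representations: Sequence[Any],
) -> List[List[List[Any]]]:
    """Hypothetical helper: mirror the nesting of sample_space with rows."""
    representation_space = []
    for factor_groups in sample_space:  # one entry per generative factor
        factor_rows = []
        for indices in factor_groups:  # one entry per value of that factor
            # Pick the row of every sample that has this factor value.
            factor_rows.append([representations[i] for i in indices])
        representation_space.append(factor_rows)
    return representation_space


# Toy input: 4 samples with 3 latent dimensions each (made-up numbers).
representations = [
    [0.0, 0.1, 0.2],
    [1.0, 1.1, 1.2],
    [2.0, 2.1, 2.2],
    [3.0, 3.1, 3.2],
]
# Two factors: the first has two values, the second has two values.
sample_space = [[[0, 2], [1, 3]], [[0, 1, 2], [3]]]

rep_space = build_representation_space(sample_space, representations)
```

Here rep_space[f][v] holds the latent rows of all samples whose factor f takes its v-th value, so the result has the same outer shape as sample_space.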
- class langspace.probe.disentanglement.SRLFactorDataset(data, gen_factors)
Bases:
GenerativeDataset
A GenerativeDataset for organizing sentences based on Semantic Role Labeling (SRL) generative factors.
This dataset processes a collection of sentence data along with corresponding semantic role annotations to extract and organize generative factors. It groups sentences by unique role patterns for each generative factor and records both the unique patterns (value_space) and the corresponding sentence indices (sample_space).
- Parameters:
data (Iterable) –
A collection of sentence data where each element is a tuple. The first element is the sentence, and the second element is a list of semantic role labels. Example:
[
("The cat chased the mouse.", ["arg0", "v", "arg1"]),
("Dogs bark loudly.", ["arg0", "v"]),
…
]
gen_factors (Dict[str, List[Any]]) –
A dictionary mapping generative factor names to lists of expected role values. For example:
{"agent": ["arg0"], "patient": ["arg1"]}
- value_space
For each generative factor, contains the unique role patterns encountered in the data.
- Type:
List[List[Any]]
- sample_space
For each generative factor and each unique role pattern, stores the indices of sentences that match that pattern.
- Type:
List[List[List[int]]]
- structure
A list capturing, for each sentence, the generative factor structure derived from its semantic role labels.
- Type:
List[Any]
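The grouping that produces value_space and sample_space can be sketched as below. This is an illustration under stated assumptions, not the class's actual code: the function name group_by_role_pattern is hypothetical, and the "role pattern" of a sentence for a given factor is assumed to be the tuple of its role labels that appear in that factor's list. The library may define patterns differently.

```python
from typing import Any, Dict, Iterable, List, Tuple


def group_by_role_pattern(
    data: Iterable[Tuple[str, List[str]]],
    gen_factors: Dict[str, List[Any]],
) -> Tuple[List[List[Any]], List[List[List[int]]]]:
    """Hypothetical helper: group sentence indices by per-factor pattern."""
    data = list(data)
    value_space: List[List[Any]] = []
    sample_space: List[List[List[int]]] = []
    for roles in gen_factors.values():
        patterns: List[Any] = []      # unique patterns, in first-seen order
        groups: List[List[int]] = []  # sentence indices for each pattern
        for idx, (_sentence, labels) in enumerate(data):
            # ASSUMPTION: the pattern keeps only labels listed for the factor.
            pattern = tuple(label for label in labels if label in roles)
            if pattern in patterns:
                groups[patterns.index(pattern)].append(idx)
            else:
                patterns.append(pattern)
                groups.append([idx])
        value_space.append(patterns)
        sample_space.append(groups)
    return value_space, sample_space


data = [
    ("The cat chased the mouse.", ["arg0", "v", "arg1"]),
    ("Dogs bark loudly.", ["arg0", "v"]),
]
gen_factors = {"agent": ["arg0"], "patient": ["arg1"]}

value_space, sample_space = group_by_role_pattern(data, gen_factors)
```

Under this assumption both sentences share the same "agent" pattern and fall into one group, while for "patient" the second sentence has an empty pattern and forms its own group.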