langspace.ops package

Submodules

langspace.ops.arithmetic module

class langspace.ops.arithmetic.ArithmeticOps(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Vector arithmetic operations

AVG = 'AVG'
SUB = 'SUB'
SUM = 'SUM'
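
The members above only name the operations; this page does not show how they are applied. As a minimal sketch, assuming each operation combines two equal-length vectors element-wise, the snippet below uses a local stand-in enum with the same members and a hypothetical helper `apply_op` (not part of langspace). Plain Python lists stand in for tensors.

```python
from enum import Enum
from typing import List


class ArithmeticOps(Enum):
    """Local stand-in mirroring the documented members."""
    AVG = "AVG"
    SUB = "SUB"
    SUM = "SUM"


def apply_op(op: ArithmeticOps, a: List[float], b: List[float]) -> List[float]:
    """Hypothetical helper: combine two equal-length vectors element-wise."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if op is ArithmeticOps.SUM:
        return [x + y for x, y in zip(a, b)]
    if op is ArithmeticOps.SUB:
        return [x - y for x, y in zip(a, b)]
    if op is ArithmeticOps.AVG:
        return [(x + y) / 2 for x, y in zip(a, b)]
    raise ValueError(f"unsupported operation: {op}")
```

Because the member values are strings, a member can also be looked up by value, for example `ArithmeticOps("AVG")`.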

langspace.ops.interpolation module

class langspace.ops.interpolation.InterpolationOps

Bases: object

Operations for obtaining and evaluating interpolations between source (start) and target (end) representation vectors.

Linear interpolation helps visualize transitions in latent spaces, a common technique in the study of generative models. The text methods use the Word Mover’s Distance (WMD), a metric originally proposed to capture semantic dissimilarity between texts, to evaluate how uniformly those transitions occur in semantic space.

static interpolation_smoothness(interpolate_path: List[str], model_wmd: KeyedVectors, stop_words: List[str]) → float

Calculates the smoothness of an interpolated path between sentences based on Word Mover’s Distance.

This method computes a smoothness score for a sequence of sentences that represent a semantic interpolation path. The overall semantic distance (d_origin) is measured between the first and the last sentence of the path. Additionally, the cumulative distance between consecutive sentence pairs is computed. The smoothness score is then defined as the ratio of the overall distance to the sum of local distances. A score closer to 1 indicates that the transition between each adjacent pair of sentences is uniformly distributed, suggesting a smooth semantic change.

Parameters:
  • interpolate_path (List[str]) – A list of sentences forming the interpolation path.

  • model_wmd (KeyedVectors) – The word embedding model used to compute the Word Mover’s Distance.

  • stop_words (List[str]) – A list of stop words to be removed during the preprocessing of sentences.

Returns:

The computed smoothness score of the interpolation path. Values closer to 1 imply smoother transitions.

Return type:

float

Evaluating interpolation smoothness with word-level transport distances draws on ideas from metric learning for text representations.
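
The ratio described above can be sketched as follows. This is an illustration rather than the library's implementation: the function name `smoothness`, the pluggable `distance` callable, and the toy distance are assumptions chosen so the example runs without a word embedding model. In practice the distance would be WMD.

```python
from typing import Callable, List


def smoothness(path: List[str], distance: Callable[[str, str], float]) -> float:
    """Ratio of the end-to-end distance to the summed consecutive distances."""
    if len(path) < 2:
        raise ValueError("path needs at least two sentences")
    # Overall distance between the first and last sentence (d_origin).
    d_origin = distance(path[0], path[-1])
    # Cumulative distance over consecutive sentence pairs.
    d_local = sum(distance(a, b) for a, b in zip(path, path[1:]))
    if d_local == 0:
        # Assumption: a path whose steps are all identical counts as smooth.
        return 1.0
    return d_origin / d_local


def toy_distance(a: str, b: str) -> float:
    """Toy stand-in for WMD: absolute difference in word counts."""
    return float(abs(len(a.split()) - len(b.split())))
```

With the toy distance, the path `["a", "a b", "a b c"]` moves steadily toward its endpoint and scores 1.0, while `["a", "a b c d", "a b c"]` overshoots and scores 0.5.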

static linearize_interpolate(source: Tensor, target: Tensor, size: int = 10) → List[Tensor]

Performs linear interpolation between two representation vectors.

This method generates a sequence of vectors transitioning from the source to the target by computing the weighted average of the two. The interpolation is performed in equal increments, with the weights for the source vector decreasing from 1 to 0 and those for the target vector increasing from 0 to 1 over the specified number of steps.

Parameters:
  • source (Tensor) – The starting representation vector.

  • target (Tensor) – The ending representation vector.

  • size (int, optional) – The number of interpolation steps between the source and target. Default is 10.

Returns:

A list of interpolated vectors, including both the source and target, ordered sequentially.
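
A minimal sketch of the weighted-average scheme follows. It assumes that `size` counts the returned vectors, endpoints included; the documentation does not state whether the actual method returns `size` or `size + 1` vectors. Plain Python lists stand in for tensors, and the name `linear_interpolate` is illustrative.

```python
from typing import List


def linear_interpolate(
    source: List[float], target: List[float], size: int = 10
) -> List[List[float]]:
    """Return `size` evenly spaced vectors from source to target, inclusive."""
    if size < 2:
        raise ValueError("size must be at least 2 to include both endpoints")
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    path = []
    for i in range(size):
        t = i / (size - 1)  # target weight rises from 0 to 1
        # Source weight (1 - t) falls from 1 to 0 over the same steps.
        path.append([(1 - t) * s + t * g for s, g in zip(source, target)])
    return path
```

For example, `linear_interpolate([0.0, 0.0], [1.0, 2.0], size=3)` yields the source, the midpoint `[0.5, 1.0]`, and the target.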

static preprocess(sentence: str, stop_words: List[str]) → List[str]

Normalizes and tokenizes a sentence by lowercasing and removing stop words.

This method splits the sentence into words after converting it to lowercase and filters out any words that are present in the provided stop words list.

Parameters:
  • sentence (str) – The input sentence to preprocess.

  • stop_words (List[str]) – A list of stop words to exclude from the tokenized output.

Returns:

A list of processed words with stop words removed.
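
The behaviour described above amounts to a lowercase, split, and filter pipeline. The sketch below assumes whitespace tokenization and a stop word list that is already lowercase; it is not the library's source.

```python
from typing import List


def preprocess(sentence: str, stop_words: List[str]) -> List[str]:
    """Lowercase the sentence, split on whitespace, and drop stop words."""
    stops = set(stop_words)  # set membership keeps the filter fast
    return [word for word in sentence.lower().split() if word not in stops]
```

Under these assumptions, punctuation stays attached to its word, so `"mat."` would not match a stop word `"mat"`.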

static word_mover_distance(sent1: str, sent2: str, model: KeyedVectors, stopword: List[str]) → float

Calculates the Word Mover’s Distance (WMD) between two sentences.

This method first preprocesses the input sentences to remove stop words and normalize the text, and then computes the WMD between them using the provided word embedding model. WMD reflects the minimum cumulative distance required to ‘move’ the embeddings of words in one sentence to match those of the other sentence, thereby capturing semantic dissimilarities.

Parameters:
  • sent1 (str) – The first sentence.

  • sent2 (str) – The second sentence.

  • model (KeyedVectors) – A word embedding model that supports computing the WMD.

  • stopword (List[str]) – A list of stop words to remove during preprocessing.

Returns:

The computed Word Mover’s Distance representing the semantic difference between the two sentences.
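
The two steps above, preprocessing and then computing WMD, might be combined as in the sketch below. It assumes `model` exposes a `wmdistance(tokens1, tokens2)` method, as gensim's `KeyedVectors` does. The names `wmd_sketch` and `StubModel` are illustrative, and the stub exists only so the example runs without gensim or trained embeddings.

```python
from typing import List, Sequence


def wmd_sketch(sent1: str, sent2: str, model, stop_words: Sequence[str]) -> float:
    """Tokenize both sentences, drop stop words, and delegate to model.wmdistance."""
    stops = set(stop_words)
    tokens1 = [w for w in sent1.lower().split() if w not in stops]
    tokens2 = [w for w in sent2.lower().split() if w not in stops]
    return model.wmdistance(tokens1, tokens2)


class StubModel:
    """Stand-in for an embedding model; returns 0.0 for identical token lists."""

    def wmdistance(self, tokens1: List[str], tokens2: List[str]) -> float:
        self.seen = (tokens1, tokens2)  # recorded for inspection
        return 0.0 if tokens1 == tokens2 else 1.0
```

With gensim, `model` would be a loaded `KeyedVectors` instance; its `wmdistance` relies on an optional optimal-transport dependency that must be installed separately.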

langspace.ops.traversal module

class langspace.ops.traversal.TraversalOps

Bases: object

Class for performing latent space traversal and distance computations.

This class provides methods that enable controlled exploration of generative latent spaces.

static calculate_distance(seed: Tensor, samples: Tensor) → Tensor

Computes the Euclidean distance between a seed latent vector and a set of sample latent vectors.

This method calculates the L2 norm, i.e., the square root of the sum of squared differences, for each sample in relation to the original seed vector.

Parameters:
  • seed (Tensor) – The original latent vector tensor (shape: [batch_size, latent_dim]).

  • samples (Tensor) – A tensor of sample latent vectors (shape: [batch_size, sample_size, latent_dim]).

Returns:

A tensor containing the Euclidean distances (L2 norms) with shape [batch_size, sample_size].
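
To make the shapes concrete, here is a minimal pure-Python sketch of the same computation, with nested lists standing in for tensors. The name `euclidean_distances` is illustrative, and this is not the library's code.

```python
import math
from typing import List


def euclidean_distances(
    seed: List[List[float]], samples: List[List[List[float]]]
) -> List[List[float]]:
    """L2 distance from each batch item's seed to each of its samples.

    seed:    [batch_size][latent_dim]
    samples: [batch_size][sample_size][latent_dim]
    returns: [batch_size][sample_size]
    """
    return [
        [
            math.sqrt(sum((x - s) ** 2 for x, s in zip(sample, seed_vec)))
            for sample in batch_samples
        ]
        for seed_vec, batch_samples in zip(seed, samples)
    ]
```

With PyTorch tensors, an equivalent expression would likely be `(samples - seed.unsqueeze(1)).norm(dim=-1)`, which broadcasts the seed across the sample axis.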

static dimension_random_walk(mu: Tensor, logvar: Tensor, latent: Tensor, dim: int, sample_size: int) → Tensor

Performs a random walk along a specified latent dimension.

This method generates a sequence of latent vectors by replacing the value at the target dimension with samples drawn from the inverse cumulative distribution function (percent point function) of the normal distribution parameterized by the corresponding mean and log variance. The sampled values span a continuum of percentiles from 0.001 to 0.999, thereby exploring the latent space along that specific axis.

The controlled perturbation along one dimension helps reveal how changes in a single latent feature influence the generated output, a key aspect in the analysis of disentangled representations.

Parameters:
  • mu (Tensor) – Mean tensor for the latent distribution (shape: [batch_size, latent_dim]).

  • logvar (Tensor) – Log variance tensor for the latent distribution (shape: [batch_size, latent_dim]).

  • latent (Tensor) – Original latent vector tensor (shape: [batch_size, latent_dim]).

  • dim (int) – Target latent dimension index along which to perform the traversal.

  • sample_size (int) – Number of samples (or steps) to generate along the traversal.

Returns:

A tensor containing traversed latent vectors with shape [batch_size, sample_size, latent_dim].
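
The traversal can be sketched with the standard library's `statistics.NormalDist`, whose `inv_cdf` method is the percent point function. The sketch assumes the standard deviation is recovered as `exp(0.5 * logvar)` and that the percentiles are evenly spaced between 0.001 and 0.999; neither detail is confirmed by this page. Nested lists stand in for tensors, and the name `dimension_walk` is illustrative.

```python
import math
from statistics import NormalDist
from typing import List


def dimension_walk(
    mu: List[List[float]],
    logvar: List[List[float]],
    latent: List[List[float]],
    dim: int,
    sample_size: int,
) -> List[List[List[float]]]:
    """Replace latent[:, dim] with normal quantiles; returns [batch][sample][latent_dim]."""
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2")
    # Evenly spaced percentiles from 0.001 to 0.999 (spacing is an assumption).
    step = (0.999 - 0.001) / (sample_size - 1)
    percentiles = [0.001 + i * step for i in range(sample_size)]
    walks = []
    for mu_vec, logvar_vec, z in zip(mu, logvar, latent):
        # Assumption: std = exp(0.5 * logvar), the usual VAE convention.
        dist = NormalDist(mu_vec[dim], math.exp(0.5 * logvar_vec[dim]))
        batch = []
        for p in percentiles:
            moved = list(z)              # copy so other dimensions stay fixed
            moved[dim] = dist.inv_cdf(p)  # quantile at percentile p
            batch.append(moved)
        walks.append(batch)
    return walks
```

Only the chosen dimension varies across the returned samples, so any change in the decoded output can be attributed to that single coordinate.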

static traverse(mu: Tensor, std: Tensor, latent: Tensor, dim: int, sample_size: int) → Tuple[Tensor, Tensor]

Integrates latent space traversal and distance computation on a specified latent dimension.

This high-level method performs two main operations:
  1. It generates a series of latent vectors by executing a random walk along the given dimension, using the provided mean and (log-)variance parameters.

  2. It computes the Euclidean distances between the original latent vector and each of the traversed samples, quantifying the extent of perturbation.

Parameters:
  • mu (Tensor) – Mean tensor for the latent distribution (shape: [batch_size, latent_dim]).

  • std (Tensor) – Standard deviation or log variance used as a surrogate for standard deviation (shape: [batch_size, latent_dim]).

  • latent (Tensor) – Original latent vector tensor (shape: [batch_size, latent_dim]).

  • dim (int) – Target latent dimension to manipulate.

  • sample_size (int) – Number of traversal samples to generate.

Returns:

  • A tensor containing the traversed latent vectors (shape: [batch_size, sample_size, latent_dim]).

  • A tensor with the Euclidean distances (L2 norms) of each traversal from the original latent vector (shape: [batch_size, sample_size]).

Return type:

Tuple[Tensor, Tensor]

Module contents