Class Diff_SequenceMatcher

Inheritance: Diff_SequenceMatcher

Sequence matcher for Diff

PHP version 5

Copyright (c) 2009 Chris Boulton <chris.boulton@interspire.com>

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of the Chris Boulton nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Public Methods

Method | Description | Defined By
Ratio() | Return a measure of the similarity between the two sequences. | Diff_SequenceMatcher
__construct() | The constructor. With the sequences being passed, they'll be set for the sequence matcher and it will perform a basic cleanup & calculate junk elements. | Diff_SequenceMatcher
findLongestMatch() | Find the longest matching block in the two sequences, as defined by the lower and upper constraints for each sequence. (for the first sequence, $alo - $ahi and for the second sequence, $blo - $bhi) | Diff_SequenceMatcher
getGroupedOpcodes() | Return a series of nested arrays containing different groups of generated opcodes for the differences between the strings with up to $context lines of surrounding content. | Diff_SequenceMatcher
getMatchingBlocks() | Return a nested set of arrays for all of the matching sub-sequences in the strings $a and $b. | Diff_SequenceMatcher
getOpCodes() | Return a list of all of the opcodes for the differences between the two strings. | Diff_SequenceMatcher
linesAreDifferent() | Check if the two lines at the given indexes are different or not. | Diff_SequenceMatcher
setOptions() | Set new options | Diff_SequenceMatcher
setSeq1() | Set the first sequence ($a) and reset any internal caches to indicate that when calling the calculation methods, we need to recalculate them. | Diff_SequenceMatcher
setSeq2() | Set the second sequence ($b) and reset any internal caches to indicate that when calling the calculation methods, we need to recalculate them. | Diff_SequenceMatcher
setSequences() | Set the first and second sequences to use with the sequence matcher. | Diff_SequenceMatcher

Method Details

Ratio() public method

Return a measure of the similarity between the two sequences.

This will be a float value between 0 and 1.

Of all the ratio calculation functions, this is the most expensive to call if neither getMatchingBlocks() nor getOpCodes() has been called yet. The other calculation methods (quickRatio and realquickRatio) are quicker but may be less accurate.

The ratio is calculated as (2 * number of matches) / total number of elements in both sequences.

public float Ratio ( )
return float

The calculated ratio.
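
As a rough illustration of the formula above, the sketch below computes the ratio from a list of matching blocks in the (start in $a, start in $b, length) shape that getMatchingBlocks() is described as returning. The function name ratioFromBlocks() is hypothetical and not part of Diff_SequenceMatcher, and returning 1.0 for two empty sequences is an assumption of this sketch.

```php
<?php
// Hypothetical helper for illustration; not part of Diff_SequenceMatcher.
// $blocks: list of array(startInA, startInB, length) matching blocks.
// $lengthA / $lengthB: number of elements in each sequence.
function ratioFromBlocks(array $blocks, $lengthA, $lengthB)
{
    $matches = 0;
    foreach ($blocks as $block) {
        $matches += $block[2]; // third entry is the block length
    }

    $total = $lengthA + $lengthB;
    if ($total === 0) {
        return 1.0; // assumption: two empty sequences count as identical
    }

    // 2.0 forces float division so the result is always a float.
    return (2.0 * $matches) / $total;
}

// Two one-line matches between a 3-line and a 4-line sequence.
$ratio = ratioFromBlocks(array(array(0, 0, 1), array(2, 2, 1)), 3, 4);
printf("%.3f\n", $ratio);
```

Identical sequences give 1.0 under this formula, and sequences with nothing in common give 0.0.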

__construct() public method

The constructor. The sequences passed in are set on the sequence matcher, which then performs a basic cleanup and calculates junk elements.

public void __construct ( $a, $b, $junkCallback = null, $options )
$a string|array

A string or array containing the lines to compare against.

$b string|array

A string or array containing the lines to compare.

$junkCallback string|array

Either an array or string that references a callback function (if there is one) to determine 'junk' characters.

$options array

findLongestMatch() public method

Find the longest matching block in the two sequences, as defined by the lower and upper constraints for each sequence. (for the first sequence, $alo - $ahi and for the second sequence, $blo - $bhi)

Essentially, of all of the maximal matching blocks, return the one that starts earliest in $a, and of all of those maximal matching blocks that start earliest in $a, return the one that starts earliest in $b.

If the junk callback is defined, do the above but with the additional restriction that no junk element appears in the block. Then extend the block as far as possible by matching only junk elements in both $a and $b.

public array findLongestMatch ( $alo, $ahi, $blo, $bhi )
$alo int

The lower constraint for the first sequence.

$ahi int

The upper constraint for the first sequence.

$blo int

The lower constraint for the second sequence.

$bhi int

The upper constraint for the second sequence.

return array

Array describing the longest match: the starting position in $a, the starting position in $b, and the length of the match.
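
The tie-breaking rule described for this method can be sketched with a small dynamic-programming loop. This is a minimal illustration, not the library's implementation: the name longestMatchSketch() is hypothetical, junk handling is left out, and treating $ahi and $bhi as exclusive upper bounds is an assumption.

```php
<?php
// Hypothetical sketch; not the library's implementation, and without junk handling.
// Searches $a[$alo] .. $a[$ahi - 1] and $b[$blo] .. $b[$bhi - 1]
// (upper bounds are treated as exclusive, which is an assumption).
function longestMatchSketch(array $a, array $b, $alo, $ahi, $blo, $bhi)
{
    $bestI = $alo;
    $bestJ = $blo;
    $bestSize = 0;
    // $previous[$j]: length of the matching run ending at $a[$i - 1] and $b[$j].
    $previous = array();

    for ($i = $alo; $i < $ahi; $i++) {
        $current = array();
        for ($j = $blo; $j < $bhi; $j++) {
            if ($a[$i] !== $b[$j]) {
                continue;
            }
            // Extend the run that ended one element earlier in both sequences.
            $size = (isset($previous[$j - 1]) ? $previous[$j - 1] : 0) + 1;
            $current[$j] = $size;
            // Strict ">" keeps the first block found at a given size, which is
            // the one starting earliest in $a, then earliest in $b.
            if ($size > $bestSize) {
                $bestI = $i - $size + 1;
                $bestJ = $j - $size + 1;
                $bestSize = $size;
            }
        }
        $previous = $current;
    }

    return array($bestI, $bestJ, $bestSize);
}

$a = array('a', 'b', 'c', 'd');
$b = array('x', 'b', 'c', 'y', 'b', 'c');
print_r(longestMatchSketch($a, $b, 0, count($a), 0, count($b)));
```

In the usage example the run 'b', 'c' occurs twice in $b. The sketch reports the occurrence that starts earliest in $b, in line with the rule above.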

getGroupedOpcodes() public method

Return a series of nested arrays containing different groups of generated opcodes for the differences between the strings with up to $context lines of surrounding content.

Essentially, any large blocks of equal lines are stripped out, and the remaining smaller sets of changes are arranged into groups. This means that the sequence matcher and diffs do not need to include the full content of the different files but can still provide context as to where the changes are.

public array getGroupedOpcodes ( $context = 3 )
$context int

The number of lines of context to provide around the groups.

return array

Nested array of all of the grouped opcodes.
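
The grouping step can be sketched as follows. The code is modelled on the approach used by Python difflib's get_grouped_opcodes() and is not copied from Diff_SequenceMatcher. The function name groupOpcodesSketch() is hypothetical, and the opcode layout (tag, i1, i2, j1, j2) with exclusive end indexes is an assumption.

```php
<?php
// Hypothetical sketch modelled on Python difflib's get_grouped_opcodes().
// $opcodes: list of array(tag, i1, i2, j1, j2); end indexes assumed exclusive.
function groupOpcodesSketch(array $opcodes, $context = 3)
{
    if (empty($opcodes)) {
        return array();
    }
    $last = count($opcodes) - 1;

    // Trim a leading unchanged block down to its final $context lines.
    if ($opcodes[0][0] === 'equal') {
        list($tag, $i1, $i2, $j1, $j2) = $opcodes[0];
        $opcodes[0] = array($tag, max($i1, $i2 - $context), $i2, max($j1, $j2 - $context), $j2);
    }
    // Trim a trailing unchanged block down to its first $context lines.
    if ($opcodes[$last][0] === 'equal') {
        list($tag, $i1, $i2, $j1, $j2) = $opcodes[$last];
        $opcodes[$last] = array($tag, $i1, min($i2, $i1 + $context), $j1, min($j2, $j1 + $context));
    }

    $groups = array();
    $group = array();
    foreach ($opcodes as $opcode) {
        list($tag, $i1, $i2, $j1, $j2) = $opcode;
        // A long unchanged run closes the current group and opens a new one.
        if ($tag === 'equal' && $i2 - $i1 > 2 * $context) {
            $group[] = array($tag, $i1, min($i2, $i1 + $context), $j1, min($j2, $j1 + $context));
            $groups[] = $group;
            $group = array();
            $i1 = max($i1, $i2 - $context);
            $j1 = max($j1, $j2 - $context);
        }
        $group[] = array($tag, $i1, $i2, $j1, $j2);
    }

    // Keep the final group unless it holds nothing but one unchanged block.
    if (!empty($group) && !(count($group) === 1 && $group[0][0] === 'equal')) {
        $groups[] = $group;
    }

    return $groups;
}
```

With a context of 3, an unchanged run longer than 6 lines is split: its first 3 lines close one group and its last 3 lines open the next.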

getMatchingBlocks() public method

Return a nested set of arrays for all of the matching sub-sequences in the strings $a and $b.

Each block contains the lower constraint of the block in $a, the lower constraint of the block in $b and finally the number of lines that the block continues for.

public array getMatchingBlocks ( )
return array

Nested array of the matching blocks, as described by the function.
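
To show how the block triples can be consumed, the sketch below turns a list of matching blocks into opcodes by describing the gaps between consecutive blocks. It follows the approach used by Python difflib's get_opcodes(). The function name opcodesFromBlocks() is hypothetical, and appending a zero-length sentinel block is an assumption of this sketch rather than documented behaviour of the class.

```php
<?php
// Hypothetical sketch following the approach of Python difflib's get_opcodes().
// $blocks: list of array(startInA, startInB, length), ordered by position.
function opcodesFromBlocks(array $blocks, $lengthA, $lengthB)
{
    // Zero-length sentinel so differences after the last block are reported.
    // It is harmless if the input already ends with the same sentinel.
    $blocks[] = array($lengthA, $lengthB, 0);

    $opcodes = array();
    $i = 0; // first position in $a not yet covered by an opcode
    $j = 0; // first position in $b not yet covered by an opcode
    foreach ($blocks as $block) {
        list($ai, $bj, $size) = $block;

        // Classify the gap between the previous block and this one.
        $tag = '';
        if ($i < $ai && $j < $bj) {
            $tag = 'replace';
        } elseif ($i < $ai) {
            $tag = 'delete';
        } elseif ($j < $bj) {
            $tag = 'insert';
        }
        if ($tag !== '') {
            $opcodes[] = array($tag, $i, $ai, $j, $bj);
        }

        $i = $ai + $size;
        $j = $bj + $size;
        if ($size > 0) {
            $opcodes[] = array('equal', $ai, $i, $bj, $j);
        }
    }

    return $opcodes;
}

print_r(opcodesFromBlocks(array(array(0, 0, 1), array(2, 2, 1)), 3, 4));
```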

getOpCodes() public method

Return a list of all of the opcodes for the differences between the two strings.

The nested array returned contains an array describing the opcode which includes:

  • 0 - The type of tag (as described below) for the opcode.
  • 1 - The beginning line in the first sequence.
  • 2 - The end line in the first sequence.
  • 3 - The beginning line in the second sequence.
  • 4 - The end line in the second sequence.

The different types of tags include:

  • replace - The string from $i1 to $i2 in $a should be replaced by the string in $b from $j1 to $j2.
  • delete - The string in $a from $i1 to $i2 should be deleted.
  • insert - The string in $b from $j1 to $j2 should be inserted at $i1 in $a.
  • equal - The two strings with the specified ranges are equal.

public array getOpCodes ( )
return array

Array of the opcodes describing the differences between the strings.
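
One way to read the opcode list is as a recipe for rebuilding $b from $a. The sketch below walks a list of opcodes and applies each tag as described above. The function name applyOpcodesSketch() is hypothetical, and treating the end indexes as exclusive is an assumption.

```php
<?php
// Hypothetical helper for illustration; not part of Diff_SequenceMatcher.
// $opcodes: list of array(tag, i1, i2, j1, j2); end indexes assumed exclusive.
function applyOpcodesSketch(array $a, array $b, array $opcodes)
{
    $result = array();
    foreach ($opcodes as $opcode) {
        list($tag, $i1, $i2, $j1, $j2) = $opcode;
        switch ($tag) {
            case 'equal':
                // Unchanged lines are copied from $a.
                $result = array_merge($result, array_slice($a, $i1, $i2 - $i1));
                break;
            case 'replace':
            case 'insert':
                // New content always comes from $b.
                $result = array_merge($result, array_slice($b, $j1, $j2 - $j1));
                break;
            case 'delete':
                // Deleted lines from $a are skipped.
                break;
        }
    }

    return $result;
}

$a = array('one', 'two', 'three');
$b = array('one', '2', 'three', 'four');
$opcodes = array(
    array('equal', 0, 1, 0, 1),
    array('replace', 1, 2, 1, 2),
    array('equal', 2, 3, 2, 3),
    array('insert', 3, 3, 3, 4),
);
var_dump(applyOpcodesSketch($a, $b, $opcodes) === $b);
```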

linesAreDifferent() public method

Check if the two lines at the given indexes are different or not.

public boolean linesAreDifferent ( $aIndex, $bIndex )
$aIndex int

Line number to check against in $a.

$bIndex int

Line number to check against in $b.

return boolean

True if the lines are different and false if not.

setOptions() public method

Set new options

public void setOptions ( $options )
$options array

setSeq1() public method

Set the first sequence ($a) and reset any internal caches to indicate that when calling the calculation methods, we need to recalculate them.

public void setSeq1 ( $a )
$a string|array

The sequence to set as the first sequence.

setSeq2() public method

Set the second sequence ($b) and reset any internal caches to indicate that when calling the calculation methods, we need to recalculate them.

public void setSeq2 ( $b )
$b string|array

The sequence to set as the second sequence.

setSequences() public method

Set the first and second sequences to use with the sequence matcher.

public void setSequences ( $a, $b )
$a string|array

A string or array containing the lines to compare against.

$b string|array

A string or array containing the lines to compare.