Very Secure

Proper HTML Linking, A Battlefield Report

Having been rightfully flamed for attempting to use a tool I did not understand, I return from the dark hell of the html & php mines with a tiny nugget of information that I hope will aid the republic.

There are two quirks I've noticed with the select displayer.

1. The select displayer can match the values provided in the b and e query parameters to text inside of an html tag. This problem emerges from user error, but one often wants to match text in a link whose opening anchor tag contains the same text. It is currently impossible to select the second "trilema" that follows the "http://trilema.com" in the example: [1]

<a href="http://trilema.com">trilema</a>
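To make the failure concrete, here is a minimal sketch of what a plain strpos does with that link. The variables $content and $b are stand-ins I made up for the page body and the b query parameter; this is not the displayer's own code.

```php
<?php
// Illustration only: $content and $b are hypothetical stand-ins for the
// page body and the value of the b query parameter.
$content = 'See <a href="http://trilema.com">trilema</a> for details.';
$b = 'trilema';

$b_pos     = strpos($content, $b);               // first occurrence, wherever it is
$tag_open  = strpos($content, '<a');             // start of the opening anchor tag
$tag_close = strpos($content, '>', $tag_open);   // end of the opening anchor tag

// The first match lies between "<a" and its ">", that is, inside the href.
var_dump($b_pos > $tag_open && $b_pos < $tag_close); // bool(true)
```

Anything inserted at $b_pos would therefore land in the middle of the href attribute rather than around the visible link text.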

My solution is to find the first match not positioned inside of a tag. [2]

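function first_pos_not_in_tag($hay, $needle, $start) {
  // $start is false when the search for b already failed; an empty needle cannot be matched.
  if ($start === false || $needle === "")
    return false;
  $max_attempts = 2; // Give up after this many matches found inside tags.
  $guess = $start;
  $length = strlen($hay);
  while ($max_attempts > 0 && $guess < $length) {
    $guess = strpos($hay, $needle, $guess);
    if ($guess === false)
      return false; // The needle does not occur again.
    $next_close_pos = strpos($hay, ">", $guess);
    $next_open_pos = strpos($hay, "<", $guess);
    // The match is outside a tag unless a ">" comes up before the next "<".
    if ($next_close_pos === false ||
        ($next_open_pos !== false && $next_open_pos < $next_close_pos))
      return $guess;
    $guess = $next_close_pos + 1; // Resume the search just past the tag.
    $max_attempts--;
  }
  return false;
}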

You must alter your server_side_selection function as follows:

--- $b_pos = strpos($content,$_GET["b"]);
--- $e_pos = strpos($content,$_GET["e"], $b_pos);
+++ $b_pos = first_pos_not_in_tag($content, $_GET["b"], 1);
+++ $e_pos = first_pos_not_in_tag($content, $_GET["e"], $b_pos);

2. The second quirk is that the select displayer often spits out faulty html. For example, the displayer provides no closing </span> if the user leaves the value for e empty. [3] This doesn't seem to cause any practical issues; browsers close spans automatically under certain conditions that I have not fully ascertained.
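One way to avoid the dangling span is to always emit the closing tag and, when e is empty or not found, let the selection run to the end of the content. The sketch below is only that: wrap_selection and the "selected" class are names I invented for illustration, and the real server_side_selection may be structured differently.

```php
<?php
// Hypothetical helper, not the displayer's actual code: wraps the selection
// in a span and always emits the closing </span>.
function wrap_selection($content, $b, $e) {
  $b_pos = ($b === '') ? false : strpos($content, $b);
  if ($b_pos === false)
    return $content; // Nothing to select; leave the content untouched.
  $e_pos = ($e === '') ? false : strpos($content, $e, $b_pos);
  // Assumption: with no usable e, the selection runs to the end of the content.
  $end = ($e_pos === false) ? strlen($content) : $e_pos + strlen($e);
  return substr($content, 0, $b_pos)
       . '<span class="selected">' . substr($content, $b_pos, $end - $b_pos) . '</span>'
       . substr($content, $end);
}

echo wrap_selection('one two three', 'two', ''), "\n";
echo wrap_selection('one two three', 'one', 'two'), "\n";
```

Whether an empty e should select through to the end or select nothing at all is a design choice; the point is only that the opening and closing tags are written in the same place, so one cannot appear without the other.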

  1. The root of the problem is that the displayer does not provide a means to match to the second occurrence of text; there is no way to select only the last duck in duckduckduck.
  2. This doesn't fix the root problem stated above, but it prevents the select displayer from breaking tags. This is especially useful for stopping other servers' automatically provided b & e values - used to link back to your excerpt when you send them a pingback - from mangling your html tags.
  3. This may come as a surprise, because some browsers (I've seen chrome) will silently provide a closing </span> where they see fit and will show that inserted </span> in their "view source" tool!
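For the root problem named in footnote 1, a rough sketch of an nth-occurrence search follows. nth_pos is a hypothetical helper, and the displayer currently has no parameter that could carry the occurrence index, so this only shows the string search itself.

```php
<?php
// Hypothetical helper: position of the $n-th (1-based) occurrence of $needle
// in $hay, or false if there are fewer than $n occurrences.
function nth_pos($hay, $needle, $n) {
  if ($needle === '' || $n < 1)
    return false;
  $pos = -1;
  for ($i = 0; $i < $n; $i++) {
    // Stepping forward by one character means overlapping matches are counted too.
    $pos = strpos($hay, $needle, $pos + 1);
    if ($pos === false)
      return false;
  }
  return $pos;
}

var_dump(nth_pos('duckduckduck', 'duck', 3)); // int(8)
var_dump(nth_pos('duckduckduck', 'duck', 4)); // bool(false)
```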

2 Responses to “Proper HTML Linking, A Battlefield Report”

  1. > It is currently impossible to select the second "trilema" that follows the "http://trilema.com" in the example:

    It should be impossible ; don't link a.com as a, nude like that, it's stupid. There must be some relevant context, or else why are you linking ?

    In other words, just because some user errors are more entrenched in common [lazy] practice than others dun make them less erroneous. Allow well made tools to guide your work experience much like you allow your work experience to guide the making of well made tools.

    But yes, it's good practice to always provide the closing e ; even if it's just a .

  2. whaack says:

    @Mircea Popescu

    I put that as a contrived example, but I understand it is on the user of the tool to understand how to select values without matching to the markup. As I say in footnote 2 there is a chance that the link generated for the pingbacks matches with markup. Not sure if it is worth having this extra complexity to handle that rare case.

    ---

    Unrelated: The mpwp text editor should not be evaluating htmlentities. I wrote <a href="http://trilema.com">trilema</a> as &lt;a href="http://trilema.com"&gt;trilema&lt;/a&gt; into the text editor. Whenever I save or update, the escaped html gets evaluated back to <a href="http://trilema.com">trilema</a>. If I save again, it gets saved to the db as an actual html tag. billymg seems to have found a solution to this.
