Comments
-
Greg Hurrell
If you check out the list of design goals you'll see that one of the goals was to be "informative: when provided invalid markup the translator should fail gracefully and emit HTML that provides useful visual feedback about where the errors are in the input".
So that's why it behaves the way it does: if you've got a stray space in your markup then you almost certainly inserted it by mistake, so presenting it as a <pre></pre> span brings your attention to an error that you otherwise would have overlooked.
Having said that, this strict behavior probably appeals to me because I'm a programmer; I can see why it might be puzzling for non-technical users, so I'll have a think about changing it.
P.S. Your bug report isn't "lame".
-
anonymous
All fair points. In defense of the bug, design goal five ("easy to use") identifies the importance of being "as close as possible" to MediaWiki.
To the point of being informative: it's not exactly a question of invalid markup and/or graceful failure, because the markup is valid in the MediaWiki reference implementation. The only question is whether it's a valid "pre" tag or a valid </p> tag, right?
One other thing that occurred to me: it will fail silently if the page's CSS doesn't have visible formatting for a "pre" element (i.e. no border, no background, etc.), so you're basing your putative benefits on assumed page behavior.
Having said that, it's an easy change for me to make as a workaround downstream, too. It just comes at the cost of my preferred "pre" formatting when it's deliberate :)
-
Greg Hurrell
Yep, you're right that one of the other goals is being "as close as possible" to MediaWiki syntax. But like all goals it has to be balanced against the others; i.e. it is not "as close as possible to MediaWiki, at any cost".
I remember when I first implemented the parser (back in 2007) I minutely studied the output of the MediaWiki parser using dozens of permutations of input, and copied its behavior where it made sense to do so. I explicitly avoided replicating things which I considered to be buggy or inconsistent behavior in MediaWiki (and there are lots of strange edge cases in the MediaWiki parser which are products of its implementation details rather than intentional design).
So what I'm saying here is that the different design goals are not always going to point you in the same direction, and you sometimes have to make trade-offs. Being close to MediaWiki syntax is one goal, but these things are also goals:
- providing visible feedback in the face of mangled input
- simplicity and consistency of rules (i.e. once you understand the basic rules of the translator you should be able to predict what it is going to emit for any given input; you should not have to be aware of exceptions or have to think about things like "if X then Y, but if Z then X" and so on... an example of a simple rule here is "a line beginning with a space is a <pre></pre> span")
- feasibility and simplicity of implementation (i.e. simple, consistent rules are the easiest ones to implement, so any place where things are complicated — things like nested lists, for example — is going to add to the complexity of the translator and bring with it the risk of bugs; this means that whenever you talk about adding complexity or exceptions to the translator you have to ask yourself if it is worth the cost)
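To illustrate how simple that kind of rule is to implement, here is a rough Python sketch of a line-at-a-time translator applying the "a line beginning with a space is a <pre></pre> span" rule. This is my own hypothetical illustration, not the Wikitext extension's actual (C) implementation:

```python
# Hypothetical sketch of the "line beginning with a space is a <pre> span"
# rule; NOT the Wikitext extension's actual implementation.
from html import escape

def translate(wikitext):
    out = []
    in_pre = False
    for line in wikitext.split("\n"):
        if line.startswith(" "):
            # Leading space: this line belongs to a <pre> span.
            if not in_pre:
                out.append("<pre>")
                in_pre = True
            out.append(escape(line[1:]))  # strip the marker space, escape HTML
        else:
            if in_pre:
                out.append("</pre>")
                in_pre = False
            if line:
                out.append("<p>" + escape(line) + "</p>")
    if in_pre:
        out.append("</pre>")
    return "\n".join(out)
```

Note how a line containing only a single space falls into the <pre> branch, which is exactly the behavior under discussion: the rule is applied uniformly, with no special case for whitespace-only lines.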
At this point, given that the base syntax of the Wikitext extension is "close enough" to MediaWiki syntax to (generally) be non-surprising to people who use it, the mere fact that MediaWiki does something in a particular edge case like this one (a non-empty line containing only whitespace) is not very compelling (i.e. I am more interested in being faithful to the general behavior of MediaWiki than in copying its edge cases).
I am more swayed by the simple argument that it should be "easy to use" or "non-surprising" for non-technical users (not because of the comparison with MediaWiki, but simply because a stray space can certainly creep in easily enough as you point out, like when copying-and-pasting).
The only things which concern me are:
- What kind of behavior do you expect when we start seeing multiple non-empty lines filled only with whitespace? Or whitespace-only non-empty lines which are adjacent to "real" <pre></pre> spans, e.g.:

normal para
 <--- whitespace
 <--- more whitespace
 <--- more whitespace
 line 1
 line 2
 line 3
another normal para
So here, what are you expecting to be done with those leading "blank" lines? Currently, they would appear as part of the <pre></pre> span. Should they be deemed a mistake and silently eaten? Should they be converted to <br /> tags? It's fairly obvious that the "line 1", "line 2", "line 3" part of the input is supposed to be a <pre></pre> span — and if it's not supposed to be that then the user should get some visible feedback about the problem in their markup — but it's not clear whether those leading lines are intentional or mistaken.
I'm inclined to think that:
- such lines are most likely to be a mistake, so should be left in the output so that the user can see the mistake
- even if they're not a mistake, they should be left in the output; there is nothing more infuriating than some dumb program trying to "correct" my "mistakes" when they are actually intentional, and even more so when there is no way of escaping the input to force it to do what I want (although in this case I suppose I could use explicit <pre></pre> tags instead of the leading-space syntax)
I am not much of a copy-and-paster myself when it comes to writing using wikitext markup, but I do occasionally make a mistake like this:
para 1 blah blah blah
 <-- typing fast, so accidentally insert a space here too
para 2
It's fairly obvious that this is not intentional, and could be automatically corrected for the user. My doubt is about where to draw the line. If there are multiple such lines, should they be corrected? Or translated to <br /> tags? If there is lots of whitespace on the line, should it be corrected?
From the implementation perspective this is a little complicated, although not impossible, because it requires the translator to buffer the input tokens as it scans multiple lines, waiting to see if anything meaningful eventually gets scanned; only at the end can it decide whether to emit all the buffered markup as a <pre></pre> tag or as something else. Things are definitely simpler when the translator can mark things up a line at a time (or better still, a token at a time).
Anyway, sorry for the lengthy response. It's just that this is a much more complicated issue than it might seem at first (compare that with ticket #1690 which you opened, which is actually a much clearer case).
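To make the buffering idea concrete, here is a rough Python sketch (my own hypothetical illustration, not the extension's actual implementation) of one possible policy: whitespace-only lines are held in a buffer, flushed into the <pre></pre> span if real indented content follows, and silently eaten otherwise. Simplified to whole lines rather than tokens:

```python
# Hypothetical sketch of buffered handling of whitespace-only lines;
# NOT the Wikitext extension's actual implementation, and just one of
# the possible policies discussed above (stray lines are silently eaten).
def translate_lines(lines):
    out = []
    buffered = []      # whitespace-only lines whose fate is undecided
    in_pre = False
    for line in lines:
        if line != "" and line.strip() == "":
            buffered.append(line)        # hold judgment until we see more
        elif line.startswith(" "):
            if not in_pre:
                out.append("<pre>")
                in_pre = True
            # Real indented content follows: the buffered lines were part
            # of the span after all, so flush them into it.
            out.extend(l[1:] for l in buffered)
            buffered = []
            out.append(line[1:])
        else:
            if in_pre:
                out.append("</pre>")
                in_pre = False
            buffered = []                # stray whitespace lines: eaten
            if line:
                out.append(line)
    if in_pre:
        out.append("</pre>")
    return out
```

The cost is visible even in this toy version: the translator can no longer commit to output as each line arrives, which is precisely the extra complexity (and bug surface) weighed against the convenience of auto-correction.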
Comments are now closed for this issue.