It appears that HTML entities can cause RSS/syndication readers to fail when they read WordPress comment RSS feeds. Fortunately, a plugin has been written to resolve the issue. Entity2NCR has a confusing name, but its purpose is easy to understand: it converts HTML character entities to their numeric equivalents, known as numeric character references (NCRs).

HTML Explained

The Hypertext Markup Language (HTML) is a simple markup language used to create hypertext documents that are platform independent. HTML documents are SGML documents with generic semantics appropriate for representing information from a wide range of domains. It can represent hypertext news, mail, documentation and hypermedia as well as menus of options, database query results and simple structured documents with in-line graphics. It can likewise represent hypertext views of existing bodies of information.

The World Wide Web (WWW) has used HTML since 1990, which makes HTML one of the most widely used computer languages in the world. Its popularity comes from the web itself: HTML is the coding technology used to publish content there. Programmers also quickly recognized how user-friendly and easy to learn it is.

This ease of coding contributed significantly to the proliferation of web sites. However, HTML is not a complete programming language, because it lacks conditional tests and flow-control statements. Some implementations offer extensions to HTML that provide these functions, but those extensions are not part of the HTML standards. The power of a real programming language is obtained by embedding code written in a suitable language inside HTML.

A character entity can be written in two ways in HTML: as a symbolic reference or as a numeric reference. Symbolic references start with an ampersand and end with a semicolon. Between the two is a name for the symbol, usually a shortened version of its full name, as in &copy; for the copyright sign. The letters in the name are case sensitive and usually lowercase, though there are exceptions. For example, &Eacute; (É) is distinct from &eacute; (é).

Numeric references also start with an ampersand and end with a semicolon. Between them is a hash followed by a number, which is the character's code point, as in &#169; for the copyright sign. They are less memorable than symbolic references. However, they do not depend on a table of names, so any HTML or XML parser can decode them. Symbolic references are formally called character entity references, and numeric references are called numeric character references. The number is usually written in decimal, as above, but it may also be hexadecimal (&#xA9;).
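As a quick illustration that the symbolic and numeric forms name the same character, Python's standard `html` module can decode all of them. This is a small check for the reader, not part of Entity2NCR. It uses only the standard-library `html.unescape` function and the `html.entities.name2codepoint` table.

```python
import html
from html.entities import name2codepoint

# Symbolic, decimal and hexadecimal references for the copyright sign
forms = ["&copy;", "&#169;", "&#xA9;"]

# html.unescape decodes every form to the same character
decoded = [html.unescape(f) for f in forms]
print(decoded)  # ['©', '©', '©']

# The number in a numeric reference is the character's Unicode code point
print(name2codepoint["copy"])  # 169
print(ord("©"))                # 169
```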

Most unusual characters can be entered directly without any problem. If one does cause trouble, an HTML character entity can be used instead. Line and paragraph breaks are recognized automatically. Where a paragraph is not recognized, separate it from the next one with blank lines.

A character entity is a way to display special characters that are normally reserved for use in HTML. For instance, the less-than (<) and greater-than (>) signs are part of the HTML tag structure, so both are reserved for that use. To display these symbols on a site, use the character entities &lt; and &gt; instead.
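In Python, for example, the standard-library function `html.escape` replaces reserved characters with their entities. This is a sketch only. Any templating system's escaping function plays the same role.

```python
import html

snippet = "if a < b and b > c"

# html.escape turns &, <, > (and, by default, quotes) into entities
escaped = html.escape(snippet)
print(escaped)  # if a &lt; b and b &gt; c

# Decoding the entities restores the original text
print(html.unescape(escaped) == snippet)  # True
```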

Problems

Many WordPress users are running afoul of character entities in their comment RSS feeds, which many RSS/syndication readers fail to parse. RSS is XML, and XML predefines only five named entities (&lt;, &gt;, &amp;, &quot; and &apos;). A feed containing an HTML entity such as &raquo; is therefore not well-formed. The WordPress plugin Entity2NCR seeks to resolve this by converting HTML character entities, such as those for », & and ©, to their numeric equivalents. The plugin is meant for RSS output, but it can also be applied to posts if the user wishes.
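The conversion itself is easy to sketch. The Python function below, `entity_to_ncr`, is a hypothetical stand-in written for illustration. Entity2NCR itself is a PHP plugin with its own entity table. This sketch instead assumes the standard library's `html.entities.name2codepoint` table. It leaves the five XML-predefined entities alone and shows that an XML parser rejects `&raquo;` but accepts `&#187;`.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Entities XML defines itself; these are safe to leave unchanged.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named reference such as &raquo; or &frac12;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entity_to_ncr(text):
    """Hypothetical helper: replace named HTML entities with decimal NCRs.

    XML-predefined and unknown entity names are returned unchanged.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return ENTITY_RE.sub(replace, text)


def parses_as_xml(fragment):
    """Return True if the fragment is well-formed XML."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


title = "<title>News &raquo; Comments &copy; 2005 &amp; more</title>"
fixed = entity_to_ncr(title)
print(fixed)                 # <title>News &#187; Comments &#169; 2005 &amp; more</title>
print(parses_as_xml(title))  # False: &raquo; is undefined in XML
print(parses_as_xml(fixed))  # True
```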

Entity2NCR does not need to be installed on WordPress 1.5.1 or later. Installing it there will only cause problems, because the plugin's function has the same name as a function in the WordPress core, ent2ncr(). Upgrading to the most recent version is recommended, since the plugin's functionality is already built in. Before upgrading to 1.5.1, deactivate Entity2NCR in the Plugins admin panel. Then delete its file from the wp-content/plugins directory, where it would otherwise take up space unnecessarily.

To install Entity2NCR, download the zip file and extract Entity2NCR.php from it. Upload that file to the wp-content/plugins/ directory and activate the plugin in WordPress. Entity2NCR covers the standard assortment of HTML character entities, plus some of the more unusual and obscure ones. The plugin primarily focuses on RSS output, from both posts and comments, but it can also convert character entities in the regular content of one's blog. The add_filter lines at the end of the plugin file control this. Uncomment the line for each WordPress function that Entity2NCR should work on.

The RSS 2.0 spec is too vague, even though it can be used to produce feeds that are valid, accurate and useful. The intent is that the contents of a feed should be the best possible representation of the article content. However, the spec does not say what to do if an article title contains HTML code or entities, and it leaves many other questions open. In fact, an entire industry has sprung up around interpreting and fixing the various semantic differences between feeds. RSS application developers need to agree on basic answers to these fundamental questions instead of holding endless, conflicting discussions that help no one.

Attribute Values

An HTML author should always put attribute values in quotes, although the formal rules allow the quotes to be omitted in some cases. By default, SGML requires that all attribute values be delimited by either double quotation marks or single quotation marks. A single quote can appear inside a value delimited by double quotes, and vice versa. Authors can also use numeric character references to represent double quotes (&#34;) and single quotes (&#39;), or use the character entity reference &quot; for double quotes. In certain cases, an attribute value may be given without any quotation marks. This is allowed only when the value consists solely of letters, digits, hyphens, periods, underscores and colons. Even then, it is highly recommended to use quotation marks.
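When attribute values are generated by a program, the safest approach follows the advice above: always delimit values with double quotes and encode any quote characters inside them. Below is a minimal Python sketch. `quoted_attribute` is a hypothetical helper built on the standard-library `html.escape`, which encodes `"` as `&quot;` and `'` as the numeric reference `&#x27;`.

```python
import html


def quoted_attribute(name, value):
    """Hypothetical helper: render name="value" with the value escaped.

    html.escape(quote=True) encodes &, <, >, " and ' so the value can never
    close the surrounding double quotes early.
    """
    return '%s="%s"' % (name, html.escape(value, quote=True))


print(quoted_attribute("title", 'He said "hi"'))  # title="He said &quot;hi&quot;"
print(quoted_attribute("alt", "it's"))            # alt="it&#x27;s"
```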

There are several reasons to always quote attribute values. It is simpler, since there is no need to memorize and recall the rules for when quotes may be omitted. Quotes are also always required in XML. Finally, when an HTML file is edited later, it is easy to forget to add quotes to an attribute value that has been changed in a way that makes them mandatory. The only drawbacks are the effort of typing the quotes and the extra storage and transmission time they require. These are minor issues, since quotes make up only a small fraction of an HTML file.
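Python's XML parser shows the XML rule in practice. HTML parsers tolerate an unquoted value such as `alt=photo`, but an XML parser rejects it as not well-formed. It accepts the same element once the value is quoted. `is_well_formed` is a hypothetical helper for this illustration, built on the standard-library `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Hypothetical helper: return True if markup parses as XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed('<img alt=photo/>'))    # False: unquoted value
print(is_well_formed('<img alt="photo"/>'))  # True
```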