URL design
From Mi caja de notas
Revision as of 31 December 2016 at 07:28 by Xtof (talk | contributions) (Page created with « {{iwc}} »)
This article is a stub. You can help the IndieWeb wiki by expanding it.
URL design is the practice of deliberately designing URLs, in particular, permalinks, typically for a better UX for everyone who creates, reads, and shares content. The guidance on this page refers specifically to designing URLs for personal social content.
Why
By deliberately designing your URLs and URL structure, especially permalinks, you can:
- make them more usable
- communicate the rough topic of the publication
- make it easier for you to change your permalink policies over time (without breaking, or even having to change, URLs from past years)
- communicate date of publication (if desired; debatable)
- URLs are UI. 2017-07-08 from Scott Hanselman:
You care a lot about the evocative 2meg jpg hero image on your website. You change fonts, move CSS around ad infinitum, and agonize over single pixels. You should also care about your URLs.
(Emphasis in original.)
Why short URLs
This section is a stub. You can help the IndieWebCamp wiki by expanding it.
Why short clean URLs, or rather, disadvantage of long URLs (e.g. why short Etherpad or wiki URLs instead of / in contrast to long Google Docs URLs)
- short URLs fit better in many contexts
- e.g. in Slack channel topics & descriptions which are limited to 250 characters per: https://slack.com/help/articles/201654083-Set-a-conversation-topic-or-channel-description
- ...
How
More Usable
Make your permalinks more usable by:
- keeping your URLs human readable for simpler conveyance of information with URLs
- keeping your URLs short for easier sharing, reducing mental overload, greater reliability[1]
- avoiding long strings of numbers or characters, e.g. database IDs
Dates
Many people communicate date of publication by using a top-level structure that starts with a date:
/YYYY/MM/DD/
- the date in order of hierarchical significance
/YYYY/DDD/
- ISO ordinal date order, saves two characters (shorter is better, though may not be immediately understood by human readers), and communicates a linearity to your year of posts.
Avoid using the "literal" ISO 8601 date patterns
/YYYY-MM-DD/
or
/YYYYMMDD/
because by omitting path separators they are less URL friendly (less hackable), e.g. https://chat.indieweb.org/2015-07-26
This is a very common practice for permalinks, but possibly more of an implementation detail and convenience than recommended design.
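As a rough illustration of the two date-first structures described above, the following Python sketch builds both path forms from a publication date. The function names are made up for this example and are not part of any IndieWeb software.

```python
from datetime import date


def calendar_date_path(d: date) -> str:
    """Return the /YYYY/MM/DD/ form, most significant component first."""
    return f"/{d.year:04d}/{d.month:02d}/{d.day:02d}/"


def ordinal_date_path(d: date) -> str:
    """Return the /YYYY/DDD/ form, where DDD is the zero-padded day of the year."""
    day_of_year = d.timetuple().tm_yday
    return f"/{d.year:04d}/{day_of_year:03d}/"


published = date(2015, 7, 26)
print(calendar_date_path(published))  # /2015/07/26/
print(ordinal_date_path(published))   # /2015/207/
```

Because every component is separated by a slash, a reader can trim either path back to the year (or year and month) to look for an archive page.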
Publication date can be useful at a glance, but should be communicated in the page itself regardless. It's probably not so critical that it needs to be in the URL, especially if even the more useful topic is considered optional. In particular, some pages are "evergreen" and regularly updated, e.g. the list of books you've read, so their publication date isn't very interesting.
As an implementation detail, if you use date (bim, etc) to identify your posts and URLs and ignore any slug, it's easier to change the slug (and your internal identifier) later. You can redirect old slugs to new ones without doing this, though. The important thing is that you create and maintain the redirects so that your permalinks keep working. How you do it is less important.
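One way to picture the "identify by date, ignore the slug" approach is the minimal sketch below. It assumes a /YYYY/MM/DD/N/slug layout and a small in-memory index; the POSTS table, the PERMALINK pattern, and the resolve function are all hypothetical names invented for this example.

```python
import re

# Hypothetical post index: (year, month, day, sequence) -> current slug.
POSTS = {
    (2015, 12, 23, 1): "h-card",
}

# Date and sequence are required; the trailing slug is optional.
PERMALINK = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})/(\d+)(?:/([^/]*))?/?$")


def resolve(path: str):
    """Return (status, location) for a requested path.

    The post is looked up by date and sequence only, so a missing,
    outdated, or truncated slug redirects to the canonical URL.
    """
    match = PERMALINK.match(path)
    if match is None:
        return 404, None
    year, month, day, seq, _slug = match.groups()
    key = (int(year), int(month), int(day), int(seq))
    if key not in POSTS:
        return 404, None
    canonical = f"/{key[0]:04d}/{key[1]:02d}/{key[2]:02d}/{key[3]}/{POSTS[key]}"
    if path == canonical:
        return 200, canonical
    return 301, canonical
```

With this shape, renaming a post only means changing the slug stored in the index; every older form of the URL keeps working through the redirect.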
Non-publication dates
Though the predominant default in date-based URLs is the publication date of the page, there are some notable exceptions:
Issue date: magazines/periodicals typically identify a specific issue with a date (or month) that is often in the future. E.g. as of 2020-11-27, the following pages are valid and viewable
- https://www.newyorker.com/magazine/2020/11/30/how-venture-capitalists-are-deforming-capitalism
- The date in the URL indicates the "November 30th 2020" issue of The New Yorker; the page itself has an explicit visible publication date of 2020-11-23.
- This makes the URL "hackable" down to just the date to then display all the articles from that issue, e.g.: https://www.newyorker.com/magazine/1939/01/07
- https://www.theatlantic.com/magazine/archive/2020/12/can-history-predict-future/616993/
- For the "December 2020" issue of The Atlantic.
- Similarly, by trimming the URL down to https://www.theatlantic.com/magazine/archive/2020/12/, it redirects to https://www.theatlantic.com/magazine/toc/2020/12/ which displays a nice overview of that issue of the magazine
DRY violation
Though useful, putting the date in the URL for a post which contains its publication date in visible text is a DRY violation, and thus vulnerable to inconsistencies, e.g.
- https://blueskyweb.xyz/blog/3-6-2022-a-self-authenticating-social-protocol
- URL says "3-6-2022" implying a March 6th 2022 publication date, however
- Actual content says "Apr 6, 2022" which is correct, as April 6th 2022 was when the post was actually published
If you put the date of your posts in your permalinks, please take extra care to keep it accurate, and in the case of errors, make sure to fix the permalink and redirect from the errant version.
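A publishing tool could guard against this kind of mismatch with a check along the following lines. This is only a sketch: it assumes the /YYYY/MM/DD/ path layout discussed earlier (not the day-month-year form in the example above), and the function names are invented here.

```python
import re
from datetime import date

# Matches a /YYYY/MM/DD path segment run, followed by "/" or the end of the path.
DATE_IN_PATH = re.compile(r"/(\d{4})/(\d{2})/(\d{2})(?:/|$)")


def url_date(path: str):
    """Extract a /YYYY/MM/DD date from a path, or None if absent or invalid."""
    match = DATE_IN_PATH.search(path)
    if match is None:
        return None
    try:
        return date(*map(int, match.groups()))
    except ValueError:
        return None  # not a real calendar date, e.g. month 13


def date_is_consistent(path: str, published: date) -> bool:
    """True when the path carries no date, or its date equals the publication date."""
    found = url_date(path)
    return found is None or found == published
```

Running such a check at publish time catches the error before the permalink is shared, which is cheaper than adding a redirect afterwards.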
Topic
Communicate topic by using a "slug" somewhere after the date, e.g.
../tag1-tag2-tag3
Also:
- Many put the slug at the end of their permalinks
- Make the slug optional for identifying the post (i.e. not a required part of a permalink) if at all possible, since
- it contains human written/readable/editable content
- you may want to change it after the fact without any need for maintenance or URL redirects,
- it may be inadvertently truncated (like in email, or in IRC).
- it may be inadvertently extended by a bad autolinker that is errantly including a subsequent character, like a "." or a ","
- Some have chosen to make the slug required:
- gRegor Morrill: I like the method of slugs being optional and redirecting to the canonical URL, but I thought about it for a while and decided that I preferred having only the year and month in my URL hierarchy. If, in rare situations, a URL slug gets truncated, I plan to perform a search on the partial slug and present possible matches on the 404 error page.
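The failure modes listed above (a truncated slug, or a slug with stray punctuation appended by an autolinker) can both be handled by a prefix search when the slug is a required part of the URL. The sketch below is not gRegor Morrill's actual implementation; the SLUGS list and function names are assumptions made for illustration.

```python
# Hypothetical slugs published under one /YYYY/MM/ prefix.
SLUGS = ["url-design", "url-shorteners", "permalinks"]


def clean_slug(requested: str) -> str:
    """Drop trailing punctuation that an autolinker may have wrongly included."""
    return requested.rstrip(".,;:!?)")


def suggest(requested: str, known_slugs):
    """Return known slugs that start with the (possibly truncated) requested slug."""
    partial = clean_slug(requested)
    if not partial:
        return []
    return [slug for slug in known_slugs if slug.startswith(partial)]
```

A 404 page could list the results of suggest() as possible matches, or redirect directly when exactly one slug matches.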
Content Type
Individuals with large quantities of different content types may want to differentiate in the URL what type of content to expect, as it primes the user for the subsequent interactions with the content. For example, take this comparison:
/2014/11/10/url-design
- This URL could be anything about "URL design." It could be an article, a favorite, a reply, or even a photo about "URL design."
/2014/11/10/reply/url-design
- This URL gives the reader an immediate understanding of the content to expect at that URL: a reply. Someone not looking for this type of content can skip over it, or be ready for a threaded conversation around "URL design."
Ordinal
One drawback of an optional topic slug, as mentioned above, is that individual posts can become difficult to pinpoint when you post multiple times per day. Adding a time-relative ordinal to the end of a given date allows for better pinpointing while still remaining meaningful to readers of the URL. Take the two date formats listed above and simply add the ordinal (N) at the end. This maintains hierarchical significance in both types of date URL structure.
/YYYY/MM/DD/N/
/YYYY/DDD/N/
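A minimal sketch of assigning the ordinal, assuming the site can list the ordinals already used on a given day. The helper names are hypothetical.

```python
from datetime import date


def next_ordinal(existing_ordinals) -> int:
    """Return the next free ordinal for a day, counting from 1."""
    return max(existing_ordinals, default=0) + 1


def ordinal_permalink(d: date, n: int, use_day_of_year: bool = False) -> str:
    """Build /YYYY/MM/DD/N/ or, if requested, /YYYY/DDD/N/."""
    if use_day_of_year:
        return f"/{d.year:04d}/{d.timetuple().tm_yday:03d}/{n}/"
    return f"/{d.year:04d}/{d.month:02d}/{d.day:02d}/{n}/"
```

For example, the second post on 2014-11-10 would be /2014/11/10/2/ in the calendar form and /2014/314/2/ in the ordinal-date form.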
Author
As web publishers began to publish more and more "snackable content," or short meme-driven content, the feeds became overwhelming. Few publishers offer a per-author feed for readers who want to follow a specific journalist and avoid clickbait content.
- BuzzFeed article URLs follow content type, author, and then article title:
/article/alexkantrowitz/how-the-retweet-ruined-the-internet
However, each author's page does not have a corresponding RSS feed.
Avoid
See everything listed in this article and expand here inline:
- 2011-03-05 : URL Design Sins: 16 things that don't belong in URLs (archived)
Long URLs
Long URLs are fragile and break in many places, e.g.
Long URLs look less trustworthy, especially when they have a bunch of utm_... tracking parameters.
Long URLs look ugly when copy/pasted into IM/PM/DMs.
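If you want to shorten a URL before sharing it, one option is to drop the tracking parameters mentioned above. The sketch below uses only Python's standard urllib.parse module; the function name strip_tracking is made up for this example, and it treats any query parameter whose name starts with "utm_" as tracking.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_tracking(url: str) -> str:
    """Return the URL with any query parameter named utm_* removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith("utm_")
    ]
    # Rebuild the URL; an empty query string drops the "?" entirely.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Other query parameters and the fragment are kept, so the cleaned URL should still point at the same page.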
IndieWeb Examples
See: permalinks#IndieWeb_Examples
Perspectives and Experience
Aaron Parecki
I've tried a number of different permalink formats over the years. Below is a list of some of my past attempts as well as a description of why I eventually moved away from it.
Pre-2012
/{year}/{ordinal day}/{type}/{sequence}/{optional slug}
Examples:
/2011/203/article/1/enabling-ssh-on-the-seagate-blackarmor-nas-220
/2011/203/note/9
Issues:
- The type in the middle of the permalink is a strange mixing of hierarchy. The original intent was to provide a hint at the content to expect at the URL in case there was no slug
- The ordinal day is not easily readable
2012-2015
/{type}/{year}/{month}/{day}/{sequence}/{optional slug}
Examples:
/notes/2015/12/23/1/h-card
/notes/2015/12/23/1
I originally had the type first to be able to create a feed for specific post types without querying a database, just reading the files on disk, since all notes were stored in a folder called "notes", articles in "articles", etc. The addition of a database as an index meant that this was no longer solving a problem. Moral of the story is: don't let implementation detail affect URL design.
2016-Present
/{year}/{month}/{day}/{sequence}/{optional slug}
Examples:
/2015/12/23/1/h-card
/2015/12/23/1
Ryan Barrett
snarfed.org has two main types of URLs:
- posts are prefixed with date, eg
/2019-05-26_raymond-t-lahar
- pages are not, eg
/about
(Both have created and updated timestamps in the page contents themselves.)
FAQ
Why Not US Date Order
Q: Why not US date order like /march/15/2014/ ?
A: Lots of problems with this:
- US-centric - the web is world-wide
- English-centric month name "march" is not international friendly (again, *world-wide* web)
- does not follow hierarchical significance - "march" and "15" are less significant than 2014.
- makes it harder to change URL policies every year (since the year is the 3rd component instead of the first).
Time
Q: What is a good way to represent time in a URL?
A: There are several reasonable approaches. Using zero-padded hours HH (24hr), minutes MM, and seconds SS:
- Immediately after the date with separators, e.g.
/YYYY/MM/DD/HH/MM/SS
or even
/YYYY/MM/DD/HH:MM:SS
- Immediately after the date without separators - less readable but ok
/YYYY/MM/DD/HHMMSS
Omitting the seconds SS is ok too if you don't find yourself posting more than once a minute.
/YYYY/MM/DD/HH:MM
Alternatively if you post more than once a second (e.g. automatic metrics), you may want to include a digit or two of decimal seconds.
/YYYY/MM/DD/HH:MM:SS.ss
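The variants above can be produced from a single timestamp with standard strftime codes. This is only a sketch; the function name and the dictionary keys are invented labels for the patterns listed in this answer.

```python
from datetime import datetime


def time_paths(moment: datetime) -> dict:
    """Return each time-in-URL variant for the given moment (24hr, zero-padded)."""
    day = moment.strftime("/%Y/%m/%d")
    hms = moment.strftime("/%H:%M:%S")
    return {
        "separated": day + moment.strftime("/%H/%M/%S"),
        "colons": day + hms,
        "compact": day + moment.strftime("/%H%M%S"),
        "minutes": day + moment.strftime("/%H:%M"),
        # Two digits of decimal seconds, truncated (not rounded) from microseconds.
        "centiseconds": day + hms + f".{moment.microsecond // 10000:02d}",
    }


paths = time_paths(datetime(2014, 3, 15, 12, 57, 9, 340000))
print(paths["colons"])        # /2014/03/15/12:57:09
print(paths["centiseconds"])  # /2014/03/15/12:57:09.34
```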
Why not AM PM
Q: Why not times with AM and PM like /12:57pm/?
A: Some problems with this:
- AM/PM are easily confused / misread (bad for usability)
- makes the URL longer unnecessarily (compared to 24hr time)
- less international - 24hr time is more readily recognized when reading world-wide.
Why not content type first
Q: Why not put the type of content first in the URL structure, e.g.
/reply/2014/11/10
?
A: Many reasons:
- The more known/stable aspect should go first. Dates are much more well understood (stable) and well known than "kinds" of posts, which are still squishy and growing. Permalinks and URLs in general are supposed to be stable, thus putting the more stable pieces first makes sense. Less change if you do have to change the squishy parts.
- Year first allows changing URL policies more easily, like once a year. If your year is first, you can set a policy for how your URLs work each year and change it, not having to go back and change past years.
- Keeps the URL distinct from type implementation. The type usually has to be defined somewhere in the implementation of a site or in a list of allowed types. Changes to the names, classification, or implementation of types can be made without changing the URL.
- Makes it easier to experiment with new post kinds. If the type is not in the URL it is easier to change a post's type later. For example, a new post could originally be published as a note and later changed to a more specific type like an RSVP.
- Experience. Aaron Parecki in particular started with a type-first URL structure, and is now having to convert it all to date first because of various scaling, storage, and other reasons.[3] Anthony Ciccarello also started with type first when the kinds of posts were more distinct but changed to a generic /posts root once he added more types. [4]
Alternatively:
- Ben Roberts puts content type first as a user feature, e.g. /note/2015/8/12/4/, but does not depend on it for identifying posts (the problem Aaron Parecki was having). Putting in an incorrect type will automatically redirect to the correct URL. This allows for more intuitive URLs for streams of type-specific posts: /note/ will list all notes, /photo/ only photos, etc.
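The type-first alternative can be sketched as follows: the type segment is shown in the URL but the lookup uses only the date and ordinal, and a wrong type redirects to the right one. This is a guess at the general shape, not Ben Roberts's actual code; POST_TYPES, TYPED_PATH, and resolve_typed are hypothetical names.

```python
import re

# Hypothetical index: (year, month, day, ordinal) -> actual post type.
POST_TYPES = {
    (2015, 8, 12, 4): "note",
}

TYPED_PATH = re.compile(r"^/([a-z]+)/(\d{4})/(\d{1,2})/(\d{1,2})/(\d+)/?$")


def resolve_typed(path: str):
    """Return (status, location); the type in the URL is never used for lookup."""
    match = TYPED_PATH.match(path)
    if match is None:
        return 404, None
    key = tuple(int(part) for part in match.groups()[1:])
    actual_type = POST_TYPES.get(key)
    if actual_type is None:
        return 404, None
    canonical = f"/{actual_type}/{key[0]}/{key[1]}/{key[2]}/{key[3]}/"
    if path == canonical:
        return 200, canonical
    return 301, canonical
```

Because the type is derived from the index rather than the request, changing a post from a note to another kind only changes where the redirect points.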
URL in a URL
Q: How can I put a URL within a URL as in
http://example.com/other/http://example.org/post
?
Aside: note common use-cases for a URL within a URL:
- Syndicating in some content from another site, e.g. sites that accept content submissions like IndieNews
- Archiving a copy of content from another site, e.g. Internet Archive Wayback Machine, or personal sites that archive/serve a copy of content from defunct or lost sites or zombies.
A: Current best practice: remove the protocol part of the second URL as
http://example.com/other/example.org/post
. For example:
The problem with the example in the question is that URLs should have only one instance of http:// in them for readability.
Alternatives:
- Pass the second URL as an encoded parameter. Disadvantage: URL encoded parameters are uglier and somewhat obfuscate the other URL.
- Keep the protocol part. Disadvantage: this makes your URL uglier (and may confuse some autolinkers). E.g.
https://web.archive.org/save/http://example.org/post/
will save a snapshot of the given URL to the Wayback Machine.
Articles
- 2016-05-29 Daniel Appelquist: In defense of the URL
I bet if you presented people with a URL and asked them “what is this?” they would tell you something like “it’s a web address,” “it’s a web site,” “it’s an internet address,” “it’s a link” or something that indicated they basically knew what it was.
These articles need to list their dates of publication explicitly and article titles, with perhaps a brief blockquote specific to what aspects of URL design they are covering.
More thoughts on (potentially additional aspects of) URL design:
- http://manas.tungare.name/blog/url-design-sins-16-things-that-dont-belong-in-urls
- http://warpspire.com/posts/url-design
- http://www.smashingmagazine.com/2014/05/02/responsive-design-begins-with-the-url/
- http://www.creativebloq.com/design/design-perfect-url-1126509
- http://www.webmonkey.com/2011/01/a-guide-to-designing-cool-urls/
- http://philarcher.org/diary/2013/uripersistence
Semantic Web related:
See also permalinks e.g. for why
Brainstorming
Posting UI
- I use /new (or /new/note , /new/checkin) for my posting UI. Tantek Çelik asked why not /create instead of /new. The URL is slightly shorter but I normally use /create for the internal site call to actually create the object, not show the creation UI. Additionally I feel this follows the naming scheme for text editors ("New Document") and "create" is more a software action. Ben Roberts
URL Template
Consider creating a URI Template for your site.
E.g.
See Also
- design
- redirect
- URL in print
- Keep a simple URL structure (archived), some good (quotable?) advice. Also a source for Google’s recommendation of hyphens instead of underscores.
- Tantek's flickr tag URL with examples of domain names as branding and identity.
- Supporting accidentally truncated URLs is mentioned, but some URLs may have additional text like punctuation marks.
- Some reasons why good URL design matters: The mysterious case of missing URLs and Google's AMP
- https://tantek.com/2013/104/t2/urls-readable-speakable-listenable-retypable ( https://twitter.com/t/status/323487484216483840 ) (incorporate into the scattered thoughts on short URLs which are good for similar reasons, and cover the typing aspect)
- "URLs should be readable, speakable, listenable, and unambiguously retypable, e.g. from print: tantek.com/w/ShortURLPrintExample #UX" @Tantek Çelik April 14, 2013
- Brainstorming: consider re-using Flickr-style tags URLs as a method of filtering a collection/archive of posts on a specific tag, e.g. https://flickr.com/photos/tags/indiewebcamp
- use literal user names in URLs. Better to make a list of disallowed usernames and only use the literal username in a URL path (e.g. Micro.blog and Twitter style), than pollute the URL with an extra character like ~ (classic Unix, Tildetown) or @ (Medium, also bad because it apparently confuses some auto-linking regexes)
- Threat: 2020-08-13 Google resumes its attack on the URL bar, hides full addresses on Chrome 86 and Hacker News discussion with many points about good URL design: https://news.ycombinator.com/item?id=24156986
- Considerations for using numbers in your domain and URLs: 2014-05-02 The New Republic: The Secret Messages Inside Chinese URLs
- On using only custom short slugs for every page: 2022-05-08 : Short URLs: why and how (archived)
- From the comments, a similar article from 11 years earlier: 2011-05-10 : On short URLs (archived)
- 1999-03-20 Nielsen Norman Group: URL as UI
- http://superkuh.com/thisurlnamehasnothingtodowiththeactualtopicijustnameditthisforkicks.html
<footer>source iwc:URL design</footer>
<footer>