aggregation

The Content Management System Isn't the Enemy -- Unless It Is

From Cole Camplese, Should it all be Miscellaneous?:

The idea that we can follow a book filled with instructions on how to do information architecture, web design, usability, and so forth may be crazy.

Some great conversations going on about structuring dialogue within organizations, and the inherent tension between freely flowing conversation and institutional control over the messages contained within that conversation, and the need for quality control over content affiliated with an institution.

In addition to Cole's post (linked above), D'Arcy Norman has a couple of good posts that provide some context.

D'Arcy and Cole both talk about the relationship/tension between the institutionally controlled (and provided) CMS, and the role of user created content in that webspace. As I see it, this is more of a design issue -- what mechanisms are you creating as you build your webspace to accommodate content from a variety of sources? A good CMS allows for easy interoperability, and good design exposes that interoperability to the end user in intuitive ways. While this isn't a conversation about tools per se, the limitations of the underlying CMS are a factor here, but that's a different discussion.

As I see it, the following factors (among others, of course) need to be addressed in the design of an inclusive webspace:

  1. Low barrier to entry.
  2. Multiple points of entry for end users (i.e., choices: a user can post from multiple sites, and push their content to the institutional web space).
  3. Tools for multiple ability levels -- some users will only want to use their own blog, while others will be perfectly happy using a tool provided by the organization. Both choices are perfectly okay.
  4. Guidelines and tutorials for posting to the system using both external tools, and publishing tools provided by the organization. At its most simple, this would include tagging guidelines, and links where external users could submit their rss feeds. This assumes, of course, a system designed to handle aggregation and embedding of external content.
  5. A governance model designed to vet content, and maintain quality control over critical areas of the organizational web presence. We're not talking about abandoning IA, or about turning the organizational webspace into Rome as enjoyed by the Visigoths; rather, we are talking about a system with clearly defined publishing workflows for content essential to the daily functioning of the organization (think admissions info), with rules and guidelines that permit the inclusion of quality content flowing into the system from external sources.

Within a new publishing model, the Information Architecture (IA) required/desired by the organization still has a critical role, but the IA realistically can't be extended over all content in all contexts. At a certain point, IA stops being an organizational tool that is central to the user experience, and becomes a barrier to efficiency. This becomes especially true when IA gets extended into the learning space -- and the learning space/community building space is where "miscellaneous" needs to flourish freely.

Content management often gets dragged out as the punching bag here, but the problem has less to do with the CMS than it has to do with restrictive rules governing the use of the CMS. Some of these restrictions are, of course, designed into the specific CMS/platform, but not all CMSs are created equal, and it's important to separate the choices made by an admin/organization from what is actually required by a CMS. In many of the discussions I hear about freeing content for reuse, the definition of a CMS gets conflated with the rules governing its use.

It's also worth noting that the most secure system is one that is so complex that no one will use it. From a sysadmin perspective, that's great. Having no users vastly reduces the security risk, and virtually ensures that the IA will remain intact, and unsullied by user error. I'd love to see some good numbers on the amount of time within organizations chewed up by end-user "training" (i.e., here's how to work around security requirements) compared to support and outreach (here's how to immediately be productive using any of these freely available tools).

We're past the point where IA, publishing workflows, and quality control are mutually exclusive. The meaning of "managing" content has shifted. We can set up publishing workflows that direct selected content into an existing navigational structure, and the route can include steps for editing and approval. Allowing people to work with the tools of their choosing doesn't mean selling the farm and turning your organizational site into MySpace. Providing options for users, and allowing increased interoperability between these tools and the organizational webspace, requires planning. The barriers aren't technical; they are organizational. More importantly, the migration/flight away from the organizational enterprise is well in progress. While the IA of many organizations doesn't reflect this, the change is already occurring.

RSS Redux

9:30 -- Re-read Brian Lamb's blog post.

9:33 -- Poked around Stephen Downes' site, reading over some of the documentation on Edu_Rss. Really, I'm hoping to find an OPML file. Bingo.

9:40 -- Create a database on educon20.org.

9:42 -- Go to Drupal.org -- grab a copy of the 5.7 codebase, and the following modules: FeedAPI, Feed Element Mapper, Views, Views Bonus, Tagadelic, and CCK. At a later point, if nothing blows up, I'll probably add in Similar Content.

9:50 -- untar code. Realize I'm curious how long this will actually take, and resign myself to getting less sleep than I originally hoped. So it goes.

9:59 -- upload code to the server. Crack a beer. A good one.

10:04 -- bring site live.

10:08 -- in the process of installing the modules, realize I have forgotten to download the SimplePie parser. Oy.

10:16 -- create settings for the imported feeds, and create taxonomy categories for the individual posts.

10:23 -- test import with a test feed. It looks good.

10:30 -- import opml file

10:35 -- first attempt at opml import bombs. Time to increase the memory allotted to php scripts in the settings.php file. Bumping it up to 40M ought to do it. If that doesn't work, I'll break up the opml file into multiple parts. At this point, I congratulate myself on the wise choice made at 9:59. A lesser beer would offer less solace during these times of peril.

10:42 -- second attempt bombs again. Time to try a third attempt, and see if it bombs in the same place. Don't know if I'm running into a php timeout, or a malformed xml file.

10:45 -- third attempt. Fingers crossed.

10:46 -- bombs out at close to the same place. In all likelihood, a php timeout issue. Small curses.

10:57 -- finished editing the original opml file into 4 smaller opml files. The first one imports with no issues -- 100 feeds down. Now trying the second opml file, which is larger than the first.

Note: I'm doing all this via a wireless connection, which is rather silly. When I am uploading files, I prefer to use a wired connection, as there is less chance of a transfer getting munged.

11:06 -- the second opml file bombed -- edited it into two smaller opml files. Trying again now.

11:13 -- the first two opml files have imported cleanly. The third is importing now. After this, two more to go.

11:22 -- opml import complete. Now, to begin the process of importing the feeds.
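
For anyone who hits the same wall, the hand-editing of the opml file described above could probably be scripted. What follows is a minimal sketch in Python, not what was actually done that night: it assumes every feed is an outline element carrying an xmlUrl attribute, and the function name split_opml and the feeds_per_file parameter are made up for illustration. Any folder nesting in the original file is flattened.

```python
import xml.etree.ElementTree as ET


def split_opml(opml_text, feeds_per_file=100):
    """Split one OPML document into several smaller OPML documents.

    Assumption: every <outline> with an xmlUrl attribute is a feed.
    Folder outlines (no xmlUrl) are dropped, so nesting is flattened.
    """
    root = ET.fromstring(opml_text)
    feeds = [o for o in root.iter("outline") if o.get("xmlUrl")]

    parts = []
    for start in range(0, len(feeds), feeds_per_file):
        part = ET.Element("opml", version=root.get("version", "1.0"))
        head = ET.SubElement(part, "head")
        ET.SubElement(head, "title").text = "part %d" % (len(parts) + 1)
        body = ET.SubElement(part, "body")
        for feed in feeds[start:start + feeds_per_file]:
            # Copy the attributes only, so child outlines are not carried over.
            ET.SubElement(body, "outline", dict(feed.attrib))
        parts.append(ET.tostring(part, encoding="unicode"))
    return parts
```

Each returned string can be written to its own file and imported separately; if one part still times out, lowering feeds_per_file is the scripted equivalent of splitting it again by hand.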

11:23 -- first cron run begun. In Drupal, there are many wonderful things that occur during a cron run. It is a sign of my general disintegration that I now have an active interest in things that occur during a cron run. During the first cron run, nearly 1000 posts were imported from the various feeds.

11:26 -- second cron run begun. An additional 2000 posts imported

11:30 -- third cron run begun.

11:37 -- fourth cron run begun.

11:45 -- create default views for imported feeds, and keyword directory.

12:06 -- install Similar By Terms module -- this is a lightweight content recommendation engine.

12:25 -- for the last 20 minutes or so, I've been lost reading content.

12:40 -- set up a cron job to run automatically. This will serve two main purposes: import new posts, and index the site so that the search actually works. It will probably take about half a day for the site to get fully indexed; after that point, the full text search will work pretty well.

1:00 -- clean up this post. Wonder why I didn't go to bed earlier.

As of this writing, a little over 3.5 hours from when I started, there are nearly 7500 posts imported from around 500 different feeds.

Interesting Happenings at BYU

I saw this earlier today over at groups.drupal.org --

Kyle Matthews and Clint Rogers built a Drupal site in support of a web analytics class. The site aggregates student blogs and expert blogs; this way, everyone blogs from their chosen blogging platform, and their feed gets imported into the course site. In other words, people use whatever blogging tool they are currently using, and the software running the course (in this case, Drupal) adapts to the participant. This is a nice contrast to the usual approach, where all participants must adapt to the structure required by the LMS.

The site was built using the FeedAPI and the Feed Element Mapper. We have talked about organizing classes and building Open Educational Repositories like this in the past, and our main proof of concept site has been humming along for the last few months with no issues at all.

There has been some great development behind the FeedAPI; just last week, the folks over at Development Seed put out another screencast showing how they are extending the functionality even further.

Thoughts on Sharing Lessons

I’m writing these ideas out quickly -- there are sure to be holes in this, and gaps in this reasoning -- please point them out in the comments.

For some context on this post, see these two threads on Dan Meyer's blog.

Users working with online lessons will generally fall into at least one of the following categories:

  1. People searching for lesson ideas (probably the majority)
  2. People already creating content on their own blogs (a growing number of folks, but still a very small percentage, compared to people in category 1, or even teacher-bloggers)
  3. People looking for a place to create content (people who want to create blogs, etc -- I have no idea how many people fall into this category, but I’d imagine that if people, particularly younger teachers, saw the benefit they would have some amazing things to contribute)
  4. People who will find lessons on another site, edit/revise those lessons for use in their class, and republish the updated content on their own site
  5. People who will edit/revise content on someone else’s site (ie, wiki-style) -- the majority of these people would probably be very committed to the ideals of Open Educational Resources (OERs), have part of their professional responsibilities include curriculum development, or have some other type of immediate personal connection to a learning community. These people would probably be the ones to make the greatest use of any social networking features within the site

Produce --> Share --> Reuse --> Remix -- where does influence fit in? The influence of shared lessons, and the role that influence can have in helping a teacher develop and revise their existing materials, should not be overlooked.

Most working teachers do not have the time to collaborate online with other teachers to create freely available resources. Most of the teachers I talk to barely have time to engage in that type of collaboration within their own schools, let alone within an online/social networking context. Most teachers, even the ones currently blogging their lessons, do not have the free time to join another site and learn another system, even if there are long-term benefits. Teacher time needs to be respected, which is why any system that mandates a teacher use a new tool to participate will lose a good number of potential contributors due to that barrier to entry.

Here is what I propose -- and what I have partially built, here: http://threeclicks.org/lessons

  1. A site that aggregates lessons already being published online. This way, any teacher currently blogging lessons doesn’t need to change a single thing about how they work. If they want to make it easier, they can choose to tag any lessons with a unique keyword, like “lesson” -- this would allow us (in most cases, anyways) to aggregate posts with that specific keyword.
  2. All imported lessons are full-text searchable, and, when possible, tagged with keywords that describe the lessons
  3. Organize the lessons by content area
  4. Possibly, add in rating mechanisms to allow site members to rate content
  5. All posts imported into the site can be printed via a print-friendly page, and exported via rss.
  6. As a further development, possibly create a mechanism where site users could clone and revise imported content, or create new lessons to be published within the site. This lesson development would leverage content already created and imported into the site, or could be used by interested people to develop learning resources from scratch. For this type of curricular planning, we could incorporate wiki-type functionality.
  7. As noted by David Rothstein here, we could incorporate a “request a lesson” feature
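
To make item 1 a little more concrete, here is a rough sketch of picking out only the posts tagged “lesson” from a teacher's feed. It is written in Python purely as an illustration; the site itself relies on Drupal modules, not on this code. It assumes an RSS 2.0 feed in which post tags show up as category elements, and the function name items_tagged is made up.

```python
import xml.etree.ElementTree as ET


def items_tagged(rss_text, keyword="lesson"):
    """Return (title, link) pairs for RSS 2.0 items tagged with keyword.

    Assumption: the blogging platform writes each post tag into a
    <category> element on the item. Matching ignores case.
    """
    wanted = keyword.strip().lower()
    matches = []
    for item in ET.fromstring(rss_text).iter("item"):
        tags = {(c.text or "").strip().lower() for c in item.findall("category")}
        if wanted in tags:
            matches.append((item.findtext("title", ""), item.findtext("link", "")))
    return matches
```

Posts without the keyword are simply skipped, which is why the "in most cases" caveat matters: a teacher who tags inconsistently will have lessons that never make it in.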

What is missing? Please add any necessary details/suggestions in the comments.

On Aggregation, and Crow

A mildly edited version of my response to Jim Groom's post over on the bava --

D'Arcy mentioned the need for this to scale, and he's right. With that said, I don't think we need to have scalability to 100K students as a first goal. The beauty of the small pieces loosely joined is that it's easier, and that it's a step away from the monolithic LMSs so beloved by so many --

Toward that end, it's good to consider what we'd need to carry from the blog to the aggregator in order to connect a student's work with an institutional SIS/LMS. To start, I see two factors as essential: first, mapping a feed to a student, and second, mapping individual posts from within a feed to a course.

The first piece is relatively straightforward: within the institutional aggregator, map each feed to a userid within the school's system. This way, institutional IDs are not exposed via any type of feed, and the connection of student feed to institutional record occurs where it needs to: within the institutional aggregator.
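
As a rough illustration of that first piece, the mapping can be as small as a lookup table that never leaves the aggregator. This is a sketch in Python under stated assumptions: the class name FeedRegistry, the normalize helper, and the sample IDs are all hypothetical, and a real aggregator would keep this table in its own database.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def normalize(feed_url: str) -> str:
    """Lower-case the scheme and host and drop a trailing slash, so small
    variations of the same feed address map to one key."""
    parts = urlsplit(feed_url.strip())
    key = "%s://%s%s" % (parts.scheme.lower(), parts.netloc.lower(),
                         parts.path.rstrip("/"))
    if parts.query:
        key += "?" + parts.query
    return key


class FeedRegistry:
    """Feed-to-user table that lives only inside the aggregator."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def register(self, feed_url: str, user_id: str) -> None:
        self._owners[normalize(feed_url)] = user_id

    def owner_of(self, feed_url: str) -> Optional[str]:
        # Returns None for feeds nobody has claimed.
        return self._owners.get(normalize(feed_url))
```

Nothing in the feed itself changes; the institutional ID is attached on the aggregator side only, when a post arrives from a known feed address.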

The next piece gets trickier: embedding course info into the feed. I actually think the easiest way to do this would be to use the Atom feed, as the Atom feed is designed to carry additional info (as an xml payload within the feed). Google is using Atom feeds this way on Open Social (although for a far more complex implementation, carrying friend data), and given that WP already generates Atom feeds, it makes sense to leverage what's already there.
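
Here is a sketch of what reading such a payload might look like. The Atom namespace is the real one; everything about the course element (its namespace URI, its name, and its id and role attributes) is invented for illustration, since no such extension has been defined here. The code is Python and is not the WP or Drupal code discussed in this post.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
COURSE = "http://example.edu/ns/course"  # hypothetical extension namespace

SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:course="http://example.edu/ns/course">
  <title>A student blog</title>
  <entry>
    <id>tag:example.org,2008:post-1</id>
    <title>Week one reflection</title>
    <course:course id="ENGL-101" role="student"/>
  </entry>
  <entry>
    <id>tag:example.org,2008:post-2</id>
    <title>Weekend notes</title>
  </entry>
</feed>"""


def courses_by_entry(atom_text):
    """Map each entry id to a list of (course id, role) pairs.

    Entries that carry no course element map to an empty list.
    """
    result = {}
    for entry in ET.fromstring(atom_text).iter("{%s}entry" % ATOM):
        entry_id = entry.findtext("{%s}id" % ATOM, "")
        result[entry_id] = [
            (c.get("id"), c.get("role"))
            for c in entry.findall("{%s}course" % COURSE)
        ]
    return result
```

Feed readers that don't know about the extra element should simply ignore it, which is what makes this approach attractive: the same feed works for the aggregator and for everyone else.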

So, on the WP side: some new code that creates a drop down list of course names keyed to course IDs. When a person is creating a blog post, they have an additional field containing a list filtered to their own courses. If we want to get really tricky, we could include whether the poster is a student, instructor, TA, etc, for a specific course. This would involve querying/syncing data out of the school's Course Management System and exposing it via the WP UI.

On the Drupal side, this data would need to be mapped into taxonomy terms (and this code already exists/is working on the feeds site). This mapped taxonomy term would automatically generate a feed from within Drupal of every post in the course, and these posts could also be displayed on a course by course basis -- so we could filter by author, course, keyword, date, etc. Then, within Drupal, OPML feeds per course would need to be exposed to privileged users -- these OPML feeds would be exportable, and would allow someone to subscribe to all the feeds in a single course in one step. Creating these OPML feeds would require new code. Alternately, it would be possible to create a page view of all the posts within each course using the views module, date filters, etc.
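
The per-course OPML piece is small enough to sketch. This is Python rather than the Drupal code that would actually be needed, and the input shape (a list of course id, feed title, feed URL triples) and the function name opml_per_course are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def opml_per_course(subscriptions):
    """Build one OPML document per course.

    subscriptions: iterable of (course_id, feed_title, feed_url) triples.
    Returns a dict mapping course_id to an OPML string.
    """
    by_course = defaultdict(list)
    for course_id, title, url in subscriptions:
        by_course[course_id].append((title, url))

    documents = {}
    for course_id, feeds in by_course.items():
        opml = ET.Element("opml", version="2.0")
        head = ET.SubElement(opml, "head")
        ET.SubElement(head, "title").text = "Feeds for %s" % course_id
        body = ET.SubElement(opml, "body")
        for title, url in feeds:
            # One outline per student feed in this course.
            ET.SubElement(body, "outline", type="rss", text=title, xmlUrl=url)
        documents[course_id] = ET.tostring(opml, encoding="unicode")
    return documents
```

Restricting these files to privileged users is a separate problem; the sketch only covers building the documents.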

While there would still be more work to do after this, coding solutions for these two items (add course data to the atom feed from within WP, and generate the OPML feeds from within Drupal) would allow feeds from WPMU to be aggregated and sorted by student and course.

The advantage of creating the drop down list for courses is that the process of selecting/typing the correct tag is simplified. The disadvantage, though, is that any user not on a school-offered blog is out of luck. In order to support a wider variety of platforms, basic keywords could be used on a course by course basis. Then, within Drupal, keywords that have been reserved for a specific course could be handled differently than other keywords. This system would be far more prone to user error (and would subsequently have issues scaling) but it has the additional advantage of working with any blogging platform that supports tags on posts. WP does that, right?

:)
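
A quick sketch of the reserved-keyword idea, in Python and with made-up names throughout (sort_tags, the reserved table, the course IDs): tags that match a reserved keyword get routed to a course, and everything else stays an ordinary keyword.

```python
def sort_tags(tags, reserved):
    """Split a post's tags into course IDs and ordinary keywords.

    reserved: dict mapping a lower-case reserved keyword to a course ID
    (a hypothetical table the aggregator would maintain).
    """
    courses, keywords = [], []
    for tag in tags:
        key = tag.strip().lower()
        if not key:
            continue  # ignore empty tags
        if key in reserved:
            courses.append(reserved[key])
        else:
            keywords.append(key)
    return courses, keywords
```

The user-error problem shows up right away: a post tagged "engl-101" when the reserved keyword is "engl101" lands in the ordinary keywords and never reaches the course.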

Also, re the title of your post, I make a habit of eating crow, but I like to do it in style.

http://bertc.com/three_crows.htm

Bon Appetit,

Bill

A Thanksgiving Feed

Over the last two nights, I put some time into building out a rough proof of concept showing some of what can be accomplished via a good aggregator and Drupal's taxonomy structure.

We've been thinking about/using aggregation in a variety of ways for the last couple years, but the development of the FeedAPI has created some pretty amazing possibilities faster than we could have hoped. I've been meaning to build out a site like this for the last few months, but a couple of recent conversations stirred me into actually doing it.

What has been fun about building out this proof of concept was how quickly the site came together. It's rough, and has no graphic design component at all, but the core functionality came into place quickly.

The results are here, and I'll include the brief description from the homepage of the site.

First, the useful details:

This site is designed to show the utility of a single location as a collection point of content from disparate sources, and how that content can then be re-organized by use of keywords to categorize the content that has been imported.

On this site, all imported content retains all keywords added to the post by the author. Additionally, new keywords are added to posts on import to allow for the content to be searched and organized in other ways.

A brief technical overview:

If you are not a geek, you can stop reading here. If you are a geek, read on!

  • This site uses Drupal as the main framework.
  • As this site is a proof of concept, we kept things light. The only core modules in use are Menu, Search, and Taxonomy. This site uses no path aliases, and the theme is the lightly modified Zen theme that ships with DrupalEd.
  • Aggregation is handled by the FeedAPI, and extended by the Feed Element Mapper.
  • The Similar By Terms module handles the content recommendations that can be seen alongside posts (see here for an example).
  • The Views module generates several of the screens for displaying and navigating the imported content, and the Views Bonus module extends these views.
  • Finally, CCK is installed and enabled (although, for this implementation it could probably be eliminated if necessary); and HTML Corrector is installed to clean up any unclosed tags in imported feeds that could break the layout.

For those keeping track of such things, this site has taken a grand total of six hours to build, including this writeup. The functionality of this site is all achieved using modules and code currently available within the Drupal community.

One group of folks deserves a special mention: the team of people behind the FeedAPI module. For those interested, you can see a lot of the discussion at the RSS and Aggregation group. They planned and executed a great project, and without their work this site would not be possible.
