Querying Wikipedia in ASP.NET using LINQ-to-Wiki

Have you ever visited Wikipedia and simply gotten lost in the abyss of knowledge that is available there? What if something existed that let you easily create complex queries to get exactly what you needed, using a syntax that you were already familiar with, like LINQ? Well then, this may be just the post for you!

Introducing LINQ-to-Wiki

LINQ-to-Wiki is a library designed by Petr Onderka to query any site running MediaWiki (which includes Wikipedia) from any .NET language. It provides extensive functionality for performing complex queries and is not limited to reading wiki pages; it can also perform edits, content additions and more.

You can request a variety of different items that would otherwise require a significant amount of scrolling and clicking, and that would eventually end in a "how did I get here?" moment several hours later, long after you lost focus on your original goal thanks to the sheer magnitude of the site and the borderline addiction to knowledge it can evoke.

A few of the many things related to Wikipedia content that can be accessed through queries in LINQ-to-Wiki are :

  • Listing all of the articles within a category
  • Listing all of the links contained within a page
  • Grabbing images and related articles
  • Full query and search support

LINQ-to-Wiki uses the traditional LINQ queries that any .NET developer would be accustomed to; the library then translates these into MediaWiki API requests for whatever big plans you are trying to conquer the world with.
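To make that translation a little more concrete, here is a rough sketch of the kind of request URL that a query for images with a given prefix might boil down to. This is not how LINQ-to-Wiki is actually implemented: the BuildAllImagesUrl helper and the ApiRequestSketch class are made up for illustration, and the /w/api.php path is the one Wikipedia happens to use. Only the query parameters (action=query, list=allimages, aiprefix, aiprop) come from the MediaWiki API itself.

```csharp
using System;

public static class ApiRequestSketch
{
    // Hypothetical helper (not part of LINQ-to-Wiki): builds a MediaWiki API
    // URL that lists images whose titles start with the given prefix.
    public static string BuildAllImagesUrl(string host, string prefix)
    {
        return "https://" + host + "/w/api.php"
             + "?action=query"
             + "&list=allimages"
             + "&aiprefix=" + Uri.EscapeDataString(prefix)
             // "url|size" asks for each image's URL and its dimensions.
             + "&aiprop=" + Uri.EscapeDataString("url|size")
             + "&format=json";
    }

    public static void Main()
    {
        Console.WriteLine(BuildAllImagesUrl("en.wikipedia.org", "Microsoft"));
    }
}
```

Writing this kind of string by hand for every request is exactly the chore that the library is meant to take off your plate.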

Getting Started

LINQ-to-Wiki can be obtained in the following two ways :

Once you have added references to the LINQ-to-Wiki files to your project, you are ready to get started!

Your First Query

Querying is really where LINQ-to-Wiki shines! The actual querying process is very straightforward and really doesn't differ much from using a traditional DataContext that you would be accustomed to working with in any other flavor of LINQ (e.g. LINQ-to-SQL, LINQ-to-Entities etc.).

You’ll first need to initialize a Wiki class that will act as your DataContext and the source of all of your queries. The string passed to the constructor is not a login; judging by the library's own samples, it is a name that identifies your application (a user agent) to the wiki. Actually logging in is only required if you are going to be performing more advanced operations such as editing, and in this demonstration we will just be focusing on querying, so feel free to make up your own name :

var wikipedia = new Wiki("Example");  

Once you have created your necessary Wiki object, then you will basically be ready to start querying. However, Wikipedia is a huge, complex data-filled cosmos and before we start adventuring around in our LINQ-powered spaceship, let’s take a look at a map to see where we can go.

Exploring the Cosmos of Wikipedia

Before we delve too deep into some serious querying, let’s review some of the properties and collections that we can use from our Wiki object. Since this post is primarily concerned with querying, we will be looking at the Query property of our Wiki object.

var query = wikipedia.Query.AdventurePlaceholder;  

Some of the major members of our Query object that we will be concerned with are :

  • allcategories - This is an enumeration of all of the available Categories
  • allimages - This is an enumeration of all of the available Images
  • alllinks - This is an enumeration of all of the available Links
  • categorymembers - This lists all of the pages in a given category
  • backlinks - This finds all pages that link back to a specific page
  • search - This allows a full-text search to be performed

From each of these we can use the LINQ methods that we all know and love, such as .Where() and .Select(), and then wrap everything up and execute our query using the .ToEnumerable() method. Each of these items will also have specific properties that can be accessed within your inner clauses to further narrow your search, so don’t neglect how wonderful IntelliSense can be.
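If you prefer C#'s query syntax over method chaining, the same kind of query can be written that way too. The following is only a sketch under a few assumptions, so check it against the version of the library you install: it assumes the generated Wiki class lives in the LinqToWiki.Generated namespace, that categorymembers can be filtered by the category's title, and that "Category:Microsoft" is a category you actually care about. The CategorySketch class is just a made-up wrapper for the example.

```csharp
using System;
using LinqToWiki.Generated; // assumption: namespace of the generated Wiki class

public static class CategorySketch
{
    public static void Main()
    {
        var wikipedia = new Wiki("Example");

        // List the titles of the pages in one category, using query syntax.
        // Assumption: filtering categorymembers by title selects the category.
        var titles = (from cm in wikipedia.Query.categorymembers()
                      where cm.title == "Category:Microsoft"
                      select cm.title)
                     .ToEnumerable();

        foreach (var title in titles)
        {
            Console.WriteLine(title);
        }
    }
}
```

Which style you pick is purely a matter of taste; the rest of this post sticks with method chaining.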

Blasting off into the Cosmos

So let’s start out with a simple query to get ourselves off the launch pad. We will query Wikipedia for all of the images that start with "Microsoft" and return the title of each :

// This will retrieve all of the images that begin with "Microsoft"
// (using the built-in prefix property) and select the title of each.
var query = wikipedia.Query.allimages()  
                     .Where(i => i.prefix == "Microsoft")
                     .Select(s => s.title)
                     .ToEnumerable();

That’s it! Using a simple Controller Action within MVC (for this example) we can output each of our results to a basic list within our View :

public ActionResult QueryWiki()  
{
     var wikipedia = new Wiki("Example");
     var query = wikipedia.Query.allimages()
                          .Where(i => i.prefix == "Microsoft")
                          .Select(s => s.title)
                          .ToEnumerable();
     return View(query);
}

along with this simple View :

<ul>  
     @foreach (var image in Model){
         <li>@image</li> 
     }
</ul>  

will result in a huge list of all of the images within Wikipedia that begin with "Microsoft" :

[Screenshot: Text Results]
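Since that list can get very long, you may want to cap it while experimenting. The sketch below uses a stand-in sequence instead of a real Wikipedia query so that it runs on its own; FakeImageTitles and LimitSketch are invented for the example, and the only real API involved is the standard LINQ Take() method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LimitSketch
{
    // Stand-in for a lazily evaluated result sequence. In the real application
    // this would be the sequence returned by the query's ToEnumerable() call.
    public static IEnumerable<string> FakeImageTitles()
    {
        for (var i = 1; ; i++)
        {
            yield return "File:Microsoft example " + i + ".png";
        }
    }

    public static void Main()
    {
        // Take(3) stops enumerating after three items, so the endless
        // stand-in sequence above is never walked to completion.
        var firstFew = FakeImageTitles().Take(3).ToList();

        foreach (var title in firstFew)
        {
            Console.WriteLine(title);
        }
    }
}
```

In the controller action, the equivalent would presumably be to append something like .Take(20) after .ToEnumerable(), on the assumption that the library only fetches results as they are enumerated.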

Text is boring. Let’s spice things up.

Let’s make things a little more appealing to the eyes by pulling some additional properties besides the title of the images. We can use the url, height and width properties available from our images to create a similar list that will feature images of each of these items instead of just a plain unordered list.

First, we will create a very simple class that will store the properties that we are concerned about that we can pass across to the View for display :

public class WikiImage  
{
     public string Url { get; set; }
     public int Height { get; set; }
     public int Width { get; set; }
     // Simple Constructor
     public WikiImage(string url, int height, int width)
     {
          Url = url;
          Height = height;
          Width = width;
     }
}

Using our new and improved query that will select the url, height and width properties from our image :

var query = wikipedia.Query.allimages()  
                     .Where(i => i.prefix == "Microsoft")
                     .Select(s => new WikiImage(s.url, s.height, s.width))
                     .ToList();

along with a few minor adjustments to the View (the controller action remains basically the same) :

@foreach (var image in Model){
     <img src='@image.Url' height='@image.Height' width='@image.Width' />
     <br />
}

which yields the following result :

[Screenshot: image results]

This post is just a simple example of some of the things that you can do using LINQ-to-Wiki. If you find that this post piqued your interest, you might consider downloading the library and seeing what you can do with it.

More Information and Code Examples

If you are interested in learning a bit more about LINQ-to-Wiki, visit the GitHub page where you can find a plethora of documentation detailing each of the individual methods and properties that you can query against. I would also highly recommend downloading the LINQ-to-Wiki Samples project, which contains all kinds of samples to get you started.

You can also download this example from GitHub using the link below :
