Querying Wikipedia in ASP.NET using LINQ-to-Wiki

Have you ever visited Wikipedia and simply gotten lost in the abyss of knowledge available there? Have you ever wished for a way to easily create complex queries that return exactly what you need, using syntax you're already familiar with, like LINQ? Well then, this may be just the post for you!

Introducing LINQ-to-Wiki

LINQ-to-Wiki is a library written by Petr Onderka for querying any site running MediaWiki (including Wikipedia) from any .NET language. It supports complex queries and is not limited to reading wiki pages: it can also perform edits, add content and more. You can request many kinds of items that would otherwise take a significant amount of scrolling and clicking, usually ending in a "how did I get here?" moment several hours later, long after the sheer size of the site and its borderline-addictive wealth of knowledge have pulled you away from your original goal.

A few of the many kinds of Wikipedia content that you can access through LINQ-to-Wiki queries are :

  • Listing all of the articles within a category
  • Listing all of the links contained within a page
  • Grabbing images and related articles
  • Full query and search support

LINQ-to-Wiki uses the traditional LINQ queries that any .NET developer will be accustomed to, and the library translates them into requests against the MediaWiki API, whatever world-conquering plans you may have in mind.

Getting Started

LINQ-to-Wiki can be obtained in either of the following two ways :

Once you have added references to the LINQ-to-Wiki assemblies in your project, you are ready to get started!

Your First Query

Querying is really where LINQ-to-Wiki shines! The querying process is very straightforward and doesn't differ much from using a traditional DataContext in any other flavor of LINQ (e.g. LINQ-to-SQL, LINQ-to-Entities, etc.).

You'll first need to create an instance of the Wiki class, which will act as your DataContext and the source of all of your queries. Logging in with real credentials is only required for more advanced operations (such as editing), and since this demonstration focuses purely on querying, the string passed to the constructor only needs to identify your application, so feel free to make up your own :

var wikipedia = new Wiki("Example");

Once you have created your Wiki object, you are basically ready to start querying. However, Wikipedia is a huge, complex, data-filled cosmos, and before we start adventuring around in our LINQ-powered spaceship, let's take a look at a map to see where we can go.

Exploring the Cosmos of Wikipedia

Before we delve too deep into some serious querying, let's review some of the properties and collections available from our Wiki object. Since this post is primarily concerned with querying, we will be looking at the Query property of our Wiki object.

var query = wikipedia.Query.AdventurePlaceholder;

Some of the most useful members of the Query object for querying are :

  • allcategories
    This is an enumeration of all of the available Categories
  • allimages
    This is an enumeration of all of the available Images
  • alllinks
    This is an enumeration of all of the available Links
  • categorymembers
    This lists all of the pages in a given category
  • backlinks
    This finds all pages that link back to a specific page.
  • search
    This allows a full-text search to be performed

From each of these we can use the LINQ methods that we all know and love, such as .Where() and .Select(), and then execute the query by calling .ToEnumerable() (or .ToList() if you want the results materialized right away). Each of these items also exposes specific properties that can be used within your Where() and Select() clauses to further narrow your search, so don't neglect how helpful IntelliSense can be here.
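To make that a bit more concrete, here is a rough sketch that lists a handful of pages in one category using categorymembers, written in C# query syntax (which the compiler turns into the same Where() and Select() calls). It assumes the generated Wiki class lives in the LinqToWiki.Generated namespace and that filtering on cm.title selects the category to list; the category name and the FirstCategoryMembers helper are just arbitrary examples, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using LinqToWiki.Generated; // assumed namespace of the generated Wiki class

static class CategoryExample
{
    // Hypothetical helper: returns up to `count` page titles from one example category.
    public static List<string> FirstCategoryMembers(Wiki wikipedia, int count)
    {
        // Query syntax: "where" and "select" compile to Where() and Select() calls.
        var query = from cm in wikipedia.Query.categorymembers()
                    where cm.title == "Category:Mammals of Indonesia" // assumed to pick the category
                    select cm.title;

        // ToEnumerable() executes the query; Take() and ToList() are plain LINQ-to-Objects.
        return query.ToEnumerable().Take(count).ToList();
    }

    static void Main()
    {
        var wikipedia = new Wiki("Example");

        foreach (var title in FirstCategoryMembers(wikipedia, 10))
        {
            Console.WriteLine(title);
        }
    }
}
```

The method-syntax form (wikipedia.Query.categorymembers().Where(...).Select(...)) should behave identically, so use whichever style reads better to you.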

Blasting off into the Cosmos

So let’s start out with a simple query to get ourselves off the launch pad. We will query Wikipedia for all of the images that start with “Microsoft” and return the title of each :

// This will retrieve all of the images that begin with "Microsoft" (using the built-in prefix property) and select the title of each.
var query = wikipedia.Query.allimages().Where(i => i.prefix == "Microsoft").Select(s => s.title).ToEnumerable();

That's it! Using a simple controller action in ASP.NET MVC (for this example), we can output each of our results to a basic list within our View :

public ActionResult QueryWiki()
{
     var wikipedia = new Wiki("Example");
     var query = wikipedia.Query.allimages().Where(i => i.prefix == "Microsoft").Select(s => s.title).ToEnumerable();
     return View(query);
}

along with this simple View :

<ul>
     @foreach (var image in Model){
         <li>@image</li> 
     }
</ul>

will result in a huge list of all of the images within Wikipedia that begin with “Microsoft”.

"Microsoft" Wikipedia Image Results

Query results containing all Wikipedia Images that begin with “Microsoft”
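Since a prefix query like this can return a very large number of titles, you may want to cap what reaches the View. The action below is a hypothetical variation on QueryWiki (the action name QueryWikiLimited and the limit of 50 are illustrative choices, not part of the original example). It relies on ToEnumerable() followed by the standard Take() method, and assumes the LinqToWiki.Generated namespace for the Wiki class. Take() limits what is passed to the View; whether it also reduces the number of requests sent to MediaWiki depends on whether ToEnumerable() fetches results lazily, which we have not verified here.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Web.Mvc;
using LinqToWiki.Generated; // assumed namespace of the generated Wiki class

public class WikiController : Controller
{
    // Hypothetical upper bound on how many titles are sent to the View.
    private const int MaxResults = 50;

    public ActionResult QueryWikiLimited()
    {
        var wikipedia = new Wiki("Example");

        // Same query as before; only the first MaxResults titles are kept.
        List<string> titles = wikipedia.Query.allimages()
                                       .Where(i => i.prefix == "Microsoft")
                                       .Select(s => s.title)
                                       .ToEnumerable()
                                       .Take(MaxResults)
                                       .ToList();

        return View(titles);
    }
}
```

The existing View works unchanged, since it simply iterates over whatever sequence of titles it receives.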

Text is boring. Let’s spice things up.

Let's make things a little more appealing to the eyes by pulling some additional properties besides the title of each image. We can use the url, height and width properties of our images to render the images themselves instead of just a plain unordered list of titles.

First, we will create a very simple class to hold the properties we care about, so that we can pass them to the View for display :

public class WikiImage
{
     public string Url { get; set; }
     public int Height { get; set; }
     public int Width { get; set; }
     // Simple Constructor
     public WikiImage(string url, int height, int width)
     {
          Url = url;
          Height = height;
          Width = width;
     }
}

Using our new and improved query, which selects the url, height and width properties of each image :

var query = wikipedia.Query.allimages()
                     .Where(i => i.prefix == "Microsoft")
                     .Select(s => new WikiImage(s.url,s.height,s.width)).ToList();

along with a few minor adjustments to the View (the controller action remains basically the same),

@foreach (var image in Model){
     <img src='@image.Url' height='@image.Height' width='@image.Width' /><br />
}

which yields the following result :


Results from our new query to grab all of the images that start with “Microsoft” on Wikipedia

This post is just a simple example of some of the things that you can do using LINQ-to-Wiki. If this post piqued your interest, consider downloading the library and seeing what you can do with it.

More Information and Code Examples

If you are interested in learning a bit more about LINQ-to-Wiki, visit the GitHub page where you can find a plethora of documentation detailing each of the individual methods and properties that you can query against. I would also highly recommend downloading the LINQ-to-Wiki Samples project, which contains all kinds of samples to get you started.

You can also download this example from GitHub using the link below :

About Author

Rion Williams

Rion is a Software Developer and Microsoft MVP with a passion for making cool things and helping others. He appreciates you stopping by and hopes that you'll visit again soon.

Comments

  1. Hello everyone, it’s my first pay a visit at this site, and post is genuinely fruitful in favor of me, keep up posting these content.

  2. Oer

    I agree as well. I understand the documentation provided on the github page has all the details of the library but some of the information is a little confusing and having read through this website has undoubtedly clarify some things.

    • The details were a bit hard to swallow as there was simply so much documentation available there. I thought a simple write-up like this would help people out a bit and make things a bit easier to understand.

      Thanks again for visiting the site :)

  4. Oer

    Hello Rion,

    By any chance. Do you know if it would be possible to obtain all the information from a page with a list of subjects, for instance: http://en.wikipedia.org/wiki/List_of_sports

    I have been trying to change the parameters and even going into the wiki.Query class but I have not been able to get any info from the wikipedia sandbox and therefore, not sure how to do this.

    If you have any idea or suggestion, I would really appreciate it.

    Thank you.

  5. I believe that all of the code that I used follows a reasonable system of naming conventions (as there wasn’t an extraordinary amount of code that I wrote with the exception of the WikiImage class).

    The lowercase properties that you are noticing are a result of the actual LINQ-to-Wiki library itself. All of the references (such as i.prefix, i.url, and the allimages() methods) are a result of the naming conventions that were used within the library.

    I wasn’t that crazy about actually having to use that syntax myself.
