Exploring Open Data: Notable Dogs in Wikidata
The Current State of Open Data on Notable Dogs
There are two categories of dog data that a dog lover and data nerd might be interested in:
- general knowledge about dogs, such as breed details, training and ownership guides, biological and evolutionary science, etc.
- information on specific famous or important dogs, or notable instances of the dog species, which is the focus of this week’s Open Data exploration
Unfortunately, dog lovers and data nerds will find the public data available in both categories to be sorely lacking. Resources that provide general dog information do exist, but that data is not always retrievable, especially via APIs, database endpoints, or dataset downloads. For dog breed knowledge, an authoritative source is the American Kennel Club (AKC), which provides an excellent online dashboard, but API data access appears to be available only internally and to club members. Enterprising individuals have scraped AKC breed data from the site into downloadable datasets, but it is unknown how often these are updated. Good structured, semantic breed data is available to the public via Pawsome Authority’s JSON-LD datasets. Some institutions, especially universities, have researched visual identification of dog breeds and made those datasets available, but they are image-focused and light on structured metadata. An amazing Open Data initiative on scientific dog information is the Dog Aging Project, which will provide controlled API access to its biomedical data upon request.
The state of Open Data in the second category, specific instances of dogs, is even worse. Online articles and lists telling the biographies and stories of famous dogs abound, but resources focused on providing structured, queryable data on these dogs were difficult to find. I divided my search for these resources by the different reasons a dog may be famous:
- Actors: Let’s say you’d like a list of all dogs that have portrayed the character Lassie. Dogs that have starred in television or film do appear in the big media knowledge bases, such as IMDb, but I failed to find any exposed “species” type identifier for actors, which would have allowed querying only for dog actors.
- Pets of world leaders: Dogs of US presidents are often known and loved by the nation. The Library of Congress provides access to a lot of material on presidential pets, but comprehensive search on presidential dogs is limited, with more focus given to online browsing than to dataset extraction.
- Heroes: Many dogs have benefited humanity, such as Balto, a sled dog who transported antitoxin, and Laika, the Soviet space dog who was the first animal to orbit Earth. Search and rescue is a primary profession for dogs, and some dogs who have saved multiple human lives have been officially honored or memorialized. I could not find any ongoing curation of all these hero dogs.
- Social media stars: The universal adoration of dogs is especially apparent with dogs on platforms such as TikTok and Instagram that have amassed millions of followers. These platforms do not provide open access to querying the metadata on these furry “influencers”.
There are many other reasons why someone may want data on a specific dog. One of the most beloved dogs of all time, Hachiko, became famous simply for his loyalty and commitment to his deceased owner. With so much variance in why a dog is “of note”, it is understandable that there is as yet no central source providing definitive, comprehensive, and accurate data on all notable dogs. Fortunately, we can turn to a general knowledge base like Wikidata for some of this data.
Exploring Notable Dogs in Wikidata
Although the data on notable dogs in Wikidata is sparse, most well-known dogs at least exist as entities with proper URIs. This allows for pulling a comprehensive set of specific dogs into the OpenDataDEx, the Knowledge Commons’ Open Data Explorer. There is also just enough data for coarse clustering, so navigation is still interesting and revealing. The hub object, the entry point into the Explorer graph, is almost always a very widely known dog connected to other widely known dogs. This is expected for a subject where Open Data curation is likely highly dependent on “fandom”, that is, the number of people who are personally interested in a dog and take it upon themselves to enter that data into Wikidata. It differs from subjects where Open Data curation is led by institutions whose mission aligns with providing the data on entities in that subject.
There are some general clusters around breeds of famous dogs in the DEx graph, indicating that a good job has been done of entering breed information into Wikidata. For example, when you reach a Jack Russell Terrier actor, like Soccer, who played PBS’s beloved character Wishbone, you’ll see direct relationships (edges) to other Jack Russell Terrier actors, like Moose, the dog who played Eddie in the show Frasier. Because other properties differentiating these specific dogs are not as ubiquitous, the dog’s breed rises to the top as a main grouping tag (property:value pair).
For better structure to emerge in OpenDataDEx, the information most likely to create the best clustering and navigation is why a dog is famous. This data is lacking in Wikidata, so dogs are connected much more chaotically and unexpectedly. You won’t find all search-and-rescue heroes grouped together, or social media darlings grouped together, especially because these dogs are of varied breeds and levels of fame. Certain reasons can be inferred from the data, such as a dog having the property “ownedBy” whose value is an entity that itself has the tag “occupation:leader of a nation”. But this indirect reasoning is difficult to capture with an ontologically neutral exploration tool like the OpenDataDEx. Fortunately, enough patterns can be found from tags for navigation to still offer unique insights into the data.
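The indirect reasoning described above can be illustrated with a small sketch. It assumes entities are held in memory as sets of property:value tags; the entity names (dog_a, person_x, and so on) and the helper owned_by_leader are made up for illustration and do not reflect real Wikidata records or how the OpenDataDEx works internally.

```python
# Toy, made-up entities: each maps to a set of (property, value) tags.
# These are placeholders for illustration, not real Wikidata data.
entities = {
    "dog_a": {("instanceOf", "dog"), ("ownedBy", "person_x")},
    "dog_b": {("instanceOf", "dog"), ("ownedBy", "person_y")},
    "person_x": {("occupation", "leader of a nation")},
    "person_y": {("occupation", "actor")},
}


def owned_by_leader(entity_id: str, graph: dict[str, set[tuple[str, str]]]) -> bool:
    """Follow ownedBy edges and check the owner's occupation tag."""
    leader_tag = ("occupation", "leader of a nation")
    for prop, value in graph.get(entity_id, set()):
        # The inference needs a second hop: from the dog to its owner's tags.
        if prop == "ownedBy" and leader_tag in graph.get(value, set()):
            return True
    return False


# Entities whose owner is tagged as a leader of a nation.
leader_pets = sorted(e for e in entities if owned_by_leader(e, entities))
```

The point of the sketch is the two-hop lookup: the reason for notability is not a tag on the dog itself, which is why a tool that groups only by an entity's own tags would miss it.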
Querying Wikidata for Notable Dogs
The SPARQL logic to query Wikidata for instances of dogs is very straightforward: a single triple pattern “?dog wdt:P31/wdt:P279* wd:Q144 .”, where P31 is Wikidata’s foundational “instance of” property, P279* follows the “subclass of” property zero or more times so that instances of any subclass of dog are included, and Q144 is the basic “dog” entity. (Note that the “/” operator is SPARQL shorthand for “?dog wdt:P31 ?x . ?x wdt:P279* wd:Q144 .”) The Wikidata SPARQL endpoint currently returns the entire result set fairly quickly, and it provides a reasonably complete set of the well-known dogs in Wikidata, so this query is sufficient for our notable dogs exploration without further refinement. Alternative query logic was investigated, but no other paths gave results as complete and useful as just using P31/P279*.
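As a minimal sketch, the query described above could be sent to the public Wikidata SPARQL endpoint from Python using only the standard library. The helper names (build_instances_query, run_query), the User-Agent string, and the choice to return only entity URIs are assumptions for illustration, not part of the OpenDataDEx. The sketch relies on the endpoint predefining the wd: and wdt: prefixes.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_instances_query(class_qid: str = "Q144") -> str:
    """Return a SPARQL query for all instances of a class or its subclasses."""
    return (
        "SELECT DISTINCT ?item WHERE { "
        f"?item wdt:P31/wdt:P279* wd:{class_qid} . "
        "}"
    )


def run_query(query: str) -> list[str]:
    """Send the query to Wikidata and return the matching entity URIs."""
    url = WIKIDATA_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    # Illustrative User-Agent; replace with one identifying your own tool.
    request = urllib.request.Request(
        url, headers={"User-Agent": "notable-dogs-example/0.1 (illustrative)"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    return [row["item"]["value"] for row in data["results"]["bindings"]]
```

Calling run_query(build_instances_query()) requires network access and would return one URI per matching dog entity.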
This query also makes it very easy to create similar DEx knowledge graphs, but for different species. Change Q144 to Q146, and you’ll get all Wikidata instances of cats. Change it to Q726, and you’ll get all Wikidata instances of horses. You could even do a union for dogs and another species, giving unexpected comparisons in the data exploration.
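The union idea can be sketched as a small query builder. The function name build_union_query and the extra ?species variable (added so each result records which branch it matched) are assumptions for illustration; only the P31/P279* pattern and the Q-identifiers come from the text above.

```python
def build_union_query(class_qids: list[str]) -> str:
    """Return a SPARQL query matching instances of any of the given classes."""
    if not class_qids:
        raise ValueError("at least one class QID is required")
    # One UNION branch per class, each binding ?species to the class it matched.
    branches = [
        f"{{ ?item wdt:P31/wdt:P279* wd:{qid} . BIND(wd:{qid} AS ?species) }}"
        for qid in class_qids
    ]
    return (
        "SELECT DISTINCT ?item ?species WHERE { "
        + " UNION ".join(branches)
        + " }"
    )


# Dogs (Q144) and cats (Q146), the identifiers named in the text above.
dogs_and_cats_query = build_union_query(["Q144", "Q146"])
```

Keeping ?species in the results would let an explorer tag each entity with its species, so that cross-species comparisons remain distinguishable.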
Notable Dogs Data as a Knowledge Commons
The stories of famous dogs exist within the narratives of humanity, becoming myth, entertainment, and inspiration. The huge amount of media that has been created around these dogs tells me that having sources for their compilation and details has real value. While Wikidata serves as a general Open Data resource for this information, its Linked Data nature means that anyone, anywhere, can curate the knowledge of only the dogs and metadata they care about, and that subset can be linked via RDF URIs to support a fuller body of knowledge. This will help fill in the gaps we see in Wikidata, especially the details on reasons for notability. My hope is that different communities, whether they be fandoms, pop culture archivists, historians, or others, continue this curation using Semantic Web principles so that the Notable Dogs Knowledge Commons that emerges can truly serve as a public resource.
For a demonstration of the potential of a Knowledge Commons, see the video above exploring notable dogs Open Data using the OpenDataDEx.