Arman

Reputation: 1084

Getting the list of ALL topic names from Freebase

According to Freebase, they have 23,407,174 topics. What is the easiest way to get the UI-friendly names of ALL of these topics (essentially the 'text' attribute of the topic JSON; an example of a single topic's JSON is here)? I don't need any other meta information.

Upvotes: 1

Views: 1481

Answers (2)

Tom Morris

Reputation: 10540

wget -O - http://download.freebase.com/datadumps/latest/freebase-simple-topic-dump.tsv.bz2 | bunzip2 | cut -f 2 > freebase-topic-names.txt

You will probably want the Freebase IDs as well, so that you know what the names refer to:

wget -O - http://download.freebase.com/datadumps/latest/freebase-simple-topic-dump.tsv.bz2 | bunzip2 | cut -f 1,2

Two additional bits of postprocessing are needed:

  1. Tabs are escaped as \t
  2. The string \N represents a null (non-existent) name
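
The two postprocessing steps above could be handled with a small awk filter. This is only a sketch: the function name clean_topic_names is made up for illustration, it assumes the input is the two-column "ID, name" output of cut -f 1,2, and it replaces each escaped \t with a single space (rather than a real tab) so that the output stays a clean two-column TSV. Any other escape sequences the dump might use are not handled.

```shell
# Hypothetical helper (not part of any Freebase tooling).
# Reads "id<TAB>name" lines on stdin and writes cleaned lines to stdout.
clean_topic_names() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
    # Step 2: the literal string \N means the topic has no name, so skip it.
    $2 == "\\N" { next }
    # Step 1: a literal backslash followed by t is an escaped tab;
    # replace it with a space (an assumption made to keep two columns).
    { gsub(/\\t/, " ", $2); print $1, $2 }'
}
```

To use it, append `| clean_topic_names > freebase-topic-names.tsv` to the second pipeline (the output filename is just an example). If only the names are needed, a further `| cut -f 2` drops the ID column.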

Upvotes: 1

Shawn Simister

Reputation: 4603

Take a look at the Simple Topic Dump that we provide. It's over a GB of compressed data, but it's still faster to download than trying to get all the names through the API.

Upvotes: 0
