CRAWDAD metadata
CRAWDAD metadata describes CRAWDAD data, tools, and the related authors and papers. A sample metadata listing appears below, followed by an explanation of how to read and navigate it.
Metadata structure (example)
- [Data]
  - [Dataset] ucsd/sigcomm2001 (v. 2002-04-23) [what's new] [version history]
    - [Traceset] ucsd/sigcomm2001/snmp (v. 2002-04-05)
      - [Trace] ucsd/sigcomm2001/snmp/Stations (v. 2002-04-05) [download 4 MB zip]
      - [Trace] ucsd/sigcomm2001/snmp/AP_Mibtree (v. 2002-04-05) [download 59 MB zip]
    - [Traceset] ucsd/sigcomm2001/tcpdump (v. 2004-11-09)
      - [Trace] ucsd/sigcomm2001/tcpdump/08292005 (v. 2002-04-23) [what's new] [download 267 MB gz]
- [Tools]
  - [Tool] ucsd/sigcomm2001/tool/snmputil.exe (v. 2002-04-05) [download 73 KB exe]
  - [Tool] ucsd/sigcomm2001/tool/extract.pl (v. 2002-04-05) [download 3 KB pl]
- [Authors]
  - [Author] Anand Balachandran
  - [Author] Geoffrey M. Voelker
  - [Author] Paramvir Bahl
  - [Author] P. Venkat Rangan
- [Papers]
  - [Paper] meng-flows
  - [Paper] balachandran-behavior
CRAWDAD metadata has four categories: data, tools, authors, and papers. As the example above shows, the metadata in each category is organized as a hierarchy. The data category has three levels: dataset, traceset, and trace, in that order. The other categories (tools, authors, and papers) each have a single level.
- Hierarchy in the data category: A dataset is a set of wireless network data collected by the same organization on the same type of network with some temporal locality (e.g., without a long time gap). For example, the dataset in the example above was collected by the University of California, San Diego on the 802.11 network of a conference held on campus over three days. A traceset is a set of traces collected using the same measurement technique (e.g., snmp, tcpdump, or syslog). A dataset can contain multiple tracesets, and a traceset can contain multiple traces.
- Hierarchical naming: Names in the data category follow the hierarchy of dataset, traceset, and trace, joined with "/". For example, the dataset "ucsd/sigcomm2001" has two tracesets, "ucsd/sigcomm2001/snmp" and "ucsd/sigcomm2001/tcpdump", which contain traces collected using snmp and tcpdump, respectively. Likewise, the traceset "ucsd/sigcomm2001/snmp" contains two traces, "ucsd/sigcomm2001/snmp/Stations" and "ucsd/sigcomm2001/snmp/AP_Mibtree", each of which can be downloaded by clicking its [download] link. More information on each entity (dataset, traceset, or trace) can be obtained by clicking its name.
- Other categories: These list the tools, authors, and papers related to the entities shown in the data category.
- Versions: Only entities in the data and tools categories have versions; entities in the other categories do not. The release date serves as the version number. For example, the version number "v. 2002-04-23" of the dataset "ucsd/sigcomm2001" indicates that the dataset was released on April 23, 2002. To browse all versions, click the "[version history]" link. To see the changes from the previous version, click the "[what's new]" link.
- Fields:
When you click an entity (e.g., dataset, traceset, trace, or tool), its metadata appears as a series of metadata fields.
The following fields are common to all entities (dataset, traceset, trace, and tool):
- version: metadata version (see above)
- changes: changes since the last version (release)
- bibtex: bibtex entry used for reference in papers
- metadata last modified: date when the metadata "description" was last modified (note: this date may be different from the release date)
- summary: executive summary
- release date: date when the entity was released
- download url: file size, type, and download location
- related data/tools: other entities related to this entity
The following fields are common to all data entities (dataset, traceset, and trace):
- measurement start: date when measurement started
- measurement end: date when measurement ended
- measurement purpose: e.g., Usage Characterization, Network Performance Analysis, etc.
- sanitization: how the data was sanitized, especially to protect privacy
- hole: missing data due to system failures or configuration mistakes
- error: incorrectly measured data
- limitation: what the methodology used cannot collect or accurately measure
- note: other description
The following fields are specific to dataset:
- keyword: keyword list used for "Browse" page
- authors: author list
- web site: an original web site or a CRAWDAD web site for the dataset
- wiki: wiki address for the dataset
- network type: e.g., 802.11 infrastructure, Bluetooth, etc.
- environment: non-technical description (e.g., on the user population) of the dataset
- network: network configuration
- collection: collection methodology
The following fields are specific to traceset:
- methodology: detailed description of collection methodology
The following fields are specific to trace:
- derived: false if the trace is an original raw trace; true if it was derived from another trace
- format: trace format
- configuration: experimental setup for collecting the trace
- tools used: tools used for the trace
The following fields are specific to tool:
- keyword: keyword list used for "Browse" page
- authors: author list
- web site: an original web site or a CRAWDAD web site for the tool
- wiki: wiki address for the tool
- license: terms of copyright, usage, change, or distribution of the tool
- support: how to get support for the tool
- build: how to build the tool
- input: input for the tool
- output: output for the tool
- parameters: parameters for the tool
- usage: detailed usage
- example: usage example
- algorithm: algorithm used for the tool
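
The hierarchical naming and version conventions described earlier can be illustrated with a short Python sketch. It is not part of CRAWDAD; the names EntityName, parse_entity_name, and parse_version are hypothetical. It assumes that a dataset name always has exactly two "/"-separated components (as in "ucsd/sigcomm2001"), that the name belongs to the data category rather than the tools category, and that a version label has the form "v. YYYY-MM-DD".

```python
from datetime import date
from typing import NamedTuple, Optional


class EntityName(NamedTuple):
    """Hypothetical container for the parts of a data-category name."""
    dataset: str
    traceset: Optional[str]
    trace: Optional[str]

    @property
    def level(self) -> str:
        # The deepest part that is present determines the entity type.
        if self.trace is not None:
            return "trace"
        if self.traceset is not None:
            return "traceset"
        return "dataset"


def parse_entity_name(name: str) -> EntityName:
    """Split a data-category name into its dataset, traceset, and trace names.

    Assumption: the dataset part has two components (organization/name),
    so a traceset has three components and a trace has four.
    """
    parts = name.split("/")
    if not 2 <= len(parts) <= 4 or not all(parts):
        raise ValueError(f"unexpected entity name: {name!r}")
    dataset = "/".join(parts[:2])
    traceset = "/".join(parts[:3]) if len(parts) >= 3 else None
    trace = name if len(parts) == 4 else None
    return EntityName(dataset, traceset, trace)


def parse_version(label: str) -> date:
    """Convert a version label such as "v. 2002-04-05" into its release date."""
    text = label.strip()
    if text.startswith("v."):
        text = text[2:]
    return date.fromisoformat(text.strip())
```

For example, parse_entity_name("ucsd/sigcomm2001/snmp/Stations") yields the dataset "ucsd/sigcomm2001", the traceset "ucsd/sigcomm2001/snmp", and the full trace name, while parse_version("v. 2002-04-05") yields the release date April 5, 2002.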



