INGV INGV OCEANO GROUP BO ERDDAP
Easier access to scientific data
Brought to you by INGV OCEANO GROUP    

ERDDAP is a data server that gives you a simple, consistent way to download subsets of scientific datasets in common file formats and make graphs and maps.

The Problems that ERDDAP Tries To Solve

Without ERDDAP, when a person (or a computer program) looks on the Internet for a specific type of scientific data (for example, satellite sea surface temperature data), there are problems ...

ERDDAP's Solutions

DAP? OPeNDAP? DODS? ERDDAP? What's the difference? My (Bob's) understanding is:

DODS (Distributed Oceanographic Data System) was created in the 1990s, before there was http: (!). The DODS system created and used the dods: protocol on the Internet. When HTTP came along and was so successful, they switched from dods: to http:.

At some point, they realized the system was useful for more than just oceanographic data. So they ditched that DODS name (although it lives on in some code), formed a small organization called OPeNDAP (external link) and wrote the DAP (Data Access Protocol) specification (external link), which standardizes the format of the requests for metadata and/or data, and the responses with the metadata and/or data. OPeNDAP (the organization) still shepherds DAP (the specification) and is the author of Hyrax (the data server which is often mistakenly referred to as OPeNDAP).

Hyrax, THREDDS, GRADS, ERDDAP and others are data servers (software) which implement DAP. They each implement a subset of DAP but do other things very differently.

ERDDAP uses code (in the "dods" directory) (actually written by Jake Hamby at NASA JPL) for some features of reading data from external DAP servers. ERDDAP uses its own code to write out DAP responses.
 

Is ERDDAP a solution to everyone's data distribution / data access problems?
No. ERDDAP tries to find a sweet spot that is a really good solution to most of the data distribution problems that we confronted. ERDDAP takes a middleware approach: It can get data from lots of different types of remote data servers and it can give that data to clients in lots of different file formats. It is designed to be an agnostic solution which seeks to make other data servers (OPeNDAP, SOS, OBIS, WMS, ...) interoperable. Is there one perfect data server that meets everyone's needs perfectly? We don't think so. And even if you think there is or will be, it will be a long time before everyone switches to it, if ever. Until then, ERDDAP is available right now to make other data servers interoperable and to serve data right now.

ERDDAP can handle many/most datasets as is, but not all. It isn't that the remaining datasets (e.g., model data using a cubed sphere projection) aren't important. It's just that ERDDAP's goal of returning data in common file formats (some of which are pretty simple), precludes a more complex internal data structure. Groups of researchers working with more complex data structures often already have specialized data servers and specialized client software which are customized to their community's needs. ERDDAP, as a general purpose data server, doesn't try to compete with these specialized data servers. They are customized to the needs of their community and do a great job. However, those datasets are often only "understood" by the specialized software in that community.

A Work-Around for Complex Datasets - ERDDAP has a way to handle complex datasets that it can't handle directly. Just as a relational database (external link) can store a complex dataset by using just one simple data structure (a table), ERDDAP can serve the data from more complex datasets by breaking the source dataset into a few ERDDAP datasets, each with similar, simple data structures. For example, some gridded environmental model datasets can be stored in ERDDAP by putting the sea surface variables ([time][latitude][longitude]) in one ERDDAP dataset, and by putting the variables with altitude ([time][altitude][latitude][longitude]) in another ERDDAP dataset. We know this isn't ideal, but it is necessary to allow ERDDAP to return data in common file formats (some of which are pretty simple).

Another approach to dealing with complex datasets (e.g., for model data using a cubed sphere projection) is to also offer a reprojected version of the dataset ([time][altitude][latitude][longitude]) which ERDDAP can work with easily. These simpler data structures aren't meant to replace the original data structures, but they can be a useful way to distribute the data to a wider audience.
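
The work-around of breaking a source dataset into a few simpler ERDDAP datasets can be illustrated with a minimal Python sketch. It is not ERDDAP code: the function name, the variable names, and the idea of grouping purely by dimension list are assumptions made for illustration. It only shows how variables that share the same dimensions (e.g., [time][latitude][longitude]) could be collected into one dataset each.

```python
from collections import defaultdict


def group_by_dimensions(variables):
    """Group variable names by their ordered list of dimensions.

    `variables` maps a variable name to its dimension names. Each distinct
    dimension list becomes one candidate dataset with a simple structure.
    """
    groups = defaultdict(list)
    for name, dims in variables.items():
        groups[tuple(dims)].append(name)
    return dict(groups)


# Hypothetical source dataset from a gridded environmental model.
source_variables = {
    "sea_surface_temperature": ["time", "latitude", "longitude"],
    "sea_surface_height": ["time", "latitude", "longitude"],
    "air_temperature": ["time", "altitude", "latitude", "longitude"],
    "wind_speed": ["time", "altitude", "latitude", "longitude"],
}

datasets = group_by_dimensions(source_variables)
for dims, names in datasets.items():
    # Print each group in the [dim][dim]... notation used above.
    print("".join(f"[{d}]" for d in dims), "->", ", ".join(names))
```

In this sketch the four hypothetical variables end up in two groups, one for the surface variables and one for the variables with altitude, which mirrors the two-dataset example described above.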
 

How sustainable is the ERDDAP project?
ERDDAP is very sustainable.
Some people are surprised and disappointed to hear that ERDDAP is mostly developed by one person (me, Bob Simons). [By the way, the opinions on this web page are my personal opinions and do not necessarily reflect any position of the Government or the National Oceanic and Atmospheric Administration.] They fear that if something happens to me, that will be the end of ERDDAP. That is simply not true. ERDDAP's positioning for long-term sustainability is excellent, and close to the best it could possibly be.

Yes, I am the main developer of ERDDAP. I am a fully funded federal employee. My funding isn't "soft" money, so I don't receive or rely on grants. I spend more than half my time developing ERDDAP. The rest of my time is spent managing datasets. That work is useful for ERDDAP because I need to work with real datasets in order to know in detail what ERDDAP needs to do. My bosses fully support my work on ERDDAP because it does what I was hired to do: make it easier for fisheries scientists (primarily, but really everyone) to get scientific data from diverse sources.

The miraculous thing about software is that it costs nothing to duplicate. So to do my job, I write ERDDAP for use at ERD. I think that is the best possible way for me to do my job. That reason alone justifies the expense of developing ERDDAP. (I think it could be shown that ERDDAP has saved NOAA scientists more time than I have spent developing ERDDAP. Time=Money.) But the side benefit is that any other organization can download, install, and use ERDDAP for free to distribute their scientific data.

Over 90 organizations in at least 14 countries use ERDDAP. Maybe there is such a thing as a free lunch.

ERDDAP is a Java program. The source code for every version is on GitHub (external link), the most commonly used system for collaborative software projects. So far, these groups/people have contributed code to ERDDAP:

I hope others will contribute code in the future. If something happens to me, my bosses will hire a replacement with the specific goal that s/he continues the development of ERDDAP. Further, I try to write very clean code. I write Javadoc comments. I write comments in the code. I choose variable names carefully. I follow the Java formatting guidelines. All of this is an effort to make the code more readable, for other programmers who want to understand and/or change it, and for me, because, in a year or two, I will have forgotten the details of how and why the code was written the way it was. Clean code with good comments makes my ongoing work on ERDDAP easier, so I have a great incentive to write clean code with good comments.

But all of my answers so far are not very important. Only one thing is really important. Only one thing guarantees sustainability for ERDDAP or any software project: that ERDDAP is Free and Open Source Software (FOSS) (external link). Specifically, ERDDAP uses Apache-compatible software licenses (external link), so anyone can do anything they want with the code.

Why is that important? One might think that software will be reliably available in the future because a big company is behind it. But Google, for example, has discontinued numerous projects (here's a list (external link)). I don't want to pick on Google because I really like Google and they fund a large number of great, open-source projects. Microsoft has discontinued projects. Apple has discontinued projects. ... The point is that just having the backing of a large company is no assurance that the project will continue. The users of that software are out of luck, unless the software was (and therefore, always is) Free and Open Source Software (FOSS). Then, whenever there is interest by even one developer, the project can and will continue to evolve. FOSS is an insurance policy. In fact, FOSS is the only insurance policy, the only assurance, that matters. FOSS ensures that there is always a way forward for the software. That is a right that no one can take away, ever.

One might also think that software that has a large team of developers will be more sustainable than software with one main developer. But lots of developers usually need lots of funding. I know a famous, reasonably large project with 10 developers (I won't embarrass them by naming them) that is in constant serious danger of stopping the project because they don't have enough funding. They rely on grants. They always run a deficit. Their patron has always bailed them out at the last minute, but is getting really tired of bailing them out. So if they can't raise a million dollars a year in grant money (or the patron gets too tired of bailing them out), they'll stop. And the group can't conceive of having fewer than 10 developers. Each developer has a role to play in their group. In light of that, it seems to me that it is a great sign that ERDDAP can be, and is, actively developed by just one main developer (who is fully funded) with the unofficial assistance of a few others. In fact, it would be a bad sign if ERDDAP required multiple developers. That ERDDAP has just one main developer means that it isn't a huge task that requires massive ongoing funding; it is a relatively small task that requires minimal effort and funding. That is more sustainable, not less.

One might think that hiring a contracting company to write software is a good idea. For a fee, they'll provide developers and promise continuity (which is good unless/until they go out of business). But they also have you over a barrel: you must pay them what they request or there is no more development, unless the software is FOSS and you're just paying them to work on the code. With FOSS, you always have choices about how to move forward. Because ERDDAP is FOSS, contractors are always a good option for you or anyone with regard to ERDDAP: if anything happens to me (the one main developer), or if I don't have time to make some change that you want, or I retire and you don't like my replacement's work, you can always hire a contracting company to make the changes you want (or make them yourself).

In summary, ERDDAP has the two sustainability features that matter most:

  1. ERDDAP is a small project (small enough to be handled by one main developer with the unofficial assistance of a few others), so it doesn't require massive resources.
  2. ERDDAP is Free and Open Source Software, so no one can ever stop you or anyone else from working on ERDDAP.
I cannot think of a better situation. I hope that relieves any fears you (or anyone else) had about ERDDAP's sustainability. If you hear people questioning or discouraging the use of ERDDAP because there is just one main developer, please set them straight by pointing them to the above discussion at this URL: https://coastwatch.pfeg.noaa.gov/erddap/information.html#sustainable .
 

How to Cite a Dataset in a Paper
It is important to let readers know how you got the data that you used in your paper. For each dataset that you used, please look at the dataset's metadata in the Dataset Attribute Structure section at the bottom of the .html page for the dataset, e.g.,
https://coastwatch.pfeg.noaa.gov/erddap/griddap/jplMURSST41.html . The metadata sometimes includes a required or suggested citation format for the dataset. The "license" metadata sometimes lists restrictions on the use of the data.

To generate a citation for a dataset:
If you think of the dataset as a scientific article, you can generate a citation based on the author (see the "creator_name" or "institution" metadata), the date that you downloaded the data, the title (see the "title" metadata), and the publisher (see the "publisher_name" metadata). If possible, please include the specific URL(s) used to download the data. If the dataset's metadata includes a Digital Object Identifier (DOI) (external link), please include that in the citation you create.
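
The citation recipe above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name, the layout of the citation string, and the "doi" dictionary key are hypothetical (the attribute that holds a DOI is not specified here and varies by dataset). The "creator_name", "institution", "title", and "publisher_name" keys are the metadata names mentioned above. Always prefer a required or suggested citation format if the dataset's metadata provides one.

```python
from datetime import date


def build_citation(metadata, download_date, url=None):
    """Assemble a simple citation string from a dataset's metadata.

    The author falls back from "creator_name" to "institution".
    The output layout is an assumption, not an official format.
    """
    author = (metadata.get("creator_name")
              or metadata.get("institution")
              or "Unknown author")
    title = metadata.get("title") or "Untitled dataset"
    parts = [f"{author}.", f"{title}."]

    publisher = metadata.get("publisher_name")
    if publisher:
        parts.append(f"{publisher}.")

    # Record the download date and, if available, the specific URL used.
    day = download_date.isoformat()
    if url:
        parts.append(f"Downloaded from {url} on {day}.")
    else:
        parts.append(f"Downloaded on {day}.")

    # "doi" is a hypothetical key; use whatever attribute holds the DOI.
    doi = metadata.get("doi")
    if doi:
        parts.append(f"doi:{doi}")

    return " ".join(parts)


# Hypothetical metadata values, for illustration only.
example_metadata = {
    "creator_name": "Example Data Center",
    "title": "Example Sea Surface Temperature Analysis",
    "publisher_name": "Example Publisher",
}
print(build_citation(example_metadata, date(2020, 1, 15)))
```

Passing the exact request URL as the `url` argument keeps the citation traceable to the specific subset that was downloaded.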
 

How to Cite ERDDAP in a Paper
If you want to cite ERDDAP itself in a scientific paper, please use something like
Simons, R.A. 2020. ERDDAP. https://coastwatch.pfeg.noaa.gov/erddap . Monterey, CA: NOAA/NMFS/SWFSC/ERD.

What does the acronym "ERDDAP" stand for?
"ERDDAP" used to be an acronym, but it outgrew that original description. Now, please just think of it as a name, not an acronym.
 

Guidelines for Data Distribution Systems
Bob's opinions about the design and evaluation of data distribution systems can be found here.
 

You can Set Up Your Own ERDDAP Server and serve your own data.

Contact Us

If you have questions, suggestions, or comments about ERDDAP in general (not this specific ERDDAP installation), please send an email to bob dot simons at noaa dot gov and include the ERDDAP URL directly related to your question or comment.
Or, you can join the ERDDAP Google Group / Mailing List by visiting https://groups.google.com/forum/#!forum/erddap (external link) and clicking on "Apply for membership". Once you are a member, you can post your question there or search to see if the question has already been asked and answered.

DISCLAIMER: The opinions on this web page are Bob Simons' personal opinions and do not necessarily reflect any position of the Government or the National Oceanic and Atmospheric Administration.


 
ERDDAP, Version 2.18