
data.debian.org

Original Project Description and Rationale

Subject: Large data packages in the archive

Hi,

one important question lately has been "What should we do with large packages containing data", such as game data, huge icon/wallpaper sets, some science data sets, etc. Naturally, this is a decision ftpmaster has to take. Here are our thoughts on it, to facilitate discussion and to see whether we missed important points, but we keep the right to have the last word here. :)

Basic Problem: "What to do with large data packages?"

That question already has a problem of its own: how do we define "large"? One way, which we chose for now, is simply "everything > 50MB".
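As a minimal sketch of that size rule (not part of the archive software): the function names, the `.deb` filter and the directory scan are hypothetical, and reading "50MB" as 50 * 1024 * 1024 bytes is an assumption, since the mail does not say whether a binary or decimal megabyte is meant.

```python
import os

# Assumption: "50MB" is read as 50 * 1024 * 1024 bytes; the mail does not
# specify whether a binary or decimal megabyte is meant.
LARGE_THRESHOLD_BYTES = 50 * 1024 * 1024


def is_large(size_bytes, threshold=LARGE_THRESHOLD_BYTES):
    """Return True if a package size falls under the "everything > 50MB" rule."""
    # Strictly greater than: a package of exactly the threshold size is not large.
    return size_bytes > threshold


def large_packages(directory, threshold=LARGE_THRESHOLD_BYTES):
    """Return the sorted names of .deb files in `directory` above the threshold."""
    names = []
    for entry in os.scandir(directory):
        # Hypothetical filter: only regular files ending in .deb are considered.
        if entry.is_file() and entry.name.endswith(".deb"):
            if is_large(entry.stat().st_size, threshold):
                names.append(entry.name)
    return sorted(names)
```

Calling `large_packages("/path/to/pool")` on a directory of built packages would list the candidates for the separate archive under this reading of the rule; changing the threshold argument is enough to try a different definition of "large".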

While the archive software is written in Python, this problem sounds like a Perl one as "There is more than one way to do (solve) it":

a.) We can simply say that we don't want this in Debian and people should use external hosting for such packages. After all, they are usually for a very small minority.

b.) We can just add another component "data" besides main/contrib/non-free.

c.) We can host a separate archive for it under control of ftpmaster.

The first two seem to have grave problems:

a.) is basically not a (good) option. It is our job to maintain the archive, and if there is enough demand we should make it possible to also host things like these data packages. Additionally, it would require moving everything that needs those data packages into contrib, as there wouldn't be a good basis for a Policy exception.

b.) While that would be the simplest solution, it has other problems, large enough that we decided against it. The biggest one is the principle of least surprise for our mirrors. We are talking about this in order not to bloat the main archive too much. If we just add another component, its contents will end up mirrored widely anyway, even if we send an announcement weeks before. Requiring every mirror admin to decide whether to mirror or exclude it, and then adjust their scripts, is a simple no-go for us.

So the way to go for us seems to be c.), probably hosting the archive somewhere below data.debian.org.

For the rest of the mail I talk about solution c.), unless otherwise stated.

So, assuming we go for solution c.) (which is what happens unless someone has a very strong reason not to, which I currently can't imagine), we will set up a separate archive for this. This will work the same way as our main archive does, with a few notable points:

Any comments?

Timeframe for this? I expect it to be ready within 2 weeks.

-- bye Joerg