
Calculation Example - Parity Space needed

(6 posts)
  1. dldummy
    Member

    A SIMPLE calculation ... to clarify the extra space needed for parity.

    This example ignores the extra space the file system itself needs (file system management structures, the file system's block size, ...).

    And I know that 1 KB is 1024 bytes and not 1000 ... but 1000 is simpler to calculate with,
    and it doesn't matter for this example.

    1 TB = 1000 GB = 1 000 000 MB = 1 000 000 000 KB = 1 000 000 000 000 B

    disParity splits files into blocks so it can XOR them together.
    So if a file's size isn't an exact multiple of the block size, you have to add a whole extra block.
    In the worst case you need a whole block for only 1 byte more ... the rest is filled up with zeros.
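    A minimal Python sketch of this rounding, assuming (as this post does) that each file occupies a whole number of parity blocks, with the last block zero-filled. The helper name `parity_bytes` is hypothetical, not part of disParity, and the default block size is the 512 000 B chosen below for the example.

```python
def parity_bytes(file_size: int, block_size: int = 512_000) -> int:
    """Parity space one file occupies when rounded up to whole blocks."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size

# A file exactly one block long needs no padding ...
print(parity_bytes(512_000))
# ... but one extra byte costs a whole extra (mostly zero-filled) block.
print(parity_bytes(512_001))
```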

    I don't count the extra space disParity needs to manage parity (storing file name, path, date/time, block numbers, ...),
    because there is no way I can calculate that: it depends on file name length, path length, and how the information is stored (fixed-length fields or not) ...
    So I ignore it and just say you need a little bit more than the calculated value.

    I don't know the block size disParity uses, but for my example I use a block size of 512 KB (512 000 B).

    Our experiment takes the worst case, where every file is 1 byte bigger than a multiple
    of the block size, so for each file you need 512 KB more than the file size for parity.

    And we compute parity for only one 1 TB drive ... in reality that would be pointless, but it's easier to calculate,
    and having more drives doesn't change much about how it works.

    The worst case is the simplest:
    you have files that are only 1 byte long ...

    Files: 1 000 000 000 000
    Parity size: 512 000 000 000 000 000 bytes (512 000 TB)

    Yes, it's that much, because you need 1 block (512 KB) for each of the files ...
    I know it's impossible to do this in reality; it's only an example calculation.

    If you have 1 TB of data full of MP3s or pictures, each 4 MB:

    Files: 250 000
    Parity size: 1 128 000 000 000 bytes (1 TB 128 GB)

    If you fill the drive with 1-CD video rips ... I calculate with 700 MB:

    Files: 1428
    Parity size: rounded 1 000 731 000 000 bytes (1 TB 731 MB)

    If you have untouched rips of your Blu-rays, each one a 25 GB ISO:

    Files: 40
    Parity size: rounded 1 000 020 000 000 bytes (1 TB 20 MB)
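    The three realistic cases above can be checked with a short Python sketch. It uses the post's simplifications (decimal units, exactly 1 TB of data, file count truncated to whole files, one extra 512 000 B block per file); `worst_case_parity` is a hypothetical helper. The 1-byte case is left out because there each file is counted as exactly one block, i.e. 10^12 × 512 000 B.

```python
TB, GB, MB = 10**12, 10**9, 10**6
BLOCK = 512_000  # block size assumed in the post (decimal 512 KB)

def worst_case_parity(file_size: int, total: int = TB, block: int = BLOCK) -> tuple[int, int]:
    """Return (file count, parity bytes) when every file wastes one extra block."""
    files = total // file_size           # whole files that fit in `total` bytes
    return files, total + files * block  # the data itself plus one padding block per file

for label, size in [("4 MB MP3/pictures", 4 * MB),
                    ("700 MB CD rips", 700 * MB),
                    ("25 GB Blu-ray ISOs", 25 * GB)]:
    files, parity = worst_case_parity(size)
    print(f"{label}: {files} files, {parity:,} bytes of parity")
```

    The unrounded results are 1 128 000 000 000, 1 000 731 136 000 and 1 000 020 480 000 bytes, which round to the figures above.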

    If the block size disParity uses is smaller than the one I used in this example, you need less extra space; if it is bigger, you need more.
    And as mentioned at the beginning, there is more management data to store, so the required size is bigger than calculated here.
    And real life is a little more complex, because you have more than one data drive and mixed file sizes.

    But this example can help you get a feeling for how file size affects the required parity size.

    Or is my calculation completely wrong ...? What do you think, Roland?

    Posted 7 months ago
  2. dldummy
    Member

    I found the post where Roland talked about rounding up to the next 64K.
    So the block size is 64 KB, not 512 KB, and the parity sizes are smaller than previously calculated.

    Impossible worst case, files of 1 byte: 64 000 TB, ratio 1:64 000
    4 MB MP3s/pictures: 1 TB 16 GB, ratio 1:1.016
    700 MB 1-CD rips: rounded 1 TB 91 MB, ratio 1:1.000091
    25 GB untouched Blu-ray ISOs: rounded 1 TB 3 MB, ratio 1:1.000003
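    The same worst-case model with the 64 KB block (still 64 000 B, in the decimal convention of this thread) gives the ratios above. A sketch, with `parity_ratio` as a hypothetical helper; the 1-byte case (exactly one block per file, ratio 1:64 000) is again left out.

```python
TB, GB, MB = 10**12, 10**9, 10**6
BLOCK = 64_000  # 64 KB in the post's decimal convention

def parity_ratio(file_size: int, total: int = TB, block: int = BLOCK) -> float:
    """Worst-case parity size divided by data size (one extra block per file)."""
    files = total // file_size  # whole files that fit in `total` bytes
    return (total + files * block) / total

for label, size in [("4 MB MP3/pictures", 4 * MB),
                    ("700 MB CD rips", 700 * MB),
                    ("25 GB Blu-ray ISOs", 25 * GB)]:
    print(f"{label}: 1:{parity_ratio(size):.6f}")
```

    Printed to six decimals this gives 1.016000, 1.000091 and 1.000003; the last one is 1.00000256 rounded up.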

    And I also know that a 1 TB drive has less than 1 TB of free space ...
    And to be honest, the first 1-byte example isn't the only one that's impossible in reality:
    even if the hard disk said there was 1 TB free, you couldn't store 250 000 files of 4 MB each on it. The file system needs space too, and ... there are other factors like the file system's block size, ...

    Posted 7 months ago
  3. Roland
    Key Master

    You can also estimate the extra parity space needed for a data drive simply this way:

    extra parity = # files * 32K

    (where K is 1024 bytes)

    The reasoning is that the end of every file will land randomly somewhere within the final 64K block. Since this end point will be randomly distributed, on average it will be in the middle of the block, so on average every file will "waste" an additional 32K of parity.

    This is an estimate, not a precise calculation (for that you must look at the size of every file, which is what disParity does during the "pre check"), but for drives with large numbers of files it should be pretty accurate.
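    Roland's rule of thumb can be compared with the exact per-file rounding in a small Python simulation. The drive contents here are made up (100 000 files with uniformly random sizes up to 100 MB, fixed seed); the 64K block and K = 1024 come from the post, while the helper names are hypothetical. The point is only to show that, when file ends land randomly in the last block, the exact waste and `files * 32K` agree closely.

```python
import random

K = 1024
BLOCK = 64 * K  # 64K parity block, K = 1024 bytes as in Roland's post

def exact_padding(sizes: list[int], block: int = BLOCK) -> int:
    """Exact extra parity: bytes needed to round every file up to whole blocks."""
    return sum(-size % block for size in sizes)

def estimated_padding(file_count: int) -> int:
    """Rule of thumb: on average each file wastes half a block (32K)."""
    return file_count * 32 * K

# Made-up drive: 100 000 files, sizes uniformly random between 1 byte and 100 MB.
rng = random.Random(42)
sizes = [rng.randint(1, 100 * 10**6) for _ in range(100_000)]
exact = exact_padding(sizes)
estimate = estimated_padding(len(sizes))
print(exact, estimate, round(exact / estimate, 4))
```

    With this many files the printed ratio should be close to 1; with only a handful of files the estimate can be far off, which is where looking at the real sizes (as the pre-check does) matters.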

    The "meta data" that disParity also stores is in files*.dat on your parity drive. You can look at those to get a sense of how much room they take. You are correct that it's hard to predict exactly, since file paths are variable length. The file name is the only variable-length field; everything else is fixed.
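    To see how much room the metadata takes on your own setup, you can total the sizes of the files*.dat files Roland mentions. A minimal Python sketch, assuming the files sit directly in one parity folder (the folder path in the example is a placeholder, and `metadata_bytes` is a hypothetical helper):

```python
from pathlib import Path

def metadata_bytes(parity_dir: str) -> int:
    """Total size in bytes of the files*.dat metadata files in parity_dir."""
    return sum(p.stat().st_size
               for p in Path(parity_dir).glob("files*.dat")
               if p.is_file())

# Example (replace the placeholder with your own parity folder):
# print(metadata_bytes("P:/parity"))
```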

    Posted 7 months ago
  4. dldummy
    Member

    Yesss ....
    If we use statistics, it's much simpler ....

    Posted 7 months ago
  5. dldummy
    Member

    Hmmm, I re-read your post, especially the metadata part ....

    Everything other than the path is fixed!?
    Hmmm, for recovery don't you need a list of block numbers for each file?
    I mean something like what FAT does ...

    To know which parity blocks are part of a file.

    Or am I completely wrong ....

    Hmm, if you start recovering at the beginning and go file by file, you only have to know the number of blocks each file consists of ... and that is one fixed value ....

    Yesss .... parity is not a FAT, it is written sequentially, there is no fragmentation, and so there is no need to store any kind of block list ...
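    The idea in the last two lines can be sketched in Python: if files are laid out back to back in parity, each file's starting block follows from the block counts of the files before it. The 64K block size and the helper names are assumptions for illustration.

```python
BLOCK = 64 * 1024  # assumed 64K parity block

def block_count(size: int, block: int = BLOCK) -> int:
    """Number of parity blocks a file needs (a partial block counts fully)."""
    return -(-size // block)  # ceiling division

def start_blocks(sizes: list[int]) -> list[int]:
    """Starting block of each file when files are stored back to back."""
    starts, next_block = [], 0
    for size in sizes:
        starts.append(next_block)
        next_block += block_count(size)
    return starts

# 1 B, exactly 64K, 64K + 1 B, and 10 B files
print(start_blocks([1, 65536, 65537, 10]))
```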

    Hmm, no fragmentation? .... How do you handle updates?
    Especially deleted and edited files ...
    There has to be something like block lists and garbage collection ....

    If we did parity at the hard-disk block level it would be easy: XOR block 1 of HD1 with block 1 of HD2 ... it doesn't matter what data is stored there or what happens to files.
    But we do it at the file level, and files are not as static as hard-disk blocks.
    They can suddenly disappear, and then we have a big, big hole. ;-)
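    The block-level XOR described above, and how a lost block is rebuilt from parity plus the surviving drive, in a small Python sketch (two made-up 4-byte blocks; `xor_blocks` is a hypothetical helper). File-level parity XORs the blocks of files instead of raw disk blocks, but the recovery arithmetic is the same.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(result):
            raise ValueError("blocks must have equal length")
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

hd1 = b"\x01\x02\x03\x04"  # block 1 of HD1 (made-up data)
hd2 = b"\xf0\x0f\xaa\x55"  # block 1 of HD2 (made-up data)
parity = xor_blocks(hd1, hd2)

# If HD2 dies, its block is the XOR of the parity block and the surviving block.
assert xor_blocks(parity, hd1) == hd2
```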

    Or am I thinking in completely the wrong direction?

    Posted 7 months ago
  6. Roland
    Key Master

    > Hmmm for recovery dont you need a list of Blocknumbers of a file ?

    Files are stored contiguously in parity, so I only need to store a starting block #.
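    With contiguous storage, the parity blocks of a file follow from its starting block and its size. A Python sketch, assuming the 64K block (K = 1024) discussed earlier; `parity_blocks` is a hypothetical name, not disParity's code:

```python
K = 1024
BLOCK = 64 * K  # assumed 64K parity block

def parity_blocks(start_block: int, file_size: int, block: int = BLOCK) -> range:
    """Block numbers a file occupies when stored contiguously from start_block."""
    block_count = -(-file_size // block)  # ceiling division: a partial block counts fully
    return range(start_block, start_block + block_count)

# A 100 000-byte file starting at block 7 needs two 64K blocks: 7 and 8.
print(list(parity_blocks(7, 100_000)))
```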

    > Hmm no fragmentation ? .... How do you handle updates ?

    Deleted files leave a "hole". Newly added files are placed in an existing hole if one can be found. If not, they go on the end. So yes, over time with lots of deletes and adds your parity can become fragmented. This has been discussed before in previous threads on the forum.
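    A toy Python model of the hole reuse Roland describes, working in block units. Roland only says a new file goes into an existing hole "if one can be found"; the first-fit search and the lack of merging of neighbouring holes here are my assumptions, and `ParityLayout` is a hypothetical class.

```python
from dataclasses import dataclass, field

@dataclass
class ParityLayout:
    """Toy model of contiguous parity allocation with holes (first fit, assumed)."""
    end: int = 0  # first block past the last allocated file
    holes: list[tuple[int, int]] = field(default_factory=list)  # (start, length) free runs

    def add(self, blocks: int) -> int:
        """Place a file needing `blocks` blocks and return its starting block."""
        for i, (start, length) in enumerate(self.holes):
            if length >= blocks:  # first hole that is big enough
                if length == blocks:
                    del self.holes[i]
                else:
                    self.holes[i] = (start + blocks, length - blocks)
                return start
        start = self.end  # no hole fits: append at the end
        self.end += blocks
        return start

    def delete(self, start: int, blocks: int) -> None:
        """Free a file's blocks, leaving a hole (adjacent holes are not merged)."""
        self.holes.append((start, blocks))

layout = ParityLayout()
a = layout.add(3)    # blocks 0-2
b = layout.add(2)    # blocks 3-4
c = layout.add(4)    # blocks 5-8
layout.delete(a, 3)  # leaves a 3-block hole at 0
d = layout.add(2)    # reuses the hole: blocks 0-1, a 1-block hole remains
e = layout.add(2)    # too big for the remaining hole: appended at 9
```

    After these steps a 1-block hole at block 2 is left unused, a small example of how parity fragments over time with deletes and adds.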

    Posted 7 months ago
