openide/docs/jar-format.puml

@startuml
!include jb-plantuml-theme.puml

skinparam linetype ortho

top to bottom direction

header
A [[https://en.wikipedia.org/wiki/ZIP_(file_format) ZIP]] file format with optimized metadata.
endheader

component "File Entry 1" as FE1
component "File Entry N" as FE2

note right of FE1
The "relative offset of local header" field of a central directory entry points to the local file header,
not to the entry data. Because the size of the local file header varies,
locating the data takes two seeks: one to read the header and one to reach the data.

As an optimization, the data offset can be precomputed
when reading the central directory file header.
HashMapZipFile implements this optimization.
ImmutableZipFile instead uses a special index for this purpose, as explained below.
end note
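
/'
A minimal Java sketch of the step described in the note above: resolving the data offset
from a local file header. The field positions (name length at byte 26, extra field length
at byte 28, 30 fixed bytes) come from the ZIP specification. The class and method names are
hypothetical and are not taken from HashMapZipFile or ImmutableZipFile.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only.
final class LocalHeaderSketch {
  // Size of the fixed part of a local file header, per the ZIP specification.
  static final int LOCAL_HEADER_FIXED_SIZE = 30;
  // Local file header signature "PK\3\4", read as a little-endian int.
  static final int LOCAL_HEADER_SIGNATURE = 0x04034b50;

  // Returns the absolute offset of the entry data, given the header offset
  // recorded in the central directory.
  static long dataOffset(ByteBuffer zip, int headerOffset) {
    // ZIP fields are little-endian; duplicate() leaves the caller's buffer state untouched.
    ByteBuffer buf = zip.duplicate().order(ByteOrder.LITTLE_ENDIAN);
    if (buf.getInt(headerOffset) != LOCAL_HEADER_SIGNATURE) {
      throw new IllegalArgumentException("No local file header at offset " + headerOffset);
    }
    // Both lengths are unsigned 16-bit values.
    int nameLength = buf.getShort(headerOffset + 26) & 0xFFFF;
    int extraLength = buf.getShort(headerOffset + 28) & 0xFFFF;
    return (long) headerOffset + LOCAL_HEADER_FIXED_SIZE + nameLength + extraLength;
  }
}
```
'/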

FE1 -- FE2

component "File entry ~__index__" as INDEX {
  component "A list of keys along with their corresponding offsets and sizes." as INDEX_M
  note right of INDEX_M
    A list of pairs of long values.
    Each pair holds a key, the 64-bit XXH3 hash of an entry name,
    and a value, the offset and size stored as two ints packed into a single long.
    This list allows the data locations of all entries to be read in a single bulk read operation.
    It contains no file names or other unnecessary metadata.
  end note
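
/'
A minimal Java sketch of packing an offset and a size into one long, as the note above describes.
The bit layout (offset in the high 32 bits, size in the low 32 bits) and all names are
assumptions for illustration; the actual index layout may differ.

```java
// Hypothetical helper for illustration only.
final class PackedLocationSketch {
  // Packs two ints into one long: offset in the high half, size in the low half (assumed layout).
  static long pack(int offset, int size) {
    // Mask the size so that a negative int does not overwrite the offset bits.
    return ((long) offset << 32) | (size & 0xFFFFFFFFL);
  }

  // Extracts the high 32 bits.
  static int offset(long packed) {
    return (int) (packed >>> 32);
  }

  // Extracts the low 32 bits.
  static int size(long packed) {
    return (int) packed;
  }
}
```
'/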

  component "class package hashes" as INDEX_PC
  note right of INDEX_PC
    A list of long values, each the 64-bit XXH3 hash of a package name.
    This list is not used by the ZipFile implementation but is consumed by the class loader.
    It allows a quick check of whether a ZIP file may contain a given class.
    For a single ZIP file the benefit is small, as a name can be resolved with a single map lookup,
    but the list enables the clustering of multiple ZIP files.
    This clustering helps avoid a linear search across all ZIP files in a classpath.
  end note

  component "resource package hashes" as INDEX_PR
  note right of INDEX_PR
    The same concept applies to resource package hashes.
    The two sets of hashes are kept separate because there is no correlation
    between class packages and resource packages.
  end note

  component names  {
    component "name lengths" as INDEX_NL
    note right of INDEX_NL
      A list of name lengths represented as shorts.
      This list allows all lengths to be read in a single bulk read operation,
      directly from native memory.
    end note

    component "names" as INDEX_NS
    note right of INDEX_NS
      List of strings.
    end note

    INDEX_NL -down- INDEX_NS
  }

  note bottom of names
    Entry names. They are not loaded into memory when the ZipFile is opened;
    instead, they are loaded only when requested.
    Names are needed, for instance, to process entries based on their names,
    such as finding entries by a specific prefix.
  end note

  INDEX_M -- INDEX_PC
  INDEX_PC -- INDEX_PR
  INDEX_PR -- names
}

note top of INDEX
The ZIP specification is not violated:
the index data is stored as a regular file entry.
end note

FE2 -- INDEX

component "Central directory" as CD
note right of CD
  The index format version is stored in the 'File comment' field.
  Only the latest format is supported.
  If a ZIP file has no comment or its index version differs from the latest,
  a fallback implementation is used that is capable of reading any ZIP file.
end note
INDEX -- CD

@enduml