Last modified on 03/11/11 02:09:24

PSIO Introduction

In PSI3, PSIO existed as a set of functions in namespace psi. In PSI4, PSIO currently consists of an OOP version of the old PSIO functions (in the class PSIO), with the supporting PSIOManager and AIOHandler wrappers. To maintain compatibility with many PSI3 modules, the set of PSI3 PSIO function calls has been temporarily retained, though these calls are now wrappers which point to a static shared_ptr<PSIO> object known as _default_psio_lib_.

The PSIOManager Class

The PSIOManager class is a static class that tracks the open/close/namespace changes of all PSIO instances. PSIOManager provides a number of psiclean options, and also allows single files or sets of file numbers to be placed in different paths. Additionally, new modules can check the conformance of their file handling by printing the contents of the PSIOManager immediately after the new module finishes its computations. If any files remain open at this point, the module is nonconforming.
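
The conformance check can be pictured with a small stand-alone model. The sketch below is not PSI4 code: ToyManager, its methods, and the two module functions are hypothetical names invented for illustration, and the model only mimics the bookkeeping of which files are open.

```python
# Toy model of open/close tracking. ToyManager is hypothetical, not the PSI4 API.
class ToyManager:
    def __init__(self):
        self.status = {}  # filename -> "OPEN" or "CLOSED"

    def open_file(self, filename):
        self.status[filename] = "OPEN"

    def close_file(self, filename, keep):
        if keep:
            self.status[filename] = "CLOSED"
        else:
            # A file closed without keep is deleted, so stop tracking it
            self.status.pop(filename, None)

    def open_files(self):
        # Any entry returned here after a module finishes means nonconformance
        return sorted(f for f, s in self.status.items() if s == "OPEN")


def conforming_module(manager):
    manager.open_file("./psi.h2o.35")
    manager.close_file("./psi.h2o.35", keep=True)


def nonconforming_module(manager):
    manager.open_file("./psi.h2o.64")  # never closed


manager = ToyManager()
conforming_module(manager)
leftover_after_good = manager.open_files()
nonconforming_module(manager)
leftover_after_bad = manager.open_files()
print(leftover_after_good, leftover_after_bad)
```

In real use the same question is answered by reading the Status column printed by the PSIOManager: every file should read CLOSED once the module has returned.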

Example usage (not run with .psi4rc, see below for that):

molecule h2o {
    O
    H 1 0.97
    H 1 0.97 2 103.0
}

globals {
    wfn rhf
    jobtype opt
    docc [3, 0, 1, 1]
    basis cc-pVTZ
    convergence 11
    e_converge 11
    d_converge 11
}

# Get a copy of the psio manager
psioh = PsiMod.IOManager.shared_object()

# Paths should be set before the calculation starts
# Technically you can be tricky, but it can really hurt

# Set the default path
psioh.set_default_path('./')

# Set the path for a specific filenumber
psioh.set_specific_path(33, '/tmp/')

# Choose to save all File 32's for later use
# Filenumber retentions should be called before their
# files are created (like a static member)
psioh.set_specific_retention(32, True)

# Run mints (makes files 33 [TEIs] and 35 [OEIs])
mints()

# See what's there
print_out('After Mints\n')
psioh.print_out()

# Run PK-based rhf (makes files 32 [Chkpt] and 64 [DIIS])
energy('rhf')

# Choose to save this particular File 35 for later use
# Unfortunately you need the full path at the moment
# Particular file retentions should be called after their
# files are created (like an object)
psioh.mark_file_for_retention('./psi.h2o.35', True)

# See what's there
print_out('After RHF\n')
psioh.print_out()

# Call the new psiclean
psioh.psiclean()

# See what's there
print_out('After Psiclean\n')
psioh.print_out()

Result (outfile, infile, timer file, ./psi.h2o.32, and ./psi.h2o.35 are the only files left at the end). Output file (relevant bits):

# After MINTS

                    --------------------------------
                    ==> PSI4 Current File Status <==
                    --------------------------------

  Default Path: ./

  Specific File Paths:

  FileNo Path
  ----------------------------------------------------------------------
  33     /tmp/

  Specific File Retentions:

  FileNo
  -------
  32

  Current File Retention Rules:

  Filename
  --------------------------------------------------

  Current Files:

  Filename                                          Status   Fate
  ----------------------------------------------------------------------
  ./psi.h2o.35                                      CLOSED   DEREZZ
  /tmp/psi.h2o.33                                   CLOSED   DEREZZ


# After SCF
                    --------------------------------
                    ==> PSI4 Current File Status <==
                    --------------------------------

  Default Path: ./

  Specific File Paths:

  FileNo Path
  ----------------------------------------------------------------------
  33     /tmp/

  Specific File Retentions:

  FileNo
  -------
  32

  Current File Retention Rules:

  Filename
  --------------------------------------------------
  ./psi.h2o.32
  ./psi.h2o.35

  Current Files:

  Filename                                          Status   Fate
  ----------------------------------------------------------------------
  ./psi.h2o.32                                      CLOSED   SAVE
  ./psi.h2o.35                                      CLOSED   SAVE
  ./psi.h2o.64                                      CLOSED   DEREZZ
  /tmp/psi.h2o.33                                   CLOSED   DEREZZ

# After Psiclean
                    --------------------------------
                    ==> PSI4 Current File Status <==
                    --------------------------------

  Default Path: ./

  Specific File Paths:

  FileNo Path
  ----------------------------------------------------------------------
  33     /tmp/

  Specific File Retentions:

  FileNo
  -------
  32

  Current File Retention Rules:

  Filename
  --------------------------------------------------
  ./psi.h2o.32
  ./psi.h2o.35

  Current Files:

  Filename                                          Status   Fate
  ----------------------------------------------------------------------
  ./psi.h2o.32                                      CLOSED   SAVE
  ./psi.h2o.35                                      CLOSED   SAVE


The .psi4rc File

Defaults for the PSIOManager object may be set in the new .psi4rc configuration file. A .psi4rc file is a sliver of valid PSI4 Python code placed in the user's home directory (much like a .tcshrc or .bashrc). The code in a .psi4rc is inserted into the PSI4-converted Python input immediately after the standard import commands and immediately before the contents of the input file. A .psi4rc can do many things, such as initialize physical constants or build standard conversion tables, but one critical use for it is the specification of file location and retention characteristics. A commented example is provided below:

/home/parrish/.psi4rc

# Get a reference to the shared PSIOManager object
psioh = PsiMod.IOManager.shared_object()

# Set the default path for all PSI4 data files to '/scratch/parrish/'
# be sure to include the trailing '/'
psioh.set_default_path('/scratch/parrish/')

# Set the path for files of a specific type
# This keeps the chkpt on the NFS, while the heavies go to scratch
psioh.set_specific_path(32, './')

# Set a specific file number to be retained where it lies
# and not be removed by psiclean
psioh.set_specific_retention(32, True)

# Set a common constant
H2KCAL = 627.509

Psiclean

The old brute-force psiclean script has been replaced with functionality in the PSIOManager object. The preferred way to use the new psiclean is to call either clean() or psioh.psiclean() inside a PSI4 input (where psioh is a Python reference to the PSIOManager static object). Using the set_specific_retention and mark_file_for_retention methods shown in the sections above (together with their overrides), you can dynamically control which files are retained after psiclean runs. This also allows psiclean to be called mid-job to remove large scratch files that the libraries producing them did not clean up.

Of course, the new psiclean would ordinarily be unable to operate if a job died midway through, as the in-memory file tree would be destroyed. However, the PSIOManager stores the files marked for destruction in an ASCII file named psi.clean in the directory the calculation was launched from. In normal operation, you will not see much of this file, as it is deleted upon a successful psiclean. In the event that a job dies, however, calling psi4 -w in the same directory runs only a special crashclean method, which rebuilds the PSIOManager from the psi.clean file, executes psiclean, and promptly exits. This method respects the same rules as an ordinary psiclean, as only files slated for destruction are stored. Calling psi4 -w in a directory with no psi.clean file results in an error.
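
The recovery idea can be sketched in a few lines. This is an illustration only: the function names record_doomed and crashclean_sketch are hypothetical, and the one-path-per-line layout of psi.clean is an assumption, since the actual file format is not described here.

```python
import os
import tempfile

CLEAN_FILE = "psi.clean"  # name from the text; the line-per-path format is assumed


def record_doomed(workdir, doomed):
    """Persist the list of files slated for deletion so it survives a crash."""
    with open(os.path.join(workdir, CLEAN_FILE), "w") as fh:
        for path in doomed:
            fh.write(path + "\n")


def crashclean_sketch(workdir):
    """Rebuild the deletion list from disk, delete those files, then the list."""
    clean_path = os.path.join(workdir, CLEAN_FILE)
    if not os.path.exists(clean_path):
        raise FileNotFoundError("no psi.clean file in " + workdir)
    with open(clean_path) as fh:
        doomed = [line.strip() for line in fh if line.strip()]
    removed = []
    for path in doomed:
        if os.path.exists(path):
            os.remove(path)
            removed.append(path)
    os.remove(clean_path)  # the record itself goes away after a successful clean
    return removed


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        keep = os.path.join(workdir, "psi.h2o.32")
        scratch = os.path.join(workdir, "psi.h2o.64")
        for p in (keep, scratch):
            open(p, "w").close()
        # Only the scratch file is recorded; retained files never enter the list
        record_doomed(workdir, [scratch])
        removed = crashclean_sketch(workdir)
        return [os.path.basename(p) for p in removed], sorted(os.listdir(workdir))


removed_names, remaining = demo()
print(removed_names, remaining)
```

Because only files already marked for destruction are ever written to the record, a retained file such as the checkpoint cannot be deleted by the recovery pass.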

The PSIO Class

The PSIO class is a window to the disk, and is best understood by example usage:

#include <psio.hpp> // Contains the classes
#include <psio.h> // Contains the utility methods and old PSI3 methods
#include <psifiles.h> // Header in $PSI4/include that maps file numbers to names
...
// Get a PSIO object. ALL new codes should use the constructor, not the _default_psio_lib_ object 
shared_ptr<PSIO> psio(new PSIO());

// Open a new file (PSIF_FILEID is a #define of type int corresponding to the purpose of your file, and should be in psifiles.h)
psio->open(PSIF_FILEID, PSIO_OPEN_NEW);

// Close the file, but save it for later use
psio->close(PSIF_FILEID, 1); // 1 means save

// Don't touch the file's data here, it is closed. You can see if it is open by:
int open = psio->open_check(PSIF_FILEID);
assert(open == 0);

// Open an old file (the NEW/OLD flags are in psio.h)
psio->open(PSIF_FILEID, PSIO_OPEN_OLD);

// Now it is open
open = psio->open_check(PSIF_FILEID);
assert(open != 0);

// Write a whole entry of data
int size = 100;
double* data = init_array(size);
... // fill your data
// Here's the write (notice that the size of the operation is in bytes)
psio->write_entry(PSIF_FILEID, "My Data Name", (void*) &data[0], size*sizeof(double)); 

// Read and write operations are always symmetric in signature
psio->read_entry(PSIF_FILEID, "My Data Name", (void*) &data[0], size*sizeof(double));

// If you want to read or write a block of the entry, you need the more low-level read/write methods
// which involve providing a starting psio_address into the disk entry.
// This example writes the first ten elements of data into the second ten elements of the entry
// Note that you can continue to expand the last entry of a partially written file, but the
// blocks must expand continuously. For random write access, you have to pre-fill the entry 
psio_address addr = psio_get_address(PSIO_ZERO, 10*sizeof(double)); // psio_address is a special struct in psio.h
psio->write(PSIF_FILEID, "My Data Name", (void*) &data[0], 10*sizeof(double), addr, &addr);

// Always close your files before exiting the module 
psio->close(PSIF_FILEID, 0); // 0 means delete the file

// Free your data too (init_array allocates with malloc, so release with free, not delete[])
free(data);

Note: Do not open or close files in module object constructors/destructors. The pointer to a module object is often retained until the end of a computation, so a file closed only in a destructor may still be open when the next module calls open on it. This is in keeping with the PSI4 principle that memory and file pointers should all be acquired right before computation, and released right after.
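
Returning to the block write in the example above, the byte arithmetic can be checked with an in-memory stand-in for a disk entry. This sketch does not use PSIO: write_block and read_block are hypothetical helpers, the entry is a plain byte stream, and returning the offset just past the written block is only an assumed reading of what the final argument of psio->write receives.

```python
import io
import struct

DOUBLE = struct.calcsize("<d")  # 8 bytes per double in standard-size mode


def write_block(stream, values, byte_offset):
    """Write values starting at byte_offset; return the offset just past them."""
    stream.seek(byte_offset)
    stream.write(struct.pack(f"<{len(values)}d", *values))
    return stream.tell()


def read_block(stream, count, byte_offset):
    """Read count doubles starting at byte_offset."""
    stream.seek(byte_offset)
    return list(struct.unpack(f"<{count}d", stream.read(count * DOUBLE)))


entry = io.BytesIO()
data = [float(i) for i in range(100)]

# Pre-fill the whole entry so that any block inside it can be overwritten
end_of_entry = write_block(entry, data, 0)

# Place the first ten elements of data into the second ten elements of the entry
start = 10 * DOUBLE
next_addr = write_block(entry, data[:10], start)

second_ten = read_block(entry, 10, start)
third_ten = read_block(entry, 10, 20 * DOUBLE)
print(end_of_entry, start, next_addr)
```

The starting offset is 10 doubles, that is 80 bytes, and the block ends at byte 160; elements outside that range are left untouched.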

The AIOHandler Class

The AIOHandler class is a wrapper that allows for nonblocking disk operations via boost threads (built on pthreads). An example of using an AIOHandler is:

#include <psio.hpp>
...
shared_ptr<PSIO> psio; 

// Initialize psio, open your file, etc

// Build a buffer to read into
double* your_buffer = new double[your_size];

// Build an AIOHandler for this psio 
shared_ptr<AIOHandler> aio(new AIOHandler(psio));

// Post the nonblocking IO operation (read, write, read_entry, and write_entry are supported, exactly like PSIO)
// This is where the blocking psio->read_entry(...) call used to go
aio->read_entry(PSIF_YOUR_FILEID, "YOUR DATA", (char*) your_buffer, your_size*sizeof(double));

// At this point aio is reading the entry, but has returned without blocking

// So you do some other work while it's reading
do_some_serious_work(); 

// And now, you need the data in your_buffer, so you synchronize the aio object
aio->synchronize();

// And now the contents of your_buffer are guaranteed to have been read in 
double important_number = your_buffer[0] + your_buffer[your_size - 1];

delete[] your_buffer;

NOTE: The AIOHandler object may not be called more than once before synchronizing, as this would launch two threads with handles to the same non-threadsafe PSIO object. Moreover, calling the PSIO object while an unsynchronized AIO operation is running will probably do very bad things. If sufficient demand arises, we can modify the AIOHandler to support a dynamic queue so that multiple files or entries may be requested up front and then synchronized to a certain point later.
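
The post-then-synchronize pattern, including the one-operation-at-a-time restriction, can be modeled with the standard threading module. This is a toy stand-in rather than the AIOHandler itself: ToyAIO is a hypothetical class, the "disk" is a Python list, and the sleep only simulates a slow read.

```python
import threading
import time


class ToyAIO:
    """Allows one pending operation at a time, mirroring the note above."""

    def __init__(self):
        self._thread = None

    def read_entry(self, source, buffer):
        if self._thread is not None:
            raise RuntimeError("synchronize() before posting another operation")

        def work():
            time.sleep(0.05)    # stand-in for a slow disk read
            buffer[:] = source  # fill the caller's buffer in place

        self._thread = threading.Thread(target=work)
        self._thread.start()    # returns immediately; the read runs in background

    def synchronize(self):
        if self._thread is not None:
            self._thread.join()  # block until the pending read has finished
            self._thread = None


disk_entry = [1.0, 2.0, 3.0]
your_buffer = [0.0] * len(disk_entry)

aio = ToyAIO()
aio.read_entry(disk_entry, your_buffer)
other_work = sum(range(1000))  # useful work overlapping the read
aio.synchronize()

# Only now is the buffer guaranteed to hold the data
important_number = your_buffer[0] + your_buffer[-1]
print(important_number)
```

Raising an error on a second post is one way to enforce the restriction; the dynamic queue mentioned above would instead accept further requests and run them in order.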