Michelixtracegrid

Program to optimize michelixtrace by varying michelixtrace parameters systematically on a grid

Input: Micrographs

Output: Diagnostic plot pattern

Parameters

Parameter

Example (default)

Description

Micrographs

cs_scan034.tif

Input micrographs: accepted file formats (.tif, .mrc, .mrcs, .spi, .hdf, .img, .hed).

Diagnostic plot pattern

michelixtracegrid_diag.pdf

For a single input micrograph: name of the diagnostic plot file. For multiple input micrographs: suffix appended to the name of each corresponding input micrograph. Output: accepted file formats (.pdf, .png, .bmp, .emf, .eps, .gif, .jpeg, .jpg, .ps, .raw, .rgba, .svg, .svgz, .tif, .tiff).

First parameter

alpha_threshold

Choose the parameter to be varied in the first dimension: ‘tile_size_power’, ‘tile_overlap’, ‘binning_factor’, ‘alpha_threshold’, ‘min_helix_length’, ‘max_helix_length’ or ‘order_fit’.

Second parameter

min_helix_length

Choose the parameter to be varied in the second dimension: ‘none’, ‘tile_size_power’, ‘tile_overlap’, ‘binning_factor’, ‘alpha_threshold’, ‘min_helix_length’, ‘max_helix_length’ or ‘order_fit’.

Lower and upper limit first parameter

(1.4, 1.9)

Lower and upper limit of the first parameter for the grid search. Units depend on the chosen parameter (accepted values min=-1e+07, max=1e+07).

Lower and upper limit second parameter

(22.0, 24.0)

Lower and upper limit of the second parameter for the grid search. Units depend on the chosen parameter (accepted values min=-1e+07, max=1e+07).

First and second parameter increment

(0.1, 0.3)

Increments of the first and second parameter for the grid search. Units depend on the chosen parameters (accepted values min=-1e+08, max=1e+08).
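The grid spanned by the limits and increments above can be sketched as follows. This is a minimal illustration, not SPRING's code: the helper name `grid_values` is hypothetical, and whether the upper limit is included when it falls exactly on the grid is an assumption.

```python
from itertools import product

def grid_values(lower, upper, step):
    """Return grid values from lower to upper in increments of step (hypothetical helper).

    Assumption: the upper limit is included when it lies on the grid; the small
    tolerance absorbs floating-point drift such as (1.9 - 1.4) / 0.1 = 4.999...
    """
    if step <= 0:
        raise ValueError("step must be positive")
    n_steps = int((upper - lower) / step + 1e-9)
    # Rounding removes representation noise such as 1.7000000000000002
    return [round(lower + i * step, 10) for i in range(n_steps + 1)]

first = grid_values(1.4, 1.9, 0.1)     # e.g. alpha_threshold
second = grid_values(22.0, 24.0, 0.3)  # e.g. min_helix_length
grid = list(product(first, second))    # every (first, second) combination
print(len(first), len(second), len(grid))  # 6 7 42
```

With the example values, the second dimension stops at 23.8 because 24.1 would exceed the upper limit, so the grid contains 6 × 7 = 42 tracing runs.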

Helix reference

helix_reference.hdf

Helix reference: long, straight, rectangular box containing the helix to be traced. Accepted file formats (.spi, .hdf, .img, .hed).

Estimated helix width in Angstrom

200

Generous estimate of the helix width, required for the rectangular mask (accepted values min=0, max=1500).

Pixel size in Angstrom

1.163

Pixel size of the input micrographs in Angstrom (accepted values min=0.001, max=100).

Sample parameter file

You may run the program from the command line by providing the parameters via a text file:

michelixtracegrid --f parameterfile.txt

The parameter file uses the following format:

Micrographs                              = cs_scan034.tif
Diagnostic plot pattern                  = michelixtracegrid_diag.pdf
First parameter                          = alpha_threshold
Second parameter                         = min_helix_length
Lower and upper limit first parameter    = (1.4, 1.9)
Lower and upper limit second parameter   = (22.0, 24.0)
First and second parameter increment     = (0.1, 0.3)
Helix reference                          = helix_reference.hdf
Estimated helix width in Angstrom        = 200
Pixel size in Angstrom                   = 1.163
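For scripting, a parameter file in the `Name = value` format shown above could be read back as sketched below. The helpers `parse_parameter_file` and `as_pair` are hypothetical and not part of SPRING; they only assume that each line holds one name, an equals sign and one value.

```python
def parse_parameter_file(text):
    """Parse 'Name = value' lines into a dict of strings (hypothetical helper)."""
    params = {}
    for line in text.splitlines():
        if not line.strip() or "=" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split("=", 1)  # split only at the first '='
        params[name.strip()] = value.strip()
    return params

def as_pair(value):
    """Convert a tuple-like string such as '(1.4, 1.9)' into a pair of floats."""
    low, high = value.strip().strip("()").split(",")
    return float(low), float(high)

sample = """Micrographs                              = cs_scan034.tif
First parameter                          = alpha_threshold
Lower and upper limit first parameter    = (1.4, 1.9)
Pixel size in Angstrom                   = 1.163"""

params = parse_parameter_file(sample)
print(params["First parameter"])
print(as_pair(params["Lower and upper limit first parameter"]))
```

All values are kept as strings so that the caller decides how to interpret each one (file name, number, tuple or boolean).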

Additional parameters (intermediate level)

Parameter

Example (default)

Description

Binning option

True

Micrograph is reduced in size by binning.

Binning factor

4

Micrograph is reduced in size by binning factor (accepted values min=1, max=20).
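As a worked example of the binning arithmetic, assuming that binning by a factor n multiplies the pixel size by n, the example values (pixel size 1.163 Angstrom, binning factor 4, helix width 200 Angstrom, tile size 500 Angstrom) translate into binned pixel dimensions as follows. The helper name is hypothetical.

```python
def binned_dimensions(pixel_size, binning_factor, helix_width, tile_size):
    """Binned pixel size (Angstrom) plus helix width and tile size in binned pixels."""
    binned_pixel_size = pixel_size * binning_factor
    return (binned_pixel_size,
            helix_width / binned_pixel_size,
            tile_size / binned_pixel_size)

pixel, width_px, tile_px = binned_dimensions(1.163, 4, 200, 500)
print(f"binned pixel size {pixel:.3f} Angstrom, "
      f"helix width {width_px:.1f} px, tile {tile_px:.1f} px")
```

With a binning factor of 4 the helix is only about 43 binned pixels wide, which is why very large binning factors may leave too few pixels across the helix.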

MPI option

True

Run in parallel with OpenMPI (requires OpenMPI, i.e. mpirun, to be installed).

Number of CPUs

2

Number of processors to be used. The useful maximum equals the number of input micrographs, i.e. there is no gain in performance for a single input micrograph (accepted values min=1, max=300).
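Because the work is split per micrograph, the number of busy workers is bounded by the number of input micrographs. The round-robin assignment below is a hypothetical illustration of that bound, not SPRING's actual MPI scheduling.

```python
def distribute(micrographs, n_cpus):
    """Assign micrographs round-robin to at most n_cpus workers (hypothetical helper)."""
    # Extra CPUs beyond the number of micrographs would receive no work
    n_workers = max(1, min(n_cpus, len(micrographs)))
    return [micrographs[i::n_workers] for i in range(n_workers)]

print(distribute(["a.tif", "b.tif", "c.tif"], 2))  # [['a.tif', 'c.tif'], ['b.tif']]
print(distribute(["cs_scan034.tif"], 2))           # [['cs_scan034.tif']]
```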

Temporary directory

/tmp

Temporary directory should have fast read and write access.

Sample parameter file (intermediate level)

You may run the program from the command line by providing the parameters via a text file:

michelixtracegrid --f parameterfile.txt

The parameter file uses the following format:

Micrographs                              = cs_scan034.tif
Diagnostic plot pattern                  = michelixtracegrid_diag.pdf
First parameter                          = alpha_threshold
Second parameter                         = min_helix_length
Lower and upper limit first parameter    = (1.4, 1.9)
Lower and upper limit second parameter   = (22.0, 24.0)
First and second parameter increment     = (0.1, 0.3)
Helix reference                          = helix_reference.hdf
Estimated helix width in Angstrom        = 200
Pixel size in Angstrom                   = 1.163
Binning option                           = True
Binning factor                           = 4
MPI option                               = True
Number of CPUs                           = 2
Temporary directory                      = /tmp

Additional parameters (expert level)

Parameter

Example (default)

Description

Subgrid option

False

Run subgrids to parallelize expensive grid searches.

Part and number of subgrids

(1, 3)

For example, (1, 3) runs the first of a total of three subgrids. This feature is intended for parallelizing expensive grid searches (accepted values min=1, max=100).
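One plausible way to split a grid into parts is sketched below, assuming contiguous chunks of the enumerated grid points. SPRING's actual partitioning may differ, and the helper name is hypothetical.

```python
def subgrid(grid_points, part, n_parts):
    """Return the points of subgrid `part` (1-based) out of `n_parts` contiguous chunks."""
    if not 1 <= part <= n_parts:
        raise ValueError("part must lie between 1 and n_parts")
    chunk = -(-len(grid_points) // n_parts)  # ceiling division
    return grid_points[(part - 1) * chunk: part * chunk]

points = list(range(10))
print([subgrid(points, k, 3) for k in (1, 2, 3)])
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Running parts 1, 2 and 3 as separate jobs covers every grid point exactly once.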

Grid continue option

False

Continue grid refinement in case of interrupted grid searches.

Grid database

grid.db

Grid database file used to continue grid refinement in case of interrupted grid searches.
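Resuming an interrupted grid search amounts to skipping grid points whose results are already stored. The sketch below uses Python's built-in `sqlite3` module with a hypothetical table layout and hypothetical helper names; the real structure of SPRING's `grid.db` is not described here.

```python
import sqlite3

def open_grid_db(path):
    """Open or create a results table keyed by the two grid parameters (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS grid ("
        "param1 REAL, param2 REAL, score REAL, PRIMARY KEY (param1, param2))"
    )
    return conn

def run_grid(conn, grid_points, evaluate):
    """Evaluate only the grid points that have no stored result yet."""
    done = set(conn.execute("SELECT param1, param2 FROM grid"))
    for param1, param2 in grid_points:
        if (param1, param2) in done:
            continue  # computed before the interruption
        score = evaluate(param1, param2)
        conn.execute("INSERT INTO grid VALUES (?, ?, ?)", (param1, param2, score))
        conn.commit()  # commit each point so progress survives a crash

# Usage: the second call only evaluates the one new grid point
calls = []

def dummy_evaluate(a, b):
    calls.append((a, b))
    return a * b  # placeholder for an actual tracing run

conn = open_grid_db(":memory:")  # a file such as "grid.db" would persist between runs
run_grid(conn, [(1.4, 22.0), (1.5, 22.0)], dummy_evaluate)
run_grid(conn, [(1.4, 22.0), (1.5, 22.0), (1.6, 22.0)], dummy_evaluate)
print(len(calls))  # 3
```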

Invert option

False

Inversion of the contrast of the reference, e.g. when using an inverted class average. The reference must have the same contrast as the micrograph, e.g. protein must be black in both the micrograph and the reference.

Tile size power spectrum in Angstrom

500

Tile size in Angstrom used for the power spectrum analysis (accepted values min=1, max=10000).

Tile overlap in percent

80

Overlap between adjacent tiles in percent; it influences the degree of averaging (accepted values min=0, max=90).
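Assuming the overlap is measured as a fraction of the tile size, the spacing between tile origins is tile_size × (1 − overlap/100). With the example values (500 Angstrom tiles, 80 % overlap) a new tile starts every 100 Angstrom, so each region is covered by several tiles and averaged more. A one-dimensional sketch with a hypothetical helper:

```python
def tile_starts(length, tile_size, overlap_percent):
    """Start positions (Angstrom) of overlapping tiles along one axis (hypothetical helper)."""
    if not 0 <= overlap_percent < 100:
        raise ValueError("overlap must lie in [0, 100)")
    if tile_size > length:
        return []
    step = tile_size * (1 - overlap_percent / 100.0)
    # Number of tiles that fit completely; tolerance guards against float drift
    n_tiles = int((length - tile_size) / step + 1e-9) + 1
    return [round(i * step, 6) for i in range(n_tiles)]

print(tile_starts(1000, 500, 80))  # [0.0, 100.0, 200.0, 300.0, 400.0, 500.0]
print(tile_starts(1000, 500, 0))   # [0.0, 500.0]
```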

Alpha threshold cc-map

0.001

Parameter for adaptive thresholding of the CC-map: the significance of cross-correlation values in the micrograph is judged by how extreme the values are compared to an exponential null hypothesis. The corresponding p-values are considered significant if they are below the significance level alpha. Lower this value by orders of magnitude if helix tracing is too promiscuous (accepted values min=0, max=1).
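Under an exponential null hypothesis with scale s, the p-value of a CC value x is exp(−x/s), so p < alpha is equivalent to x > s·ln(1/alpha). Each tenfold decrease of alpha therefore raises the threshold by s·ln(10). The sketch below assumes s is estimated as the mean of the positive CC values (the maximum-likelihood estimate for an exponential distribution); how SPRING estimates the null distribution is not stated here, and the helper name is hypothetical.

```python
import math

def adaptive_cc_threshold(cc_values, alpha):
    """CC value above which pixels are significant under an exponential null (sketch).

    Assumption: the exponential scale is estimated as the mean of the positive
    CC values, which is the maximum-likelihood estimate for that distribution.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie between 0 and 1")
    positive = [v for v in cc_values if v > 0]
    if not positive:
        raise ValueError("need at least one positive CC value")
    scale = sum(positive) / len(positive)
    # p(x) = exp(-x / scale) < alpha  <=>  x > scale * ln(1 / alpha)
    return scale * math.log(1.0 / alpha)

cc = [0.05, 0.10, 0.15, -0.02]
t3 = adaptive_cc_threshold(cc, 1e-3)
t4 = adaptive_cc_threshold(cc, 1e-4)
print(round(t3, 3), round(t4 - t3, 3))  # threshold, and its increase per decade of alpha
```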

Absolute threshold option cc-map

False

If True, adaptive thresholding using the Alpha threshold is not used. Instead, an absolute CC value can be defined using the Absolute threshold parameter.

Absolute threshold cc-map

0.2

Absolute CC threshold above which a pixel in the CC-map is regarded as helix. Can only be used if the Absolute threshold option is on (accepted values min=0, max=10).

Order fit

2

Order of the polynomial fitted to the coordinates of a detected helix (1=linear, 2=quadratic, 3=cubic …). Can be used as a further restraint (accepted values min=1, max=19).
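A polynomial fit of traced coordinates can be sketched with `numpy.polyfit`. This assumes the helix runs roughly along x (for example after rotating the coordinates into the helix frame) so that y can be written as a function of x; how SPRING parametrizes the fit is not specified here, and `fit_helix_path` is a hypothetical helper.

```python
import numpy as np

def fit_helix_path(x, y, order=2):
    """Least-squares polynomial fit y(x) of a traced helix path (sketch)."""
    coefficients = np.polyfit(x, y, order)  # highest power first
    return np.polyval(coefficients, x)      # smoothed y at the input x

# Gently curved (quadratic) path, with and without small alternating deviations
x = np.arange(0.0, 100.0, 10.0)
clean = 0.01 * (x - 50.0) ** 2
noisy = clean + np.where(np.arange(x.size) % 2 == 0, 0.5, -0.5)

quadratic = fit_helix_path(x, noisy, order=2)
linear = fit_helix_path(x, noisy, order=1)
print(np.abs(quadratic - clean).max(), np.abs(linear - clean).max())
```

A low order acts as a restraint: a linear fit cannot follow a curved helix, while a very high order would also follow tracing noise.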

Minimum and maximum helix length

(500, 1500)

Sets the minimum and maximum allowed helix length in Angstrom. Too short values can lead to contaminations being recognized as helices. Too large values can be too stringent, especially for overlapping or highly bent helices. Helices longer than the maximum will be split in half. The maximum helix length is recommended to be at least twice the minimum helix length (accepted values min=100, max=7000).
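The length limits can be illustrated on a list of helix lengths. The sketch drops helices below the minimum and halves those above the maximum; repeating the halving for very long helices is an assumption, and the helper name is hypothetical. The second call shows why the maximum should be at least twice the minimum: otherwise both halves of a helix just above the maximum fall below the minimum and are lost.

```python
def apply_length_limits(lengths, min_length, max_length):
    """Drop helices shorter than min_length and halve those longer than max_length.

    Assumption: halving is repeated until every piece is within max_length.
    """
    kept = []
    pending = list(lengths)
    while pending:
        length = pending.pop()
        if length > max_length:
            pending.extend([length / 2.0] * 2)  # split in half, re-check both halves
        elif length >= min_length:
            kept.append(length)
    return sorted(kept)

print(apply_length_limits([300, 800, 1600, 3400], 500, 1500))
print(apply_length_limits([900], 500, 800))  # max < 2 * min: both halves are too short
```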

Pruning cutoff bending

2.0

Outlier helices that are too bent or kinked are removed in this pruning step. The distribution of persistence lengths is analyzed once more than 100 helices have been detected. The pruning cutoff determines how many standard deviations (estimated by MAD) the persistence length is allowed to be below the median of the distribution. The diagnostic output file “PersistenceLength.pdf” is generated. Values between 1 and 3 are recommended (accepted values min=0, max=10).
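The pruning rule can be sketched as a one-sided robust outlier test. The scaling of the MAD by 1.4826 (which makes it estimate the standard deviation of normally distributed data) is an assumption about how "standard deviations (estimated by MAD)" is computed, and the helper name is hypothetical.

```python
import statistics

def prune_by_persistence(persistence_lengths, cutoff=2.0, min_population=100):
    """Keep helices whose persistence length is not too far below the median (sketch)."""
    if len(persistence_lengths) <= min_population:
        return list(persistence_lengths)  # too few helices to analyse the distribution
    median = statistics.median(persistence_lengths)
    mad = statistics.median(abs(p - median) for p in persistence_lengths)
    sigma = 1.4826 * mad  # assumption: MAD scaled to a normal standard deviation
    lower_limit = median - cutoff * sigma
    # One-sided: only helices that are too bent (short persistence length) are removed
    return [p for p in persistence_lengths if p >= lower_limit]

lengths = [100.0] * 50 + [110.0] * 50 + [1.0]  # one strongly kinked helix
pruned = prune_by_persistence(lengths, cutoff=2.0)
print(len(lengths), len(pruned))
```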

Box file coordinate step

70.0

If the resulting box files are to be used in other software, the step size in Angstrom between coordinates can be set here. Leave unchanged for subsequent use within SPRING, since this can be adjusted separately in the SPRING program segment (accepted values min=1, max=500).
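Writing coordinates at a fixed step amounts to resampling the traced path at equal arc-length intervals. The sketch below is a generic arc-length resampling of a polyline given in pixels, with the step converted from Angstrom via the pixel size; SPRING's own box-file writer may interpolate differently, and the helper name is hypothetical.

```python
import math

def resample_path(points, step_angstrom, pixel_size):
    """Return points spaced step_angstrom apart along a polyline given in pixels."""
    step = step_angstrom / pixel_size  # step in pixels
    resampled = [points[0]]
    carry = 0.0  # distance travelled since the last emitted point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        d = step - carry  # distance along this segment to the next point
        while d <= seg:
            t = d / seg
            resampled.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += step
        carry = seg - (d - step)  # leftover distance carried into the next segment
    return resampled

# L-shaped path of 200 pixels total length, 30 Angstrom step at 1 Angstrom/pixel
path = resample_path([(0, 0), (100, 0), (100, 100)], 30.0, 1.0)
print(len(path), path[4])
```

With the example pixel size of 1.163 Angstrom, the default step of 70 Angstrom corresponds to about 60 pixels between box coordinates.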

Compute performance score

False

Option to compute measures of tracing performance (recall, precision, F1-measure and F0.5-measure) by comparing the traced helices with the provided ground-truth helices.
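Given counts of true positives (TP), false positives (FP) and false negatives (FN), the measures follow the standard definitions precision = TP/(TP+FP), recall = TP/(TP+FN) and F_beta = (1 + beta²)·P·R / (beta²·P + R). How SPRING decides whether a traced helix matches a ground-truth helix is not described here, so the counts are taken as given; the helper name is hypothetical.

```python
def tracing_scores(tp, fp, fn):
    """Precision, recall, F1 and F0.5 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    def f_beta(beta):
        # beta < 1 weights precision more heavily than recall
        b2 = beta ** 2
        denominator = b2 * precision + recall
        return (1 + b2) * precision * recall / denominator if denominator else 0.0

    return {"precision": precision, "recall": recall,
            "F1": f_beta(1.0), "F05": f_beta(0.5)}

scores = tracing_scores(tp=80, fp=20, fn=40)
print({name: round(value, 3) for name, value in scores.items()})
```

The F0.5-measure favours precision, i.e. it penalizes falsely traced contaminations more than missed helices.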

Parameter search option

False

If True, tracing is run with multiple parameter pairs of Alpha threshold and Minimum helix length cutoff to determine optimum parameter set. The grid search will output a ParameterSpace.pdf file.

Manually traced helix file

mic.box

Interactively traced helix file, considered to be the ground truth for the parameter search. Input: file with the same name as the corresponding micrograph (accepted file formats: EMAN’s Helixboxer/Boxer, EMAN2’s E2helixboxer and Bsoft filament parameter coordinates: .box, .txt). Make sure that helix paths are continuous. A helix path can follow a C- or S-path but must NOT form a U-turn.
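Before using a manually traced file, a path could be checked for U-turns. The heuristic below is an assumption, not SPRING's check: it interprets "no U-turn" as "all segment directions fit into a sector smaller than 180 degrees", i.e. the path never doubles back along any direction. C- and S-shaped paths bending by less than a half circle pass; a path that turns back on itself fails. The helper names are hypothetical.

```python
import math

def direction_span_degrees(points):
    """Smallest angular sector (degrees) containing all segment directions of a path."""
    angles = sorted(math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360.0
                    for (x0, y0), (x1, y1) in zip(points, points[1:])
                    if (x0, y0) != (x1, y1))  # ignore repeated points
    if len(angles) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(360.0 - angles[-1] + angles[0])  # wrap-around gap
    return 360.0 - max(gaps)  # the largest empty gap lies outside the sector

def has_u_turn(points):
    """Heuristic: a path turns back if its directions span 180 degrees or more."""
    return direction_span_degrees(points) >= 180.0

s_path = [(0, 0), (10, 2), (20, 0), (30, -2), (40, 0)]
c_path = [(0, 0), (10, 5), (20, 7), (30, 5), (40, 0)]
u_turn = [(0, 0), (10, 0), (10, 5), (0, 5)]
print(has_u_turn(s_path), has_u_turn(c_path), has_u_turn(u_turn))  # False False True
```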

Sample parameter file (expert level)

You may run the program from the command line by providing the parameters via a text file:

michelixtracegrid --f parameterfile.txt

The parameter file uses the following format:

Micrographs                              = cs_scan034.tif
Diagnostic plot pattern                  = michelixtracegrid_diag.pdf
First parameter                          = alpha_threshold
Second parameter                         = min_helix_length
Lower and upper limit first parameter    = (1.4, 1.9)
Lower and upper limit second parameter   = (22.0, 24.0)
First and second parameter increment     = (0.1, 0.3)
Subgrid option                           = False
Part and number of subgrids              = (1, 3)
Grid continue option                     = False
Grid database                            = grid.db
Helix reference                          = helix_reference.hdf
Invert option                            = False
Estimated helix width in Angstrom        = 200
Pixel size in Angstrom                   = 1.163
Binning option                           = True
Binning factor                           = 4
Tile size power spectrum in Angstrom     = 500
Tile overlap in percent                  = 80
Alpha threshold cc-map                   = 0.001
Absolute threshold option cc-map         = False
Absolute threshold cc-map                = 0.2
Order fit                                = 2
Minimum and maximum helix length         = (500, 1500)
Pruning cutoff bending                   = 2.0
Box file coordinate step                 = 70.0
Compute performance score                = False
Parameter search option                  = False
Manually traced helix file               = mic.box
MPI option                               = True
Number of CPUs                           = 2
Temporary directory                      = /tmp

Command line options

When invoking michelixtracegrid, you may specify any of these options:

usage: michelixtracegrid [-h] [--g] [--p] [--f FILENAME] [--c] [--l LOGFILENAME] [--d DIRECTORY_NAME] [--version] [--subgrid_option]
                         [--grid_continue_option] [--invert_option] [--binning_option] [--absolute_threshold_option_cc-map]
                         [--compute_performance_score] [--parameter_search_option] [--mpi_option]
                         [input_output [input_output ...]]

Program to optimize michelixtrace by varying michelixtrace parameters systematically on a grid

positional arguments:
  input_output          Input and output files

optional arguments:
  -h, --help            show this help message and exit
  --g, --GUI            GUI option: read input parameters from GUI
  --p, --promptuser     Prompt user option: read input parameters from prompt
  --f FILENAME, --parameterfile FILENAME
                        File option: read input parameters from FILENAME
  --c, --cmd            Command line parameter option: read only boolean input parameters from command line and all other parameters will be assigned
                        from other sources
  --l LOGFILENAME, --logfile LOGFILENAME
                        Output logfile name as specified
  --d DIRECTORY_NAME, --directory DIRECTORY_NAME
                        Output directory name as specified
  --version             show program's version number and exit
  --subgrid_option, --sub
                        Run subgrids to parallelize expensive grid searches. (default: False)
  --grid_continue_option, --gri
                        Continue grid refinement in case of interrupted grid searches. (default: False)
  --invert_option, --inv
                        Inversion of contrast of reference, e.g. when using inverted class-averageReference must have same contrast than the
                        micrograph, e.g. protein requires to be black in micrograph as well as reference. (default: False)
  --binning_option, --bin
                        Micrograph is reduced in size by binning. (default: False)
  --absolute_threshold_option_cc-map, --abs
                        If True, then adaptive thresholding using Alpha threhold will not be used. Instead, absolute CC-value can be defined using
                        Absolute threshold parameter. (default: False)
  --compute_performance_score, --com
                        Option to compute measures of tracing performance based on recall, precision F1-measure, F05-measure by comparison of traced
                        with provided ground truth helices. (default: False)
  --parameter_search_option, --par
                        If True, tracing is run with multiple parameter pairs of Alpha threshold and Minimum helix length cutoff to determine optimum
                        parameter set. The grid search will output a ParameterSpace.pdf file. (default: False)
  --mpi_option, --mpi   OpenMPI installed (mpirun). (default: False)

Program flow

  1. orient_reference_power_with_overlapping_powers: Find orientations by matching the power spectrum of the reference with the power spectra of overlapping tiles.

  2. find_translations_by_cc: Find translations by cross-correlation.

  3. perform_connected_component_analysis: Extract individual helices by connected component analysis.

  4. build_cc_image_of_helices: Compute a fine map of helix localisation.

  5. visualize_traces_in_diagnostic_plot: Generate diagnostic plot
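Step 3 of the program flow can be illustrated on a thresholded CC-map: every connected region of above-threshold pixels becomes one candidate helix. The pure-Python breadth-first labelling below, with 4-connectivity, is only a sketch of the technique; the connectivity SPRING uses is an assumption here, and library routines such as `scipy.ndimage.label` perform the same task on arrays.

```python
from collections import deque

def label_components(mask):
    """4-connected component labelling of a binary mask given as a list of rows."""
    rows = len(mask)
    cols = len(mask[0]) if mask else 0
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                current += 1  # new helix candidate found
                labels[r][c] = current
                queue = deque([(r, c)])
                while queue:  # flood-fill all pixels connected to (r, c)
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels, current

# Two separate above-threshold streaks in a thresholded CC-map
mask = [
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1],
]
labels, n_helices = label_components(mask)
print(n_helices)  # 2
```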