Segment

Program to extract overlapping segments from micrographs

Input: Micrographs

Output: Image output stack

Parameters

Parameter

Example (default)

Description

Micrographs

cs_scan034.tif

Input micrographs: accepted file formats (.tif, .mrc, .mrcs, .spi, .hdf, .img, .hed).

Image output stack

protein_stack.hdf

Output stack: accepted file formats (.hdf).

Segment coordinates

scan034_boxes.txt

Input: coordinate file with the same name as the corresponding micrograph (accepted formats: EMAN’s Helixboxer/Boxer, EMAN2’s E2helixboxer and Bsoft filament parameter coordinates: .box, .txt, .star, .db). When using frame processing, specify a previously generated spring.db to provide the coordinates. Make sure that helix paths are continuous. A helix path can follow a C- or S-path but must NOT form a U-turn.

Segment size in Angstrom

700

With increasing segment size, the molecular mass (i.e. signal) per segment increases, but helix defects also become more pronounced. Final image size = segment size + step size (accepted values min=100, max=1500).

Estimated helix width in Angstrom

200

Generous width measure of helix required for rectangular mask (accepted values min=0, max=1500).

Step size of segmentation in Angstrom

70

Overlapping segments are related views according to helical symmetry, so the step size should be a multiple of the helical rise. A step size of 0 corresponds to one central box per helix (accepted values min=0, max=2000).

Pixel size in Angstrom

1.163

Pixel size is an imaging parameter (accepted values min=0.001, max=100).

Invert option

True

Inversion of image densities for cryo data, i.e. protein becomes white.
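
As a worked example of the size parameters above, the following sketch converts the default segment size and step size from Angstrom to pixels using the stated relation (final image size = segment size + step size). The function name and the rounding to the nearest whole pixel are assumptions made for illustration; Spring itself may round differently.

```python
def final_image_size_pixels(segment_size_A, step_size_A, pixel_size_A):
    """Final image size in pixels, using size = segment size + step size."""
    final_size_A = segment_size_A + step_size_A
    # Rounding to the nearest pixel is an assumption of this sketch.
    return int(round(final_size_A / pixel_size_A))


# Defaults from the table: 700 A segment, 70 A step, 1.163 A per pixel.
size = final_image_size_pixels(700, 70, 1.163)
print(size)
```

With the default values, 770 Angstrom divided by 1.163 Angstrom per pixel gives an image of roughly 662 pixels along each edge.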

Sample parameter file

You may run the program from the command line by providing the parameters via a text file:

segment --f parameterfile.txt

Where the format of the parameters is:

Micrographs                              = cs_scan034.tif
Image output stack                       = protein_stack.hdf
Segment coordinates                      = scan034_boxes.txt
Segment size in Angstrom                 = 700
Estimated helix width in Angstrom        = 200
Step size of segmentation in Angstrom    = 70
Pixel size in Angstrom                   = 1.163
Invert option                            = True
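
When many parameter files have to be prepared, the text above can also be generated from a script. The sketch below is not part of Spring: the helper name `format_parameters` and the padding width of 41 characters (chosen to reproduce the alignment of the sample) are assumptions, and whether Spring requires this exact alignment is not stated here.

```python
def format_parameters(params, key_width=41):
    """Return parameter-file text with one 'name = value' entry per line."""
    lines = []
    for name, value in params.items():
        # Left-justify the name so that the '=' signs line up as in the sample.
        lines.append(f"{name.ljust(key_width)}= {value}")
    return "\n".join(lines) + "\n"


basic_parameters = {
    "Micrographs": "cs_scan034.tif",
    "Image output stack": "protein_stack.hdf",
    "Segment coordinates": "scan034_boxes.txt",
    "Segment size in Angstrom": 700,
    "Estimated helix width in Angstrom": 200,
    "Step size of segmentation in Angstrom": 70,
    "Pixel size in Angstrom": 1.163,
    "Invert option": True,
}

text = format_parameters(basic_parameters)
print(text)
```

The resulting text can be saved as parameterfile.txt and passed to `segment --f parameterfile.txt`.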

Additional parameters (intermediate level)

Parameter

Example (default)

Description

Spring database option

True

If checked, reads a previous spring.db (SQLite-compatible database); otherwise creates a new one.

spring.db file

spring.db

Program requires a previously generated spring.db and writes an updated spring.db database in the working directory.

Perturb step option

False

Perturb the segmentation step between the windowed segments. Takes specified step size and applies a random shift along the helix between +/- stepsize // 2. This is useful to avoid artifacts in the Fourier transforms of class averages.

CTF correct option

True

Segments are CTF corrected with determined CTF parameters.

CTFFIND or CTFTILT

ctftilt

Choose whether ‘ctffind’ or ‘ctftilt’ values are used for CTF correction.

convolve or phase-flip

convolve

Choose whether to ‘convolve’ or ‘phase-flip’ images with determined CTF.

Binning option

False

Segments are reduced in size by binning.

Binning factor

6

Segments are reduced in size by binning factor (accepted values min=1, max=20).

Normalization option

True

Segments are normalized with a mean of 0 and standard deviation of 1.

Row normalization option

False

Option to normalize micrographs row by row to eliminate artifacts as they occur in Falcon II images or frames if they are not correctly linearized.

Remove helix ends option

False

Ends of helices are removed by half the segment size. This depends on how you boxed the helices.

Rotation option

True

Segments are rotated with helix axis perpendicular to image rows.

MPI option

True

OpenMPI installed (mpirun).

Number of CPUs

8

Number of processors to be used (accepted values min=1, max=1000).

Temporary directory

/tmp

Temporary directory should have fast read and write access.
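
The effect of the ‘Perturb step option’ described above can be illustrated with a short sketch. It places segment centers along a helix at a regular step and, if requested, shifts each one by a random amount between +/- stepsize // 2. The function name, the use of whole-number shifts, the clamping to the helix ends and the fixed random seed are assumptions of this example, not Spring’s implementation.

```python
import random


def segment_positions(helix_length_A, step_A, perturb=False, seed=None):
    """Segment center positions (in Angstrom) along a helix."""
    if step_A <= 0:
        # A step size of 0 corresponds to one central box per helix.
        return [helix_length_A / 2]
    rng = random.Random(seed)
    max_shift = step_A // 2  # integer half step, as in '+/- stepsize // 2'
    positions = []
    position = 0
    while position <= helix_length_A:
        shift = rng.randint(-max_shift, max_shift) if perturb else 0
        # Clamp so that perturbed positions stay on the helix (assumption).
        positions.append(min(max(position + shift, 0), helix_length_A))
        position += step_A
    return positions


print(segment_positions(700, 70))
print(segment_positions(700, 70, perturb=True, seed=1))
```

Because the shifts break the strict periodicity of the segment spacing, the step size no longer imprints a regular pattern on the stack, which is the motivation given for the option.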

Sample parameter file (intermediate level)

You may run the program from the command line by providing the parameters via a text file:

segment --f parameterfile.txt

Where the format of the parameters is:

Micrographs                              = cs_scan034.tif
Image output stack                       = protein_stack.hdf
Spring database option                   = True
spring.db file                           = spring.db
Segment coordinates                      = scan034_boxes.txt
Segment size in Angstrom                 = 700
Estimated helix width in Angstrom        = 200
Step size of segmentation in Angstrom    = 70
Perturb step option                      = False
Pixel size in Angstrom                   = 1.163
CTF correct option                       = True
CTFFIND or CTFTILT                       = ctftilt
convolve or phase-flip                   = convolve
Binning option                           = False
Binning factor                           = 6
Invert option                            = True
Normalization option                     = True
Row normalization option                 = False
Remove helix ends option                 = False
Rotation option                          = True
MPI option                               = True
Number of CPUs                           = 8
Temporary directory                      = /tmp
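
To check a parameter file before launching a run, the ‘name = value’ lines can be read back into a dictionary. This is a minimal sketch and not Spring’s own parser; the function name `parse_parameters` is hypothetical and all values are kept as strings.

```python
def parse_parameters(text):
    """Parse 'name = value' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        # Split on the first '=' only, so values may contain '=' themselves.
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


sample = """\
Micrographs                              = cs_scan034.tif
CTF correct option                       = True
CTFFIND or CTFTILT                       = ctftilt
Binning factor                           = 6
"""

params = parse_parameters(sample)
print(params["CTFFIND or CTFTILT"])
```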

Additional parameters (expert level)

Parameter

Example (default)

Description

Astigmatism correction

True

Option to correct for astigmatism in the image; otherwise the average defocus is used.

Micrographs select option

False

Choose whether to select any particular micrographs.

Include or exclude micrographs

include

Choose whether to ‘include’ or ‘exclude’ specified micrographs.

Micrographs list

1-9, 11, 13

List of comma-separated micrograph ids, e.g. ‘1-10, 12, 14’ (1st micrograph is 1).

Helices select option

False

Choose whether to select any particular helices.

Include or exclude helices

include

Choose whether to ‘include’ or ‘exclude’ specified helices.

Helices list

1-9, 11, 13

List of comma-separated helix ids, e.g. ‘1-10, 12, 14’ (1st helix is 1).

Straightness select option

False

Choose whether to select any helices based on straightness.

Include or exclude straight helices

include

Choose whether to ‘include’ or ‘exclude’ helices of specified persistence length.

Persistence length range

(80, 100)

Range of persistence length in percent, i.e. the upper 10 percent of the distribution is expressed as the 90 - 100 percent range, the lower 20 percent as the 0 - 20 percent range, etc. The 90 - 100 % range corresponds to the straightest helices. Values in the database are stored in m, e.g. ‘0-0.0001’. Persistence length is calculated as: p = -ln(2 * (end_to_end_distance / contour_length) ** 2 - 1) / contour_length, i.e. a short persistence length of 1 nm corresponds to a very flexible helix whereas 1 m corresponds to an extremely straight helix. Examples are TMV: 2.9 mm (2.9e-3 m), amyloid beta filaments: 300 µm (3e-4 m) and DNA: 100 nm (1e-7 m). Due to the alignment error of the segments, this value may not be directly comparable to persistence lengths determined by other methods, but it is still valid as a relative measure of straightness (accepted values min=0, max=100).

Defocus select option

False

Choose whether to select any segments based on defocus.

Include or exclude defocus range

include

Choose whether to ‘include’ or ‘exclude’ segments of specified defocus.

Defocus range

(10000, 40000)

Range of defocus in Angstrom, e.g. ‘10000-40000’ (accepted values min=0, max=100000).

Astigmatism select option

False

Choose whether to select any segments based on astigmatism.

Include or exclude astigmatic segments

include

Choose whether to ‘include’ or ‘exclude’ segments of specified astigmatism amplitude in Angstrom.

Astigmatism range

(0, 4000)

Range of astigmatism amplitude (difference between defocus one and two) in Angstrom, e.g. ‘0-4000’ (accepted values min=0, max=100000).

Frame processing option

False

This option prepares a stack containing frame helix segments from direct electron detectors and is intended for subsequent helix-based movie processing using ‘segmentrefine3d’. Before using this option, run ‘segmentrefine3d’ using the combined average of all frames. As input for the ‘Frame processing option’ of ‘segment’, please provide: 1. ‘Micrographs’ as an mrc-stack file, 2. ‘Segment coordinates’: use the previous spring.db as input instead of pure coordinate files, 3. ‘spring.db file’: the previous spring.db (same file as 2.) and 4. ‘Refinement.db to process’ from your last ‘segmentrefine3d’ cycle. This option generates the following output: 1. a stack of frame helix segments, 2. spring_frames.db with copies of all segment entries from the previous spring.db and 3. refinement_frames.db with copies of the previous orientation parameters. With these output files of the ‘segment’ run you can launch ‘segmentrefine3d’ with ‘Frame motion correction’.

First and last frame

(0, 6)

Choose the first and last frame to be processed from direct detector movies. Remember that the first frame is frame 0 (accepted values min=0, max=400).

Refinement.db to process

refinement.db

Input: refinement.db from previous combined average frame run of segmentrefine3d.
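
The ‘Micrographs list’ and ‘Helices list’ parameters above use a compact notation such as ‘1-9, 11, 13’. The following sketch shows one way such a string could be expanded into individual ids. The function name is hypothetical, and treating ranges as inclusive at both ends is an assumption based on the examples given.

```python
def parse_id_list(spec):
    """Expand a string such as '1-9, 11, 13' into a sorted list of ids."""
    ids = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            # Ranges are assumed to include both end points.
            ids.update(range(first, last + 1))
        else:
            ids.add(int(part))
    return sorted(ids)


print(parse_id_list("1-9, 11, 13"))
```

Since the first micrograph or helix has id 1, the ids returned here are 1-based and would need to be shifted by one before being used as Python list indices.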

Sample parameter file (expert level)

You may run the program from the command line by providing the parameters via a text file:

segment --f parameterfile.txt

Where the format of the parameters is:

Micrographs                              = cs_scan034.tif
Image output stack                       = protein_stack.hdf
Spring database option                   = True
spring.db file                           = spring.db
Segment coordinates                      = scan034_boxes.txt
Segment size in Angstrom                 = 700
Estimated helix width in Angstrom        = 200
Step size of segmentation in Angstrom    = 70
Perturb step option                      = False
Pixel size in Angstrom                   = 1.163
CTF correct option                       = True
CTFFIND or CTFTILT                       = ctftilt
convolve or phase-flip                   = convolve
Astigmatism correction                   = True
Binning option                           = False
Binning factor                           = 6
Invert option                            = True
Normalization option                     = True
Row normalization option                 = False
Micrographs select option                = False
Include or exclude micrographs           = include
Micrographs list                         = 1-9, 11, 13
Helices select option                    = False
Include or exclude helices               = include
Helices list                             = 1-9, 11, 13
Straightness select option               = False
Include or exclude straight helices      = include
Persistence length range                 = (80, 100)
Defocus select option                    = False
Include or exclude defocus range         = include
Defocus range                            = (10000, 40000)
Astigmatism select option                = False
Include or exclude astigmatic segments   = include
Astigmatism range                        = (0, 4000)
Remove helix ends option                 = False
Rotation option                          = True
Frame processing option                  = False
First and last frame                     = (0, 6)
Refinement.db to process                 = refinement.db
MPI option                               = True
Number of CPUs                           = 8
Temporary directory                      = /tmp
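
The include/exclude selections in the expert parameters (for example ‘Defocus range’ and ‘Astigmatism range’) all follow the same pattern: a range is given, and segments inside it are either kept or discarded. A minimal sketch of that logic is shown below; the function name and the treatment of the range limits as inclusive are assumptions, and Spring’s own selection code may differ.

```python
def select_by_range(values, value_range, mode="include"):
    """Return indices of values kept by an include/exclude range filter."""
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")
    low, high = value_range
    keep_inside = mode == "include"
    # Keep a value if its position relative to the range matches the mode.
    return [i for i, v in enumerate(values) if (low <= v <= high) == keep_inside]


defocus_A = [8000, 15000, 42000]  # hypothetical defocus values in Angstrom
print(select_by_range(defocus_A, (10000, 40000), "include"))
print(select_by_range(defocus_A, (10000, 40000), "exclude"))
```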

Command line options

When invoking segment, you may specify any of these options:

usage: segment [-h] [--g] [--p] [--f FILENAME] [--c] [--l LOGFILENAME] [--d DIRECTORY_NAME] [--version] [--spring_database_option]
               [--perturb_step_option] [--ctf_correct_option] [--astigmatism_correction] [--binning_option] [--invert_option] [--normalization_option]
               [--row_normalization_option] [--micrographs_select_option] [--helices_select_option] [--straightness_select_option]
               [--defocus_select_option] [--astigmatism_select_option] [--remove_helix_ends_option] [--rotation_option] [--frame_processing_option]
               [--mpi_option]
               [input_output [input_output ...]]

Program to extract overlapping segments from micrographs

positional arguments:
  input_output          Input and output files

optional arguments:
  -h, --help            show this help message and exit
  --g, --GUI            GUI option: read input parameters from GUI
  --p, --promptuser     Prompt user option: read input parameters from prompt
  --f FILENAME, --parameterfile FILENAME
                        File option: read input parameters from FILENAME
  --c, --cmd            Command line parameter option: read only boolean input parameters from command line and all other parameters will be assigned
                        from other sources
  --l LOGFILENAME, --logfile LOGFILENAME
                        Output logfile name as specified
  --d DIRECTORY_NAME, --directory DIRECTORY_NAME
                        Output directory name as specified
  --version             show program's version number and exit
  --spring_database_option, --spr
                        If checked will read previous spring.db (Sqlite-compatible database) otherwise will create new one. (default: False)
  --perturb_step_option, --per
                        Perturb the segmentation step between the windowed segments. Takes specified step size and applies a random shift along the
                        helix between +/- stepsize // 2. This is useful to avoid artifacts in the Fourier transforms of class averages. (default:
                        False)
  --ctf_correct_option, --ctf
                        Segments are CTF corrected with determined CTF parameters. (default: False)
  --astigmatism_correction, --ast
                        Option to correct for astigmatism in image otherwise average defocus is used. (default: False)
  --binning_option, --bin
                        Segments are reduced in size by binning. (default: False)
  --invert_option, --inv
                        Inversion of image densities for cryo data, i.e. protein becomes white. (default: False)
  --normalization_option, --nor
                        Segments are normalized with a mean of 0 and standard deviation of 1. (default: False)
  --row_normalization_option, --row
                        Option to normalize micrographs row by row to eliminate artifacts as they occur in Falcon II images or frames if they are not
                        correctly linearized. (default: False)
  --micrographs_select_option, --mic
                        Choose whether to select any particular micrographs. (default: False)
  --helices_select_option, --hel
                        Choose whether to select any particular helices. (default: False)
  --straightness_select_option, --str
                        Choose whether to select any helices based on straightness. (default: False)
  --defocus_select_option, --def
                        Choose whether to select any segments based on defocus. (default: False)
  --astigmatism_select_option
                        Choose whether to select any segments based on astigmatism. (default: False)
  --remove_helix_ends_option, --rem
                        Ends of helices are removed by half the segment size. This depends on how you boxed the helices. (default: False)
  --rotation_option, --rot
                        Segments are rotated with helix axis perpendicular to image rows. (default: False)
  --frame_processing_option, --fra
                        This option will prepare of stack containing frame helix segments from direct electron detectors and is intended for
                        subsequent helix-based movie processing using 'segmentrefine3d'. Prior to this option run 'segmentrefine3d' using the combined
                        average of all frames. For input of the 'Frame processing option' using 'segment' please provide: 1. 'Micrographs' as an mrc-
                        stack file 2. 'Segment coordinates' - use previous spring.db as input instead of pure coordinate files. 3. 'spring.db file'
                        previous spring.db (same file as 2.) and 4. 'Refinement.db to process' from your last 'segmentrefine3d' cycle. This option
                        will generate the following output: 1. Stack of frame helix segments, 2. spring_frames.db with copies of all segment entries
                        from the previous spring.db and 3. refinement_frames.db with copies of previous orientation parameters. With those output
                        files of the 'segment' run you can launch 'segmentrefine3d' with 'Frame motion correction' (default: False)
  --mpi_option, --mpi   OpenMPI installed (mpirun). (default: False)
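
For scripted runs, the command line can be assembled in Python before it is launched. The sketch below only uses the `--f`, `--l` and `--d` options listed in the usage message; the helper name and the example log file and directory names are hypothetical. It builds and prints the command without executing it.

```python
import shlex


def build_segment_command(parameter_file, logfile=None, directory=None):
    """Assemble the argument list for a 'segment' run."""
    command = ["segment", "--f", parameter_file]
    if logfile is not None:
        command += ["--l", logfile]
    if directory is not None:
        command += ["--d", directory]
    return command


command = build_segment_command(
    "parameterfile.txt", logfile="segment.log", directory="segment_run"
)
print(shlex.join(command))
```

To start the run, the list can be passed to `subprocess.run(command, check=True)` on a system where `segment` is installed.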

Program flow

  1. assign_reorganize: Initialize micrographs and segments to convert them into Spring’s file structure

  2. single_out: Single out individual helices from micrograph

  3. readmic: Loading new micrograph

  4. center_segments: Segments are centered with respect to helix axis

  5. window_segment: Windowing segments from micrograph