gnuastro (0.22)
This is gnuastro.info, produced by makeinfo version 7.1 from
gnuastro.texi.
This book documents version 0.22 of the GNU Astronomy Utilities
(Gnuastro). Gnuastro provides various programs and libraries for
astronomical data manipulation and analysis.
Copyright © 2015-2024 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License,
Version 1.3 or any later version published by the Free Software
Foundation; with no Invariant Sections, no Front-Cover Texts, and
no Back-Cover Texts. A copy of the license is included in the
section entitled "GNU Free Documentation License".
INFO-DIR-SECTION Astronomy
START-INFO-DIR-ENTRY
* Gnuastro: (gnuastro). GNU Astronomy Utilities.
* libgnuastro: (gnuastro)Gnuastro library. Full Gnuastro library doc.
* help-gnuastro: (gnuastro)help-gnuastro mailing list. Getting help.
* bug-gnuastro: (gnuastro)Report a bug. How to report bugs
* Arithmetic: (gnuastro)Arithmetic. Arithmetic operations on pixels.
* astarithmetic: (gnuastro)Invoking astarithmetic. Options to Arithmetic.
* BuildProgram: (gnuastro)BuildProgram. Compile and run programs using Gnuastro's library.
* astbuildprog: (gnuastro)Invoking astbuildprog. Options to BuildProgram.
* ConvertType: (gnuastro)ConvertType. Convert different file types.
* astconvertt: (gnuastro)Invoking astconvertt. Options to ConvertType.
* Convolve: (gnuastro)Convolve. Convolve an input file with kernel.
* astconvolve: (gnuastro)Invoking astconvolve. Options to Convolve.
* CosmicCalculator: (gnuastro)CosmicCalculator. For cosmological params.
* astcosmiccal: (gnuastro)Invoking astcosmiccal. Options to CosmicCalculator.
* Crop: (gnuastro)Crop. Crop region(s) from image(s).
* astcrop: (gnuastro)Invoking astcrop. Options to Crop.
* Fits: (gnuastro)Fits. View and manipulate FITS extensions and keywords.
* astfits: (gnuastro)Invoking astfits. Options to Fits.
* MakeCatalog: (gnuastro)MakeCatalog. Make a catalog from labeled image.
* astmkcatalog: (gnuastro)Invoking astmkcatalog. Options to MakeCatalog.
* MakeProfiles: (gnuastro)MakeProfiles. Make mock profiles.
* astmkprof: (gnuastro)Invoking astmkprof. Options to MakeProfiles.
* Match: (gnuastro)Match. Match two separate catalogs.
* astmatch: (gnuastro)Invoking astmatch. Options to Match.
* NoiseChisel: (gnuastro)NoiseChisel. Detect signal in noise.
* astnoisechisel: (gnuastro)Invoking astnoisechisel. Options to NoiseChisel.
* Segment: (gnuastro)Segment. Segment detections based on signal structure.
* astsegment: (gnuastro)Invoking astsegment. Options to Segment.
* Query: (gnuastro)Query. Access remote databases for downloading data.
* astquery: (gnuastro)Invoking astquery. Options to Query.
* Statistics: (gnuastro)Statistics. Get image Statistics.
* aststatistics: (gnuastro)Invoking aststatistics. Options to Statistics.
* Table: (gnuastro)Table. Read and write FITS binary or ASCII tables.
* asttable: (gnuastro)Invoking asttable. Options to Table.
* Warp: (gnuastro)Warp. Warp a dataset to a new grid.
* astwarp: (gnuastro)Invoking astwarp. Options to Warp.
* astscript: (gnuastro)Installed scripts. Gnuastro's installed scripts.
* astscript-ds9-region: (gnuastro)Invoking astscript-ds9-region. Options to this script
* astscript-fits-view: (gnuastro)Invoking astscript-fits-view. Options to this script
* astscript-pointing-simulate: (gnuastro)Invoking astscript-pointing-simulate. Options to this script
* astscript-psf-scale-factor: (gnuastro)Invoking astscript-psf-scale-factor. Options to this script
* astscript-psf-select-stars: (gnuastro)Invoking astscript-psf-select-stars. Options to this script
* astscript-psf-stamp: (gnuastro)Invoking astscript-psf-stamp. Options to this script
* astscript-psf-subtract: (gnuastro)Invoking astscript-psf-subtract. Options to this script
* astscript-psf-unite: (gnuastro)Invoking astscript-psf-unite. Options to this script
* astscript-radial-profile: (gnuastro)Invoking astscript-radial-profile. Options to this script
* astscript-sort-by-night: (gnuastro)Invoking astscript-sort-by-night. Options to this script
* astscript-zeropoint: (gnuastro)Invoking astscript-zeropoint. Options to this script
END-INFO-DIR-ENTRY
File: gnuastro.info, Node: --usage, Next: --help, Prev: Getting help, Up: Getting help
4.3.1 ‘--usage’
---------------
If you give this option, the program will not run. It will only print a
very concise message showing the options and arguments. Everything
within square brackets (‘[]’) is optional. For example, the first and
last two lines of Crop's ‘--usage’ output are shown here:
$ astcrop --usage
Usage: astcrop [-Do?IPqSVW] [-d INT] [-h INT] [-r INT] [-w INT]
[-x INT] [-y INT] [-c INT] [-p STR] [-N INT] [--deccol=INT]
....
[--setusrconf] [--usage] [--version] [--wcsmode]
[ASCIIcatalog] FITSimage(s).fits
There are no explanations on the options, just their short and long
names shown separately. After the program name, the short format of all
the options that do not require a value (on/off options) is displayed.
Those that do require a value then follow in separate brackets, each
displaying the format of the input they want, see *note Options::.
Since all options are optional, they are shown in square brackets, but
arguments can also be optional. In this example, a catalog name is
optional and is only required in some modes. This is a standard method
of displaying optional arguments for all GNU software.
File: gnuastro.info, Node: --help, Next: Man pages, Prev: --usage, Up: Getting help
4.3.2 ‘--help’
--------------
If the command-line includes this option, the program will not be run.
It will print a complete list of all available options along with a
short explanation. The options are also grouped by their context.
Within each context, the options are sorted alphabetically. Since the
options are shown in detail afterwards, the first line of the ‘--help’
output shows the arguments and if they are optional or not, similar to
*note --usage::.
In the ‘--help’ output of all programs in Gnuastro, the options for
each program are classified based on context. The first two contexts
are always options to do with the input and output respectively. For
example, input image extensions or supplementary input files for the
inputs. The last class of options is also fixed in all of Gnuastro: it
shows operating mode options. Most of these options are already
explained in *note Operating mode options::.
The help message will sometimes be longer than the vertical size of
your terminal. If you are using a graphical user interface terminal
emulator, you can scroll the terminal with your mouse, but we promised
no mice distractions! So here are some suggestions:
• <Shift + PageUp> to scroll up and <Shift + PageDown> to scroll
down. For most help output this should be enough. The problem is
that it is limited by the number of lines that your terminal keeps
in memory and that you cannot scroll by lines, only by whole
screens.
• Pipe to ‘less’. A pipe is a form of shell re-direction. The
‘less’ tool in Unix-like systems was made exactly for such outputs
of any length. You can pipe (‘|’) the output of any program that
is longer than the screen to it and then you can scroll through (up
and down) with its many tools. For example:
$ astnoisechisel --help | less
Once you have gone through the text, you can quit ‘less’ by
pressing the <q> key.
• Redirect to a file. This is a less convenient way, because you
will then have to open the file in a text editor! You can do this
with the shell redirection tool (‘>’):
$ astnoisechisel --help > filename.txt
In case you have a special keyword you are looking for in the help,
you do not have to go through the full list. GNU Grep is made for this
job. For example, if you only want the list of options whose ‘--help’
output contains the word "axis" in Crop, you can run the following
command:
$ astcrop --help | grep axis
If the output of this option does not fit nicely within the confines
of your terminal, GNU enables you to customize it through the
environment variable ‘ARGP_HELP_FMT’. With it, you can set various
parameters which specify the formatting of the help messages. For
example, if your terminal is wider than 70 characters (say 100) and you
feel there is too much empty space between the long options and the
short explanation, you can change these formats by giving values to this
environment variable before running the program with the ‘--help’
option. You can define this environment variable in this manner:
$ export ARGP_HELP_FMT=rmargin=100,opt-doc-col=20
This will affect all GNU programs using GNU C library's ‘argp.h’
facilities as long as the environment variable is in memory. You can
see the full list of these formatting parameters in the "Argp User
Customization" part of the GNU C library manual. If you are more
comfortable reading the ‘--help’ outputs of all GNU software in your
customized format, you can add your customization (similar to the line
above, without the ‘$’ sign) to your ‘~/.bashrc’ file. This is a
standard option for all GNU software.
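As a minimal sketch of the customization above, the lines below tie the
right margin to the width of the current terminal instead of a fixed
100. The use of ‘tput cols’ and the fallback of 80 columns are
assumptions of this sketch, not something Gnuastro requires; ‘rmargin’
and ‘opt-doc-col’ are the same parameters used above.

```shell
# Width of the current terminal; fall back to 80 columns when 'tput'
# is missing or cannot determine it (for example, without a terminal).
width=$(tput cols 2>/dev/null || echo 80)

# Same parameters as in the text, with the margin taken from the terminal.
export ARGP_HELP_FMT="rmargin=${width},opt-doc-col=20"
echo "$ARGP_HELP_FMT"
```

To affect only a single command, the variable can instead be given as a
prefix on the same line (a standard shell feature), for example
‘ARGP_HELP_FMT=rmargin=100 astcrop --help’.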
File: gnuastro.info, Node: Man pages, Next: Info, Prev: --help, Up: Getting help
4.3.3 Man pages
---------------
Man pages were the Unix method of providing command-line documentation
for a program. With GNU Info (see *note Info::), the usage of this
method of documentation is highly discouraged. This is because Info
provides an environment that is much easier to navigate and read.
However, some operating systems require a man page for packages that
are installed and some people are still used to this method of
command-line help. So the programs in Gnuastro also have Man pages
which are automatically generated from the outputs of ‘--version’ and
‘--help’ using the GNU help2man program. So if you run
$ man programname
you will be provided with a man page listing the options in the standard
manner.
File: gnuastro.info, Node: Info, Next: help-gnuastro mailing list, Prev: Man pages, Up: Getting help
4.3.4 Info
----------
Info is the standard documentation format for all GNU software. It is a
very useful command-line document viewing format, fully equipped with
links between the various pages and menus and search capabilities. As
explained before, the best thing about it is that it is available for
you the moment you need to refresh your memory on any command-line tool
in the middle of your work without having to take your hands off the
keyboard. This complete book is available in Info format and can be
accessed from anywhere on the command-line.
To open the Info format of any installed program or library on your
system which has an Info format book, you can simply run the command
below (change ‘executablename’ to the executable name of the program or
library):
$ info executablename
In case you are not already familiar with it, run ‘$ info info’. It
does a fantastic job in explaining all its capabilities itself. It is
very short and you will become sufficiently fluent in about half an
hour. Since all GNU software documentation is also provided in Info,
your whole GNU/Linux life will significantly improve.
Once you've become an efficient navigator in Info, you can go to any
part of this book or any other GNU software or library manual, no matter
how long it is, in a matter of seconds. It also blends nicely with GNU
Emacs (a text editor) and you can search manuals while you are writing
your document or programs without taking your hands off the keyboard.
This is most useful for libraries like the GNU C library. To be able to
access all the Info manuals installed in your GNU/Linux within Emacs,
type <Ctrl-H + i>.
To see this whole book from the beginning in Info, you can run
$ info gnuastro
If you run Info with the particular program executable name, for example
‘astcrop’ or ‘astnoisechisel’:
$ info astprogramname
you will be taken to the section titled "Invoking ProgramName" which
explains the inputs and outputs along with the command-line options for
that program. Finally, if you run Info with the official program name,
for example, Crop or NoiseChisel:
$ info ProgramName
you will be taken to the top section which introduces the program. Note
that in all cases, Info is not case sensitive.
File: gnuastro.info, Node: help-gnuastro mailing list, Prev: Info, Up: Getting help
4.3.5 help-gnuastro mailing list
--------------------------------
Gnuastro maintains the help-gnuastro mailing list for users to ask any
questions related to Gnuastro. The experienced Gnuastro users and some
of its developers are subscribed to this mailing list and your email
will be sent to them immediately. However, when contacting this mailing
list please keep in mind that they are possibly very busy and might not
be able to answer immediately.
To ask a question on this mailing list, send a mail to
‘help-gnuastro@gnu.org’. Anyone can view the mailing list archives at
<http://lists.gnu.org/archive/html/help-gnuastro/>. It is best that
before sending a mail, you search the archives to see if anyone has
asked a question similar to yours. If you want to make a suggestion or
report a bug, please do not send a mail to this mailing list. We have
other mailing lists and tools for those purposes, see *note Report a
bug:: or *note Suggest new feature::.
File: gnuastro.info, Node: Multi-threaded operations, Next: Numeric data types, Prev: Getting help, Up: Common program behavior
4.4 Multi-threaded operations
=============================
Some of the programs benefit significantly when you use all the threads
your computer's CPU has to offer to your operating system. The number
of threads available can be larger than the number of physical
(hardware) cores in the CPU (also known as Simultaneous multithreading).
For example, in Intel's CPUs (those that implement its Hyper-threading
technology) the number of threads is usually double the number of
physical cores in your CPU. On a GNU/Linux system, the number of threads
available can be found with the ‘$ nproc’ command (part of GNU
Coreutils).
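As a small sketch of the above, the commands below store and print the
two counts that ‘nproc’ can report. The variable names are only
illustrative; ‘--all’ is an option of GNU Coreutils' ‘nproc’.

```shell
# Threads available to the current process (what a program can actually use).
threads=$(nproc)

# All threads installed in the system (this can be larger, for example
# when the process is restricted to a subset of the CPUs).
installed=$(nproc --all)

echo "threads available: $threads (installed: $installed)"
```

To leave some threads free for other work, a smaller number can be given
explicitly to a Gnuastro program, for example ‘--numthreads=4’ on a
system that reports 8.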
Gnuastro's programs can find the number of threads available to your
system internally at run-time (when you execute the program). However,
if a value is given to the ‘--numthreads’ option, the given number will
be used, see *note Operating mode options:: and *note Configuration
files:: for ways to use this option. Thus ‘--numthreads’ is the only
common option in Gnuastro's programs with a value that does not have to
be specified anywhere on the command-line or in the configuration files.
* Menu:
* A note on threads:: Caution and suggestion on using threads.
* How to run simultaneous operations:: How to run things simultaneously.
File: gnuastro.info, Node: A note on threads, Next: How to run simultaneous operations, Prev: Multi-threaded operations, Up: Multi-threaded operations
4.4.1 A note on threads
-----------------------
Spinning off threads is not necessarily the most efficient way to run an
application. Creating a new thread is not a cheap operation for the
operating system. Multi-threading is most useful when the input data are fixed and
you want the same operation to be done on parts of it. For example, one
input image to Crop and multiple crops from various parts of it. In
this fashion, the image is loaded into memory once, all the crops are
divided between the number of threads internally and each thread cuts
out those parts which are assigned to it from the same image. On the
other hand, if you have multiple images and you want to crop the same
region(s) out of all of them, it is much more efficient to set
‘--numthreads=1’ (so no threads spin off) and run Crop multiple times
simultaneously, see *note How to run simultaneous operations::.
You can check the boost in speed by running a program on one of the
data sets twice (with everything else the same): first with the maximum
number of threads and then with only one thread. You will notice that
the wall-clock time (reported by most programs at their end) of the
former is longer than that of the latter divided by the number of
physical CPU cores (not threads) available to your operating system.
Asymptotically these two times can be equal (most of the time they are
not). So limiting the programs to use only one thread and running as
many independent instances as there are available threads will be more
efficient.
Note that the operating system keeps a cache of recently processed
data, so usually, the second time you process an identical data set
(independent of the number of threads used), you will get faster
results. In order to make an unbiased comparison, you have to first
clean the system's cache with the following command between the two
runs.
$ sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
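The comparison described above can be scripted with a small helper like
the one below. This is only a sketch: ‘wall_time’ is a hypothetical
function name, and ‘sleep’ is a stand-in so that the example runs
without any input data. It measures the time externally, instead of
relying on the time reported by the program itself.

```shell
# Print the wall-clock time (in seconds) taken by the given command.
# 'date +%s.%N' (nanosecond resolution) is a GNU Coreutils feature.
wall_time() {
    start=$(date +%s.%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
}

# Stand-in command so the sketch runs anywhere; replace it with the
# program you want to measure.
wall_time sleep 0.2
```

With Gnuastro, the two runs to compare would be of the form ‘wall_time
astnoisechisel image.fits --numthreads=1’ and the same command without
‘--numthreads’ (‘image.fits’ is a hypothetical input name), with the
cache cleaned between them as shown above.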
*SUMMARY: Should I use multiple threads?* It depends:
• If you only have *one* data set (image in most cases!), then yes,
the more threads you use (with a maximum of the number of threads
available to your OS) the faster you will get your results.
• If you want to run the same operation on *multiple* data sets, it
is best to set the number of threads to 1 and use Make, or GNU
Parallel, as explained in *note How to run simultaneous
operations::.
File: gnuastro.info, Node: How to run simultaneous operations, Prev: A note on threads, Up: Multi-threaded operations
4.4.2 How to run simultaneous operations
----------------------------------------
There are two(1) approaches to simultaneously execute a program: using
GNU Parallel or Make (GNU Make is the most common implementation). The
first is very useful when you only want to do one job multiple times and
want to get back to your work without actually keeping the command you
ran. The second is usually for more important operations, with lots of
dependencies between the different products (for example, a full
scientific research project).
GNU Parallel
When you only want to run multiple instances of a command on
different threads and get on with the rest of your work, the best
method is to use GNU Parallel. Surprisingly GNU Parallel is one of
the few GNU packages that has no Info documentation but only a Man
page, see *note Info::. So to see the documentation after
installing it please run
$ man parallel
As an example, let's assume we want to crop a region centered on the
pixel (500, 600) with the default width from all the FITS images
in the ‘./data’ directory ending with ‘sci.fits’ to the current
directory. To do this, you can run:
$ parallel astcrop --numthreads=1 --xc=500 --yc=600 ::: \
./data/*sci.fits
GNU Parallel can help in many more conditions; this is one of the
simplest. See the man page for lots of other examples. For
absolute beginners: the backslash (‘\’) is only a line continuation
to fit nicely in the page. If you type the whole command in one line,
you should remove it.
Make
Make is a program for building "targets" (e.g., files) using
"recipes" (a set of operations) when their known "prerequisites"
(other files) have been updated. It elegantly allows you to define
dependency structures for building your final output and updating
it efficiently when the inputs change. It is the most common
infra-structure to build software today.
Scientific research methodology is very similar to software
development: you start by testing a hypothesis on a small sample of
objects/targets with a simple set of steps. As you are able to get
promising results, you improve the method and use it on a larger,
more general, sample. In the process, you will confront many
issues that have to be corrected (bugs in software development
jargon). Make is a wonderful tool to manage this style of
development.
Besides the raw data analysis pipeline, Make has been used for
producing reproducible papers, for example, see the reproduction
pipeline (https://gitlab.com/makhlaghi/NoiseChisel-paper) of the
paper introducing *note NoiseChisel:: (one of Gnuastro's programs).
In fact the NoiseChisel paper's Make-based workflow was the
foundation of a parallel project called Maneage
(http://maneage.org) (_Man_aging data lin_eage_) that is described
more fully in Akhlaghi et al. 2021
(https://arxiv.org/abs/2006.03018). Therefore, it is a very useful
tool for complex scientific workflows.
GNU Make(2) is the most common implementation which (similar to
nearly all GNU programs) comes with a wonderful manual(3). Make
is very basic and simple, and thus the manual is short (the most
important parts are in the first roughly 100 pages) and easy to
read/understand.
Make comes with a ‘--jobs’ (‘-j’) option which allows you to
specify the maximum number of jobs that can be done simultaneously.
For example, if you have 8 threads available to your operating
system, you can run:
$ make -j8
With this command, Make will process your ‘Makefile’ and create all
the targets (can be thousands of FITS images for example)
simultaneously on 8 threads, while fully respecting their
dependencies (only building a file/target when its prerequisites
are successfully built). Make is thus strongly recommended for
managing scientific research where robustness, archiving,
reproducibility and speed(4) are important.
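To make the description above concrete, here is a minimal sketch that
writes a small ‘Makefile’ into a scratch directory and builds its
targets in parallel. Everything in it is an assumption for
illustration: the file names are hypothetical, and ‘cp’ is only a
stand-in for a real single-threaded command (for example, a call to
Crop with ‘--numthreads=1’). It relies on GNU Make's ‘.RECIPEPREFIX’
variable (version 3.82 or later) to avoid depending on TAB characters.

```shell
# Work in a scratch directory so nothing in the current one is touched.
workdir=$(mktemp -d)
mkdir "$workdir/data"
for i in 1 2 3 4; do
    echo "image $i" > "$workdir/data/${i}_sci.fits"
done

# The quoted 'EOF' keeps the shell from expanding Make's own variables.
cat > "$workdir/Makefile" <<'EOF'
# '>' replaces the usual TAB at the start of each recipe line.
.RECIPEPREFIX := >

inputs  := $(wildcard data/*_sci.fits)
outputs := $(patsubst data/%_sci.fits,out/%_crop.fits,$(inputs))

.PHONY: all
all: $(outputs)

# One target per input; 'cp' stands in for the real processing command.
# The output directory is an order-only prerequisite (after the '|').
out/%_crop.fits: data/%_sci.fits | out
> cp $< $@

out:
> mkdir -p out
EOF

# Build up to 8 targets at the same time.
make -C "$workdir" -j8 >/dev/null
ls "$workdir/out"
```

Because each output only depends on its own input, Make is free to run
the recipes simultaneously. Running the same ‘make’ command a second
time does nothing, since all targets are already newer than their
prerequisites.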
---------- Footnotes ----------
(1) A third way would be to open multiple terminal emulator windows
in your GUI, type the commands separately on each and press <Enter> once
on each terminal, but this is far too frustrating, tedious and prone to
errors. It's therefore not a realistic solution when tens, hundreds or
thousands of operations (your research targets, multiplied by the
operations you do on each) are to be done.
(2) <https://www.gnu.org/software/make/>
(3) <https://www.gnu.org/software/make/manual/>
(4) Besides its multi-threaded capabilities, Make will only rebuild
those targets that depend on a change you have made, not the whole work.
For example, if you have set the prerequisites properly, you can easily
test the changing of a parameter on your paper's results without having
to re-do everything (which is much faster). This allows you to be much
more productive in easily checking various ideas/assumptions of the
different stages of your research and thus produce a more robust result
for your exciting science.
File: gnuastro.info, Node: Numeric data types, Next: Memory management, Prev: Multi-threaded operations, Up: Common program behavior
4.5 Numeric data types
======================
At the lowest level, the computer stores everything in terms of ‘1’ or
‘0’. For example, each program in Gnuastro, or each astronomical image
you take with the telescope is actually a string of millions of these
zeros and ones. The space required to keep a zero or one is the
smallest unit of storage, and is known as a _bit_. However,
understanding and manipulating this string of bits is extremely hard for
most people. Therefore, different standards are defined to package the
bits into separate _type_s with a fixed interpretation of the bits in
each package.
To store numbers, the most basic standard/type is for integers ($...,
-2, -1, 0, 1, 2, ...$). The common integer types are 8, 16, 32, and 64
bits wide (more bits will give larger limits). Each bit corresponds to
a power of 2 and they are summed to create the final number. In the
integer types, for each width there are two standards for reading the
bits: signed and unsigned. In the 'signed' convention, one bit is
reserved for the sign (stating that the integer is positive or
negative). The 'unsigned' integers use that bit in the actual number
and thus contain only non-negative numbers (starting from zero).
Therefore, at the same number of bits, both signed and unsigned
integers can represent the same number of distinct values, but the
positive limit of the ‘unsigned’ types is roughly double that of their
‘signed’ counterparts with the same width (at the expense of not having
negative numbers). When the
context of your work does not involve negative numbers (for example,
counting, where negative is not defined), it is best to use the
‘unsigned’ types. For the full numerical range of all integer types,
see below.
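The relation between width and range can be checked with shell
arithmetic. The sketch below uses a hypothetical helper called
‘int_range’; it is limited to widths of 8, 16 and 32 bits because the
shell's own arithmetic is typically done in signed 64-bit integers.

```shell
# Print the inclusive range of an integer type from its width in bits.
int_range() {  # usage: int_range <signed|unsigned> <bits>
    if [ "$1" = unsigned ]; then
        echo "0 $(( (1 << $2) - 1 ))"
    else
        echo "$(( -(1 << ($2 - 1)) )) $(( (1 << ($2 - 1)) - 1 ))"
    fi
}

int_range unsigned 8    # prints: 0 255
int_range signed 8      # prints: -128 127
int_range unsigned 16   # prints: 0 65535
int_range signed 16     # prints: -32768 32767
```

Both conventions cover the same number of values (256 for 8 bits); the
signed range is simply shifted to include the negative numbers.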
Another standard of converting a given number of bits to numbers is
the floating point standard. This standard can _approximately_ store any
real number with a given precision. There are two common floating point
types: 32-bit and 64-bit, for single and double precision floating point
numbers respectively. The former is sufficient for data with fewer than
8 significant decimal digits (most astronomical data), while the latter
is good for fewer than 16 significant decimal digits. The representation
of real numbers as bits is much more complex than integers. If you are
interested in learning more about it, you can start with the Wikipedia
article (https://en.wikipedia.org/wiki/Floating_point).
Practically, you can use Gnuastro's Arithmetic program to
convert/change the type of an image/datacube (see *note Arithmetic::),
or Gnuastro's Table program to convert a table column's data type (see
*note Column arithmetic::). Conversion of a dataset's type is necessary
in some contexts. For example, the program/library that you intend to
feed the data into may only accept floating point values, while you
have an integer image/column. Another situation where conversion can be
helpful is when you know that your data only has values that fit within
‘int8’ or ‘uint16’, but it is currently formatted in the ‘float64’ type.
The important thing to consider is that operations involving wider,
floating point, or signed types can be significantly slower than
smaller-width, integer, or unsigned types respectively. Note that
besides speed, a wider type also requires much more storage space (by 4
or 8 times). Therefore, when you confront such situations that can be
optimized and want to store/archive/transfer the data, it is best to use
the most efficient type. For example, if your dataset (image or table
column) only has non-negative integers no larger than 65535, store it as
an unsigned 16-bit integer for faster processing, faster transfer, and
less storage space.
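The storage difference is easy to estimate. The sketch below computes
the space needed by the pixels of a hypothetical 4096 by 4096 image in
two types (‘image_bytes’ is an illustrative helper, and any header or
file-format overhead is ignored).

```shell
# Bytes needed for the pixel data of a 2D image:
# width x height x (bits per pixel / 8).
image_bytes() {  # usage: image_bytes <width> <height> <bits>
    echo $(( $1 * $2 * $3 / 8 ))
}

f64=$(image_bytes 4096 4096 64)   # stored as float64
u16=$(image_bytes 4096 4096 16)   # stored as uint16
echo "float64: $f64 bytes, uint16: $u16 bytes, ratio: $(( f64 / u16 ))"
# prints: float64: 134217728 bytes, uint16: 33554432 bytes, ratio: 4
```

With Gnuastro installed, the conversion itself would be done with the
type conversion operators of Arithmetic (see *note Arithmetic::), along
the lines of ‘astarithmetic image.fits uint16 --output=image-u16.fits’;
the file names here are hypothetical.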
The short and long names for the recognized numeric data types in
Gnuastro are listed below. Both short and long names can be used when
you want to specify a type. For example, as a value to the common
option ‘--type’ (see *note Input output options::), or in the
information comment lines of *note Gnuastro text table format::. The
ranges listed below are inclusive.
‘u8’
‘uint8’
8-bit unsigned integers, range:
$[0\rm{\ to\ }2^8-1]$ or $[0\rm{\ to\ }255]$.
‘i8’
‘int8’
8-bit signed integers, range:
$[-2^7\rm{\ to\ }2^7-1]$ or $[-128\rm{\ to\ }127]$.
‘u16’
‘uint16’
16-bit unsigned integers, range:
$[0\rm{\ to\ }2^{16}-1]$ or $[0\rm{\ to\ }65535]$.
‘i16’
‘int16’
16-bit signed integers, range:
$[-2^{15}\rm{\ to\ }2^{15}-1]$ or $[-32768\rm{\ to\ }32767]$.
‘u32’
‘uint32’
32-bit unsigned integers, range:
$[0\rm{\ to\ }2^{32}-1]$ or $[0\rm{\ to\ }4294967295]$.
‘i32’
‘int32’
32-bit signed integers, range:
$[-2^{31}\rm{\ to\ }2^{31}-1]$ or $[-2147483648\rm{\ to\
}2147483647]$.
‘u64’
‘uint64’
64-bit unsigned integers, range:
$[0\rm{\ to\ }2^{64}-1]$ or $[0\rm{\ to\ }18446744073709551615]$.
‘i64’
‘int64’
64-bit signed integers, range:
$[-2^{63}\rm{\ to\ }2^{63}-1]$ or $[-9223372036854775808\rm{\ to\
}9223372036854775807]$.
‘f32’
‘float32’
32-bit (single-precision) floating point types. The maximum
(minimum is its negative) possible value is
$3.402823\times10^{38}$. Single-precision floating points can
accurately represent a floating point number up to $\sim7.2$
significant decimals. Given the heavy noise in astronomical data,
this is usually more than sufficient for storing results. For
more, see *note Printing floating point numbers::.
‘f64’
‘float64’
64-bit (double-precision) floating point types. The maximum
(minimum is its negative) possible value is $\sim10^{308}$.
Double-precision floating points can accurately represent a
floating point number up to $\sim15.9$ significant decimals. This is
usually good for processing (mixing) the data internally, for
example, a sum of single precision data (and later storing the
result as ‘float32’). For more, see *note Printing floating point
numbers::.
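The $\sim7.2$ and $\sim15.9$ significant decimals quoted above follow
from the width of the binary significand in each type: 24 bits for
‘float32’ and 53 bits for ‘float64’ in the IEEE 754 standard (a fact
that is not stated in this section). A significand of that many bits
carries the number of bits multiplied by $\log_{10}2$ decimal digits.
The sketch below checks this with AWK; ‘sig_digits’ is a hypothetical
helper name.

```shell
# Decimal digits carried by a binary significand of the given width:
# bits x log10(2).
sig_digits() {  # usage: sig_digits <significand-bits>
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b * log(2) / log(10) }'
}

sig_digits 24   # float32: prints 7.22
sig_digits 53   # float64: prints 15.95

# AWK computes in double precision, so 2^53 + 1 (a 16-digit integer)
# cannot be told apart from 2^53: its last digit is lost.
awk 'BEGIN { x = 2^53; print ((x + 1 == x) ? "lost" : "kept") }'
```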
*Some file formats do not recognize all types.* For example, the FITS
standard (see *note Fits::) does not define ‘uint64’ in binary tables or
images. When a type is not acceptable for output into a given file
format, the respective Gnuastro program or library will let you know and
abort. On the command-line, you can convert the numerical type of an
image, or table column into another type with *note Arithmetic:: or
*note Table:: respectively. If you are writing your own program, you
can use the ‘gal_data_copy_to_new_type()’ function in Gnuastro's
library, see *note Copying datasets::.
File: gnuastro.info, Node: Memory management, Next: Tables, Prev: Numeric data types, Up: Common program behavior
4.6 Memory management
=====================
In this section we will review how Gnuastro manages your input data in
your system's memory. Knowing this can help you optimize your usage (in
speed and memory consumption) when the data volume is large and
approaches, or exceeds, your available RAM (usually in various calls to
multiple programs simultaneously). But before diving into the details,
let's have a short basic introduction to memory in general and in
particular the types of memory most relevant to this discussion.
Input datasets (that are later fed into programs for analysis) are
commonly first stored in _non-volatile memory_. This is a type of
memory that does not need a constant power supply to keep the data and
is therefore primarily aimed at long-term storage, like HDDs or SSDs.
So data in this type of storage is preserved when you turn off your
computer. But by its nature, non-volatile memory is much slower, in
reading or writing, than the speed at which CPUs can process the data.
Thus relying on this type of memory alone would create a bad bottleneck
in the input/output (I/O) phase of any processing.
The first step to decrease this bottleneck is to have a faster
storage space, but with a much more limited storage volume. For this type of
storage, computers have a Random Access Memory (or RAM). RAM is
classified as a _volatile memory_ because it needs a constant flow of
electricity to keep the information. In other words, the moment power
is cut-off, all the stored information in your RAM is gone (hence the
"volatile" name). But thanks to that constant supply of power, it can
access any random address with equal (and very high!) speed.
Hence, the general/simplistic way that programs deal with memory is
the following (this is general to almost all programs, not just
Gnuastro's): 1) Load/copy the input data from the non-volatile memory
into RAM. 2) Use the copy of the data in RAM as input for all the
internal processing as well as the intermediate data that is necessary
during the processing. 3) Finally, when the analysis is complete, write
the final output data back into non-volatile memory, and free/delete all
the used space in the RAM (the initial copy and all the intermediate
data). Usually the RAM is most important for the data of the
intermediate steps (that you never see as a user of a program!).
When the input dataset(s) to a program are small (compared to the
available space in your system's RAM at the moment it is run) Gnuastro's
programs and libraries follow the standard series of steps above. The
only exception is that deleting the intermediate data is not only done
at the end of the program. As soon as an intermediate dataset is no
longer necessary for the next internal steps, the space it occupied is
deleted/freed. This allows Gnuastro programs to minimize their usage of
your system's RAM over the full running time.
The situation gets complicated when the datasets are large (compared
to your available RAM when the program is run). For example, if a
dataset is half the size of your system's available RAM, and the
program's internal analysis needs three or more intermediately processed
copies of it at one moment in its analysis, there will not be enough
RAM to keep those higher-level intermediate data. In such cases,
programs that do not do any memory management will crash. But
fortunately Gnuastro's programs do have a memory management plan for
such situations.
When the necessary amount of space for an intermediate dataset cannot
be allocated in the RAM, Gnuastro's programs will not use the RAM at
all. They will use the "memory-mapped file" concept in modern operating
systems to create a randomly-named file in your non-volatile memory and
use that instead of the RAM. That file will have the exact size (in
bytes) of that intermediate dataset. Any time the program needs that
intermediate dataset, the operating system will directly go to that
file, and bypass your RAM. As soon as that file is no longer necessary
for the analysis, it will be deleted. But as mentioned above,
non-volatile memory has much slower I/O speed than the RAM. Hence in
such situations, the programs will become noticeably slower (sometimes
by a factor of 10, depending on your non-volatile memory speed).
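The memory-mapped file concept described above can be tried in a few
lines of Python. This is only a minimal sketch of the general operating
system feature, not Gnuastro's own implementation: the function name
‘use_mapped_scratch’ and the ‘mmap_’ file prefix are hypothetical. It
creates a randomly-named file with the exact size of the dataset, uses
it like an array in memory, and deletes it as soon as it is not needed.

```python
import mmap
import os
import tempfile


def use_mapped_scratch(nbytes, directory=None):
    """Create a scratch file of exactly nbytes, map it, use it, delete it."""
    # Randomly named file (hypothetical prefix, not Gnuastro's naming).
    fd, path = tempfile.mkstemp(prefix="mmap_", dir=directory)
    try:
        os.ftruncate(fd, nbytes)            # give the file its exact size
        with mmap.mmap(fd, nbytes) as buf:  # byte-array view backed by the file
            buf[0:4] = b"\x01\x02\x03\x04"  # stored in the file-backed mapping
            first = bytes(buf[0:4])
            size_on_disk = os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.remove(path)                     # delete once no longer necessary
    return first, size_on_disk, os.path.exists(path)


first, size_on_disk, still_there = use_mapped_scratch(4096)
print(size_on_disk, still_there)
```

If the program is killed between creating and removing the file, the
file stays on disk, which is why left-over files have to be cleaned
manually after a crash.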
Because of the drop in I/O speed (and thus the speed of your running
program), the moment that any to-be-allocated dataset is memory-mapped,
Gnuastro's programs and libraries will notify you with a descriptive
statement like below (can happen in any phase of their analysis). It
shows the location of the memory-mapped file, its size, complemented
with a small description of the cause, a pointer to this section of the
book for more information on how to deal with it (if necessary), and
what to do to suppress it.
astarithmetic: ./gnuastro_mmap/Fu7Dhs: temporary memory-mapped file
(XXXXXXXXXXX bytes) created for intermediate data that is not stored
in RAM (see the "Memory management" section of Gnuastro's manual for
optimizing your project's memory management, and thus speed). To
disable this warning, please use the option '--quiet-mmap'
Finally, when the intermediate dataset is no longer necessary, the
program will automatically delete it and notify you with a statement
like this:
astarithmetic: ./gnuastro_mmap/Fu7Dhs: deleted
To disable these messages, you can run the program with ‘--quietmmap’,
or set the ‘quietmmap’ variable in the allocating library function to be
non-zero.
An important component of these messages is the name of the
memory-mapped file. Knowing that the file has been deleted is important
for the user if the program crashes for any reason: internally (for
example, a parameter is given wrongly) or externally (for example, you
mistakenly kill the running job). In the event of a crash, the
memory-mapped files will not be deleted and you have to delete them
manually: they are usually large and, if left behind by successive
crashes, they can gradually fill your storage.
This brings us to managing the memory-mapped files in your
non-volatile memory. In other words: knowing where they are saved, or
intentionally placing them in different places of your file system, or
deleting them when necessary. As the examples above show, memory-mapped
files are stored in a sub-directory of the running directory called
‘gnuastro_mmap’. If this directory does not exist, Gnuastro will
automatically create it when memory mapping becomes necessary.
Alternatively, it may happen that the ‘gnuastro_mmap’ sub-directory
exists and is not writable, or it cannot be created. In such cases, the
memory-mapped file for each dataset will be created in the running
directory with a ‘gnuastro_mmap_’ prefix.
Therefore, one easy way to delete all memory-mapped files after a
crash is to delete everything within the sub-directory (first command
below), or all files starting with this prefix (second command):
rm -f gnuastro_mmap/*
rm -f gnuastro_mmap_*
A much more common issue when dealing with memory-mapped files is
their location. For example, you may be running a program in a
partition that is hosted by an HDD. But you also have another partition
on an SSD (which has much faster I/O). So you want your memory-mapped
files to be created in the SSD to speed up your processing. In this
scenario, you want your project source directory to only contain your
plain-text scripts and you want your project's built products (even the
temporary memory-mapped files) to be built in a different location
because they are large; thus I/O speed becomes important.
To host the memory-mapped files in another location (with fast I/O),
you can set ‘gnuastro_mmap’ to be a symbolic link to it. For example,
let's assume you want your memory-mapped files to be stored in
‘/path/to/dir/for/mmap’. All you have to do is to run the following
command before your Gnuastro analysis command(s).
ln -s /path/to/dir/for/mmap gnuastro_mmap
The programs will delete a memory-mapped file when it is no longer
needed, but they will not delete the ‘gnuastro_mmap’ directory that
hosts them. So if your project involves many Gnuastro programs
(possibly called in parallel) and you want your memory-mapped files to
be in a different location, you just have to make the symbolic link
above once at the start, and all the programs will use it if necessary.
Another memory-management scenario that may happen is this: you do
not want a Gnuastro program to allocate internal datasets in the RAM at
all. For example, the speed of your Gnuastro-related project does not
matter at that moment, and you have higher-priority jobs that are being
run at the same time which need to have RAM available. In such cases,
you can use the ‘--minmapsize’ option that is available in all Gnuastro
programs (see *note Processing options::). Any intermediate dataset
that has a size larger than the value of this option will be
memory-mapped, even if there is space available in your RAM. For
example, if you want any dataset larger than 100 megabytes to be
memory-mapped, use ‘--minmapsize=100000000’ (8 zeros!).
You should not set the value of ‘--minmapsize’ to be too small,
otherwise even small intermediate values (that are usually very
numerous) in the program will be memory-mapped. This is a problem
because the kernel can only host a limited number of memory-mapped
files at any moment (by all running programs combined). For example,
in the default(1) Linux
kernel on GNU/Linux operating systems this limit is roughly 64000. If
the total number of memory-mapped files exceeds this number, all the
programs using them will crash. Gnuastro's programs will warn you if
your given value is too small and may cause a problem later.
Actually, the default behavior for Gnuastro's programs (to only use
memory-mapped files when there is not enough RAM) is a side-effect of
‘--minmapsize’. The pre-defined value to this option is an extremely
large value in the lowest-level Gnuastro configuration file (the
installed ‘gnuastro.conf’ described in *note Configuration file
precedence::). This value is larger than the largest possible available
RAM. You can check by running any Gnuastro program with a ‘-P’ option.
Because no dataset will be larger than this, by default the programs
will first attempt to use the RAM for temporary storage. But if writing
in the RAM fails (for any reason, mainly due to lack of available
space), then a memory-mapped file will be created.
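The two rules above (the ‘--minmapsize’ threshold, and the fall-back
when allocation in RAM fails) can be summarized in the following
sketch. It is only an illustration of the described behavior, not
Gnuastro's source code: ‘storage_for’ and ‘ram_available’ are
hypothetical names, and a real program finds out about the lack of RAM
from a failed allocation, not from a number given to it.

```python
def storage_for(nbytes, minmapsize, ram_available):
    """Return where a dataset of nbytes would be kept: 'ram' or 'mmap'."""
    if nbytes > minmapsize:
        return "mmap"   # above the threshold: mapped even if RAM is free
    if nbytes > ram_available:
        return "mmap"   # stands in for "allocation in RAM failed"
    return "ram"


GB = 10**9
# Explicit threshold of 100 megabytes, as in '--minmapsize=100000000'.
print(storage_for(200 * 10**6, 100 * 10**6, 8 * GB))
# A huge default threshold: RAM is tried first and used when it fits ...
print(storage_for(200 * 10**6, 2**63, 8 * GB))
# ... and a memory-mapped file is only used when RAM is insufficient.
print(storage_for(16 * GB, 2**63, 8 * GB))
```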
---------- Footnotes ----------
(1) If you need to host more memory-mapped files at one moment, you
need to build your own customized Linux kernel.
File: gnuastro.info, Node: Tables, Next: Tessellation, Prev: Memory management, Up: Common program behavior
4.7 Tables
==========
"A table is a collection of related data held in a structured format
within a database. It consists of columns, and rows." (from
Wikipedia). Each column in the table contains the values of one
property and each row is a collection of properties (columns) for one
target object. For example, let's assume you have just run MakeCatalog
(see *note MakeCatalog::) on an image to measure some properties for the
labeled regions (which might be detected galaxies for example) in the
image. For each labeled region (detected galaxy), there will be a _row_
which groups its measured properties as _columns_, one column for each
property. One such property can be the object's magnitude, which is the
sum of pixels with that label, or its center can be defined as the
light-weighted average value of those pixels. Many such properties can
be derived from the raw pixel values and their position, see *note
Invoking astmkcatalog:: for a long list.
As a summary, for each labeled region (or, galaxy) we have one _row_
and for each measured property we have one _column_. This high-level
structure is usually the first step for higher-level analysis, for
example, finding the stellar mass or photometric redshift from
magnitudes in multiple colors. Thus, tables are not just outputs of
programs, in fact it is much more common for tables to be inputs of
programs. For example, to make a mock galaxy image, you need to feed in
the properties of each galaxy into *note MakeProfiles:: for it to do the
inverse of the process above and make a simulated image from a catalog,
see *note Sufi simulates a detection::. In other cases, you can feed a
table into *note Crop:: and it will crop out regions centered on the
positions within the table, see *note Reddest clumps cutouts and
parallelization::. So to end this relatively long introduction, tables
play a very important role in astronomy, or generally all branches of
data analysis.
In *note Recognized table formats:: the currently recognized table
formats in Gnuastro are discussed. You can use any of these tables as
input or ask for them to be built as output. The most common type of
table format is a simple plain text file with each row on one line and
columns separated by white space characters; this format is easy to
read/write by eye/hand. To give it the full functionality of more
specific table types like the FITS tables, Gnuastro has a special
convention which you can use to give each column a name, type, unit, and
comments, while still being readable by other plain text table readers.
This convention is described in *note Gnuastro text table format::.
When tables are input to a program, the program reading it needs to
know which column(s) it should use for its desired purposes. Gnuastro's
programs all follow a similar convention for the way you can select
columns in a table. It is thoroughly discussed in *note Selecting
table columns::.
* Menu:
* Recognized table formats:: Table formats that are recognized in Gnuastro.
* Gnuastro text table format:: Gnuastro's convention plain text tables.
* Selecting table columns:: Identify/select certain columns from a table
File: gnuastro.info, Node: Recognized table formats, Next: Gnuastro text table format, Prev: Tables, Up: Tables
4.7.1 Recognized table formats
------------------------------
The table formats that Gnuastro can currently read from and write to
are described below. Each has its own advantages and
disadvantages, so a short review of the format is also provided to help
you make the best choice based on how you want to define your input
tables or later use your output tables.
Plain text table
This is the most basic and simplest way to create, view, or edit
the table by hand on a text editor. The other formats described
below are less eye-friendly and have a more formal structure (for
easier computer readability). It is fully described in *note
Gnuastro text table format::.
FITS ASCII tables
The FITS ASCII table extension is fully in ASCII encoding and thus
easily readable on any text editor (assuming it is the only
extension in the FITS file). If the FITS file also contains binary
extensions (for example, an image or binary table extensions), then
there will be many hard to print characters. The FITS ASCII format
does not have new line characters to separate rows. In the FITS
ASCII table standard, each row is defined as a fixed number of
characters (value to the ‘NAXIS1’ keyword), so to visually inspect
it properly, you would have to adjust your text editor's width to
this value. All columns start at given character positions and
have a fixed width (number of characters).
Numbers in a FITS ASCII table are printed into ASCII format; they
are not in the binary format that the CPU uses. Hence, they can
take a larger space in memory, lose their precision, and take
longer to read into memory. If you are dealing with integer type
columns (see *note Numeric data types::), another issue with FITS
ASCII tables is that the type information for the column will be
lost (there is only one integer type in FITS ASCII tables). One
problem with the binary format on the other hand is that it is not
portable (different CPUs/compilers have different standards for
translating the zeros and ones). But since ASCII characters are
defined on a byte and are well recognized, they are better for
portability on those various systems. Gnuastro's plain text table format
described below is much more portable and easier to
read/write/interpret by humans manually.
Generally, as the name implies, this format is useful for when your
table mainly contains ASCII columns (for example, file names, or
descriptions). They can be useful when you need to include columns
with structured ASCII information along with other extensions in
one FITS file. In such cases, you can also consider header
keywords (see *note Fits::).
FITS binary tables
The FITS binary table is the FITS standard's solution to the issues
discussed with keeping numbers in ASCII format as described under
the FITS ASCII table title above. Only columns defined as a string
type (a string of ASCII characters) are readable in a text editor.
The portability problem with binary formats discussed above is
mostly solved thanks to the portability of CFITSIO (see *note
CFITSIO::) and the very long history of the FITS format which has
been widely used since the 1970s.
In the case of most numbers, storing them in binary format is more
memory efficient than ASCII format. For example, to store
‘-25.72034’ in ASCII format, you need 9 bytes/characters. But if
you keep this same number (to the approximate precision possible)
as a 4-byte (32-bit) floating point number, you can keep/transmit
it with less than half the amount of memory. When catalogs contain
thousands/millions of rows in tens/hundreds of columns, this can
lead to significant improvements in memory/band-width usage.
Moreover, since the CPU does its operations in the binary formats,
reading the table in and writing it out is also much faster than an
ASCII table.
When you are dealing with integer numbers, the compression ratio
can be even better. For example, if you know all of the values in a
column are non-negative and no larger than ‘255’, you can use the
‘unsigned char’ type which only takes one byte! If they are between
‘-128’ and ‘127’, then you can use the (signed) ‘char’ type. So if
you are thoughtful about the limits of your integer columns, you can
greatly reduce the size of your file and also increase the speed at
which it is read/written. This can be very useful when sharing your
results with collaborators or publishing them. To decrease the file
size even more you can name your output as ending in ‘.fits.gz’ so
it is also compressed after creation. Just note that
compression/decompression is CPU intensive and can slow down the
writing/reading of the file.
Fortunately the FITS Binary table format also accepts ASCII strings
as column types (along with the various numerical types). So your
dataset can also contain non-numerical columns.
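The size arguments in the FITS binary table description above can be
checked with the short Python sketch below. It first compares the
number of bytes needed for ‘-25.72034’ as ASCII text and as a 32-bit
floating point number, and then picks the narrowest integer type that
can hold all the values of a column. The function name
‘smallest_integer_type’ is hypothetical, and the short type names in the
list are only labels in the style of the ‘u8’ and ‘i8’ identifiers; the
sketch also ignores that one value of each type is reserved for blank
elements.

```python
import struct

# Part 1: the same number as ASCII text and as a 32-bit float.
text = "-25.72034"
ascii_size = len(text.encode("ascii"))     # one byte per character
packed = struct.pack(">f", float(text))    # big-endian IEEE 754 float32
binary_size = len(packed)
restored = struct.unpack(">f", packed)[0]  # value to float32 precision

# Part 2: (label, minimum, maximum), from narrowest to widest.
INTEGER_TYPES = [
    ("u8", 0, 2**8 - 1), ("i8", -2**7, 2**7 - 1),
    ("u16", 0, 2**16 - 1), ("i16", -2**15, 2**15 - 1),
    ("u32", 0, 2**32 - 1), ("i32", -2**31, 2**31 - 1),
    ("u64", 0, 2**64 - 1), ("i64", -2**63, 2**63 - 1),
]


def smallest_integer_type(values):
    """Return the label of the narrowest type that holds all values."""
    low, high = min(values), max(values)
    for label, type_min, type_max in INTEGER_TYPES:
        if type_min <= low and high <= type_max:
            return label
    raise ValueError("values do not fit in a 64-bit integer")


print(ascii_size, binary_size)                 # 9 4
print(abs(restored - float(text)) < 1e-5)      # True
print(smallest_integer_type([0, 17, 255]))     # u8
print(smallest_integer_type([-128, 5, 127]))   # i8
```

The second printed line shows the price of the smaller size: the 32-bit
value is only equal to the original number to within the precision of
that type.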
* Menu:
* Gnuastro text table format:: Reading plain text tables
File: gnuastro.info, Node: Gnuastro text table format, Next: Selecting table columns, Prev: Recognized table formats, Up: Tables
4.7.2 Gnuastro text table format
--------------------------------
Plain text files are the most generic, portable, and easiest way to
(manually) create, (visually) inspect, or (manually) edit a table. In
this format, the ending of a row is defined by the new-line character (a
line on a text editor). So when you view it on a text editor, every row
will occupy one line. The delimiters (or characters separating the
columns) are white space characters (space, horizontal tab, vertical
tab) and a comma (<,>). The only further requirement is that all
rows/lines must have the same number of columns.
The columns do not have to be exactly under each other and the rows
can be arbitrarily long with different lengths. For example, the
following contents in a file would be interpreted as a table with 4
columns and 2 rows, with each element interpreted as a 64-bit floating
point type (see *note Numeric data types::).
1 2.234948 128 39.8923e8
2 , 4.454 792 72.98348e7
However, the example above has no other information about the columns
(it is just raw data, with no meta-data). To use this table, you have
to remember what the numbers in each column represent. Also, when you
want to select columns, you have to count their position within the
table. This can become frustrating and prone to bad errors (getting the
columns wrong in your scientific project!) especially as the number of
columns increases. It is also bad for sending to a colleague, because
they will find it hard to remember/use the columns properly.
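To make the delimiter rules concrete, the two-row example table above
can be read with the following Python sketch. It is not Gnuastro's
reader: the function name ‘read_plain_table’ is hypothetical, and it
assumes that a run of consecutive delimiters (for example the
space-comma-space in the second row) counts as a single separator.

```python
import re

# Space, horizontal tab, vertical tab and comma are the delimiters.
DELIMITERS = re.compile(r"[ \t\v,]+")


def read_plain_table(text):
    """Return the rows of a bare plain-text table as lists of floats."""
    rows = []
    # str.splitlines() would also break lines at a vertical tab, so the
    # text is split on the new-line character only.
    for line in text.split("\n"):
        tokens = [t for t in DELIMITERS.split(line) if t]
        if not tokens:
            continue                      # line with no data
        rows.append([float(t) for t in tokens])  # default: 64-bit float
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all rows must have the same number of columns")
    return rows


table = read_plain_table("1 2.234948 128 39.8923e8\n"
                         "2 , 4.454 792 72.98348e7\n")
print(len(table), len(table[0]))   # 2 4
```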
To solve these problems in Gnuastro's programs/libraries you are not
limited to using the column's number, see *note Selecting table
columns::. If the columns have names, units, or comments you can also
select your columns based on searches/matches in these fields, for
example, see *note Table::. Also, with a bare table like the one
above, you cannot guide the program reading the table on how to read
the numbers. As an example, the first and third columns above can be
read as integer types: the first column might be an ID and the third
can be the number of pixels an object occupies in an image. So there
is no need to read these two columns as a 64-bit floating point type
(which takes more memory, and is slower).
In the bare-minimum example above, you also cannot use strings of
characters, for example, the names of filters, or some other identifier
that includes non-numerical characters. In the absence of any
information, only numbers can be read robustly. Assuming we read
columns with non-numerical characters as strings, there would still be
the problem that the strings might contain a space (or any delimiter)
character for some rows. So, each 'word' in the string will be
interpreted as a column and the program will abort with an error that
the rows do not have the same number of columns.
To correct for these limitations, Gnuastro defines the following
convention for storing the table meta-data along with the raw data in
one plain text file. The format is primarily designed for ease of
reading/writing by eye/fingers, but is also structured enough to be read
by a program.
When the first non-white character in a line is <#>, or there are no
non-white characters in it, then the line will not be considered as a
row of data in the table (this is a pretty standard convention in many
programs, and higher level languages). In the first case (when the
first character of the line is <#>), the line is interpreted as a
_comment_.
If the comment line starts with '‘# Column N:’', then it is assumed
to contain information about column ‘N’ (a number, counting from 1).
Comment lines that do not start with this pattern are ignored and you
can use them to include any further information you want to store with
the table in the text file. The most generic column information comment
line has the following format:
# Column N: NAME [UNIT, TYPE(NUM), BLANK] COMMENT
Any sequence of characters between '<:>' and '<[>' will be interpreted
as the column name (so it can contain anything except the '<[>'
character). Anything between the '<]>' and the end of the line is
defined as a comment. Within the brackets, anything before the first
'<,>' is the units (physical units, for example, km/s, or erg/s),
anything before the second '<,>' is the short type identifier (see
below, and *note Numeric data types::).
If the type identifier is not recognized, the default 64-bit floating
point type will be used. The type identifier can optionally be followed
by an integer within parentheses. If the parentheses are present and the
integer is larger than 1, the column is assumed to be a "vector column"
(which can have multiple values, for more see *note Vector columns::).
Finally (still within the brackets), any non-white characters after
the second '<,>' are interpreted as the blank value for that column (see
*note Blank pixels::). The blank value can either be in the same type
as the column (for example, ‘-99’ for a signed integer column), or any
string (for example, ‘NaN’ in that same column). In both cases, the
values will be stored in memory as Gnuastro's fixed blank values for
each type. For floating point types, Gnuastro's internal blank value is
IEEE NaN (Not-a-Number). For signed integers, it is the smallest
possible value and for unsigned integers it is the largest possible
value.
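As a small illustration of the internal blank values just described,
the Python sketch below derives the value from a short type identifier
like ‘i8’ or ‘f32’. The function name ‘internal_blank’ is hypothetical
and the sketch only encodes the rule stated above (NaN for floating
point, smallest value for signed and largest value for unsigned
integers); it is not taken from Gnuastro's library.

```python
import math


def internal_blank(type_id):
    """Blank value for identifiers like 'u8', 'i16' or 'f32'."""
    kind, bits = type_id[0], int(type_id[1:])
    if kind == "f":
        return math.nan             # IEEE Not-a-Number
    if kind == "i":
        return -2**(bits - 1)       # smallest signed value
    if kind == "u":
        return 2**bits - 1          # largest unsigned value
    raise ValueError("unknown type identifier: " + type_id)


print(internal_blank("i8"), internal_blank("u8"))   # -128 255
```

So a user-given blank of ‘-99’ in an ‘i8’ column would be stored in
memory as ‘-128’ under this rule.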
When a formatting problem occurs, or when the column was already
given meta-data in a previous comment, or when the column number is
larger than the actual number of columns in the table (counted from
the lines that are neither comments nor empty), then the comment
information line will be ignored.
When a comment information line can be used, the leading and trailing
white space characters will be stripped from all of the elements. For
example, in this line:
# Column 5: column name [km/s, f32,-99] Redshift as speed
The ‘NAME’ field will be '‘column name’' and the ‘TYPE’ field will be
'‘f32’'. Note how all the white space characters before and after
strings are not used, but those in the middle remained. Also, white
space characters are not mandatory. Hence, in the example above, the
‘BLANK’ field will be given the value of '‘-99’'.
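The way such a column information line is split into its fields can be
sketched in Python as follows. This is only an illustration of the
convention described above, not Gnuastro's parser: the names
‘parse_column_info’, ‘COLUMN_LINE’ and ‘TYPE_FIELD’ are hypothetical,
the sketch is more lenient about white space in the leading ‘# Column
N:’ than the text requires, and treating a line with an unclosed
bracket as a formatting problem is an assumption.

```python
import re

COLUMN_LINE = re.compile(r"^#\s*Column\s+(\d+)\s*:(.*)$")
TYPE_FIELD = re.compile(r"^([^()\s]*)\s*(?:\(\s*(\d+)\s*\))?$")


def parse_column_info(line):
    """Return a dict of the fields, or None for an ordinary comment."""
    found = COLUMN_LINE.match(line.strip())
    if found is None:
        return None
    info = {"number": int(found.group(1)), "name": "", "unit": "",
            "type": "", "repeat": 1, "blank": "", "comment": ""}
    # Everything between ':' and '[' is the name.
    name, bracket, rest = found.group(2).partition("[")
    info["name"] = name.strip()
    if not bracket:
        return info                      # only a number (and maybe a name)
    inside, closing, comment = rest.partition("]")
    if not closing:
        return None                      # assumed formatting problem
    info["comment"] = comment.strip()
    # UNIT before the first comma, TYPE before the second, then BLANK.
    fields = [f.strip() for f in inside.split(",", 2)]
    fields += [""] * (3 - len(fields))
    info["unit"], type_field, info["blank"] = fields
    typed = TYPE_FIELD.match(type_field)
    if typed:
        info["type"] = typed.group(1)
        info["repeat"] = int(typed.group(2) or 1)   # >1: vector column
    else:
        info["type"] = type_field
    return info


info = parse_column_info(
    "# Column 5: column name [km/s, f32,-99] Redshift as speed")
print(info["name"], "|", info["unit"], "|", info["type"], "|", info["blank"])
```

For the example line, the four printed fields are ‘column name’,
‘km/s’, ‘f32’ and ‘-99’: the white space around each field is stripped,
but the space inside the name is kept.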
Except for the column number (‘N’), the rest of the fields are
optional. Also, the column information comments do not have to be in
order. In other words, the information for column $N+m$ ($m>0$) can be
given in a line before column $N$. Furthermore, you do not have to
specify information for all columns. Those columns that do not have
this information will be interpreted with the default settings (like the
case above: values are double precision floating point, and the column
has no name, unit, or comment). So these lines are all acceptable for
any table (the first one, with nothing but the column number is
redundant):
# Column 5:
# Column 1: ID [,i8] The Clump ID.
# Column 3: mag_f160w [AB mag, f32] Magnitude from the F160W filter
The data type of the column should be specified with one of the
following values:
• For a numeric column, you can use any of the numeric types (and
their recognized identifiers) described in *note Numeric data
types::.
• '‘strN’': for strings. The ‘N’ value identifies the length of the
string (how many characters it has). The start of the string on
each row is the first non-delimiter character of the column that
has the string type. The next ‘N’ characters will be interpreted
as a string and all leading and trailing white space will be
removed.
If the next column's characters are closer than ‘N’ characters to
the start of the string column in that line/row, they will be
considered part of the string column. If there is a new-line
character before the ending of the space given to the string column
(in other words, the string column is the last column), then
reading of the string will stop, even if the ‘N’ characters are not
complete yet. See ‘tests/table/table.txt’ for one example.
Therefore, the only time you have to pay attention to the
positioning and spaces given to the string column is when it is not
the last column in the table.
The only limitation in this format is that trailing and leading
white space characters will be removed from the columns that are
read. In most cases, this is the desired behavior, but if trailing
and leading white-spaces are critically important to your analysis,
define your own starting and ending characters and remove them
after the table has been read. For example, in the sample table
below, the two '<|>' characters (which are arbitrary) will remain
in the value of the second column and you can remove them manually
later. If only one of the leading or trailing white spaces is
important for your work, you can use only one of the '<|>'s.
# Column 1: ID [label, u8]
# Column 2: Notes [no unit, str50]
1 leading and trailing white space is ignored here 2.3442e10
2 | but they will be preserved here | 8.2964e11
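The reading rule for a ‘strN’ column can be sketched in Python as
below. It is a simplified illustration and not Gnuastro's reader: the
function name ‘read_row_with_string’ is hypothetical, it handles a
single string column per row, and the rows in the usage lines are padded
with ‘ljust’ so that the 50 characters given to the string column end
before the next column starts.

```python
DELIMS = " \t\v,"


def read_row_with_string(line, ncols_before, n):
    """Split a row whose column number ncols_before+1 has type strN."""
    pos, before = 0, []
    for _ in range(ncols_before):
        while pos < len(line) and line[pos] in DELIMS:
            pos += 1                              # skip delimiters
        start = pos
        while pos < len(line) and line[pos] not in DELIMS:
            pos += 1                              # read one token
        before.append(line[start:pos])
    # The string starts at the first non-delimiter character.
    while pos < len(line) and line[pos] in DELIMS:
        pos += 1
    # The next N characters (or fewer, if the line ends) are the string;
    # leading and trailing white space is removed.
    string = line[pos:pos + n].strip()
    after = line[pos + n:].replace(",", " ").split()
    return before, string, after


row1 = "1   " + "leading and trailing white space".ljust(50) + " 2.3442e10"
row2 = "2   " + "|   preserved here   |".ljust(50) + " 8.2964e11"
print(read_row_with_string(row1, 1, 50))
print(read_row_with_string(row2, 1, 50)[1])
```

In the second row, the white space between the two ‘|’ characters
survives because it is no longer at the start or end of the string.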
Note that the FITS binary table standard does not define the
‘unsigned int’ and ‘unsigned long’ types, so if you want to convert your
tables to FITS binary tables, use other types. Also, note that in the
FITS ASCII table, there is only one integer type (‘long’). So if you
convert a Gnuastro plain text table to a FITS ASCII table with the *note
Table:: program, the type information for integers will be lost.
Conversely if integer types are important for you, you have to manually
set them when reading a FITS ASCII table (for example, with the Table
program when reading/converting into a file, or with the
‘gnuastro/table.h’ library functions when reading into memory).
File: gnuastro.info, Node: Selecting table columns, Prev: Gnuastro text table format, Up: Tables
4.7.3 Selecting table columns
-----------------------------
At the lowest level, the only defining aspect of a column in a table is
its number, or position. But selecting columns purely by number is not
very convenient and, especially when the tables are large, it can be
very frustrating and prone to errors. Hence, table file formats (for
example, see *note Recognized table formats::) have ways to store
additional information about the columns (meta-data). Some of the most
common pieces of information about each column are its _name_, the
_units_ of data in it, and a _comment_ for longer/informal description
of the column's data.
To facilitate research with Gnuastro, you can select columns by
matching, or searching in these three fields, besides the low-level
column number. To view the full list of information on the columns in
the table, you can use the Table program (see *note Table::) with the
command below (replace ‘table-file’ with the filename of your table; if
it is FITS, you might also need to specify the HDU/extension which
contains the table):
$ asttable --information table-file
Gnuastro's programs need the columns for different purposes, for
example, in Crop, you specify the columns containing the central
coordinates of the crop centers with the ‘--coordcol’ option (see *note
Crop options::). On the other hand, in MakeProfiles, to specify the
column containing the profile position angles, you must use the ‘--pcol’
option (see *note MakeProfiles catalog::). Thus, there can be no
unified common option name to select columns for all programs (different
columns have different purposes). However, when the program expects a
column for a specific context, the option names end in the ‘col’ suffix
like the examples above. These options accept values in integer (column
number), or string (metadata match/search) format.
If the value can be parsed as a positive integer, it will be seen as
the low-level column number. Note that column counting starts from 1,
so if you ask for column 0, the respective program will abort with an
error. When the value cannot be interpreted as an integer number, it
will be seen as a string of characters which will be used to
match/search in the table's meta-data. The meta-data field which the
value will be compared with can be selected through the ‘--searchin’
option, see *note Input output options::. ‘--searchin’ can take three
values: ‘name’, ‘unit’, ‘comment’. The matching will be done following
this convention:
• If the value is enclosed in two slashes (for example, ‘-x/RA_/’, or
‘--coordcol=/RA_/’, see *note Crop options::), then it is assumed
to be a regular expression with the same convention as GNU AWK. GNU
AWK has a very well written chapter
(https://www.gnu.org/software/gawk/manual/html_node/Regexp.html)
describing regular expressions, so we will not continue discussing
them here. Regular expressions are a very powerful tool in
matching text and useful in many contexts. We thus strongly
encourage reviewing this chapter for greatly improving the quality
of your work in many cases, not just for searching column meta-data
in Gnuastro.
• When the string is not enclosed between '</>'s, any column that
exactly matches the given value in the given field will be
selected.
Note that in both cases, you can ignore the case of alphabetic
characters with the ‘--ignorecase’ option, see *note Input output
options::. Also, in both cases, multiple columns may be selected with
one value. In this case, the selected columns will be in the same
order as they appear in the table.
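The selection rules above can be summarized with the following Python
sketch. It is only a model of the described behavior: the function name
‘select_columns’ and the list-of-dictionaries table description are
hypothetical, and Python's ‘re’ module is used in place of the GNU AWK
regular expression convention (the two agree for simple patterns like
‘RA_’, but not in general).

```python
import re


def select_columns(value, columns, searchin="name", ignorecase=False):
    """Return the (1-based) numbers of the columns selected by value."""
    try:
        number = int(value)
    except ValueError:
        number = None
    if number is not None and number >= 0:
        # Integer: low-level column number, counting from 1.
        if not 1 <= number <= len(columns):
            raise ValueError("no column number %d" % number)
        return [number]
    flags = re.IGNORECASE if ignorecase else 0
    if len(value) > 2 and value[0] == "/" and value[-1] == "/":
        pattern = re.compile(value[1:-1], flags)         # regular expression
        matches = lambda field: pattern.search(field) is not None
    else:
        pattern = re.compile(re.escape(value), flags)    # exact match
        matches = lambda field: pattern.fullmatch(field) is not None
    # Selected columns keep the order they have in the table.
    return [i for i, col in enumerate(columns, start=1)
            if matches(col[searchin])]


columns = [{"name": "ID", "unit": "counter", "comment": ""},
           {"name": "RA_J2000", "unit": "deg", "comment": ""},
           {"name": "RA_ERR", "unit": "deg", "comment": ""},
           {"name": "DEC", "unit": "deg", "comment": ""}]
print(select_columns("/RA_/", columns))                  # [2, 3]
print(select_columns("dec", columns, ignorecase=True))   # [4]
print(select_columns("deg", columns, searchin="unit"))   # [2, 3, 4]
```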
File: gnuastro.info, Node: Tessellation, Next: Automatic output, Prev: Tables, Up: Common program behavior
4.8 Tessellation
================
It is sometimes necessary to classify the elements in a dataset (for
example, pixels in an image) into a grid of individual, non-overlapping
tiles. For example, when background sky gradients are present in an
image, you can define a tile grid over the image. When the tile sizes
are set properly, the background's variation over each tile will be
negligible, allowing you to measure (and subtract) it. In other cases
(for example, spatial domain convolution in Gnuastro, see *note
Convolve::), it might simply be for speed of processing: each tile can
be processed independently on a separate CPU thread. In the arts and
mathematics, this process is formally known as tessellation
(https://en.wikipedia.org/wiki/Tessellation).
The size of the regular tiles (in units of data-elements, or pixels
in an image) can be defined with the ‘--tilesize’ option. It takes
multiple numbers (separated by a comma) which will be the length along
the respective dimension (in FORTRAN/FITS dimension order). Divisions
are also acceptable, but must result in an integer. For example,
‘--tilesize=30,40’ can be used for an image (a 2D dataset). The regular
tile size along the first FITS axis (horizontal when viewed in SAO DS9)
will be 30 pixels and along the second it will be 40 pixels. Ideally,
‘--tilesize’ should be selected such that all tiles in the image have
exactly the same size. In other words, that the dataset length in each
dimension is divisible by the tile size in that dimension.
However, this is not always possible: the dataset can be any size and
every pixel in it is valuable. In such cases, Gnuastro will look at the
significance of the remainder length, if it is not significant (for
example, one or two pixels), then it will just increase the size of the
first tile in the respective dimension and allow the rest of the tiles
to have the required size. When the remainder is significant (for
example, one pixel less than the size along that dimension), the
remainder will be added to one regular tile's size and the large tile
will be cut in half and put in the two ends of the grid/tessellation.
In this way, all the tiles in the central regions of the dataset will
have the regular tile sizes and the tiles on the edge will be slightly
larger/smaller depending on the remainder significance. The fraction
which defines the remainder significance along all dimensions can be set
through ‘--remainderfrac’.
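The treatment of the remainder along one dimension can be sketched in
Python as below. This follows the description above and is not taken
from Gnuastro's source: the function name ‘tile_lengths’ is
hypothetical, and both the exact comparison with the fraction (‘<=’
here) and the way an odd-sized large tile is split in two are
assumptions.

```python
def tile_lengths(length, tilesize, remainderfrac):
    """Tile lengths along one dimension of a dataset."""
    ntiles, remainder = divmod(length, tilesize)
    if ntiles == 0:
        return [length]                      # dataset smaller than a tile
    if remainder == 0:
        return [tilesize] * ntiles           # ideal case: exact division
    if remainder / tilesize <= remainderfrac:
        # Insignificant remainder: enlarge the first tile only.
        return [tilesize + remainder] + [tilesize] * (ntiles - 1)
    # Significant remainder: add it to one regular tile, cut that large
    # tile in half and put the halves on the two ends.
    large = tilesize + remainder
    first = (large + 1) // 2
    return [first] + [tilesize] * (ntiles - 1) + [large - first]


print(tile_lengths(90, 30, 0.1))    # [30, 30, 30]
print(tile_lengths(92, 30, 0.1))    # [32, 30, 30]
print(tile_lengths(100, 30, 0.1))   # [20, 30, 30, 20]
```

In all three cases the lengths add up to the dataset length, so every
pixel belongs to exactly one tile.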
The best tile size is directly related to the spatial properties of
the feature you want to study (for example, a gradient on the image). In
practice we assume that the gradient is negligible over each tile. So
if there is a strong gradient (for example, in long wavelength ground
based images) or the image is of a crowded area where there is not too
much blank area, you have to choose a smaller tile size. A larger mesh
will give more pixels and so the scatter in the results will be less
(better statistics).
For raw image processing, a single tessellation/grid is not
sufficient. Raw images are the unprocessed outputs of the camera
detectors. Modern detectors usually have multiple readout channels each
with its own amplifier. For example, the Hubble Space Telescope
Advanced Camera for Surveys (ACS) has four amplifiers over its full
detector area dividing the square field of view into four smaller squares.
Ground based image detectors are not exempt: for example, each CCD of
Subaru Telescope's Hyper Suprime-Cam camera (which has 104 CCDs) has
four amplifiers, but each has the same height as the CCD and they divide
its width into four parts.
The bias current on each amplifier is different, and initial bias
subtraction is not perfect. So even after subtracting the measured bias
current, you can usually still identify the boundaries of different
amplifiers by eye. See Figure 11(a) in Akhlaghi and Ichikawa (2015) for
an example. As a result, the final reduced data have non-uniform
amplifier-shaped regions with higher or lower background flux values.
Such systematic biases will then propagate to all subsequent
measurements we do on the data (for example, photometry and subsequent
stellar mass and star formation rate measurements in the case of
galaxies).
Therefore an accurate analysis requires a two layer tessellation: the
top layer contains larger tiles, each covering one amplifier channel.
For clarity we will call these larger tiles "channels". The number of
channels along each dimension is defined through the ‘--numchannels’ option.
Each channel is then covered by its own individual smaller tessellation
(with tile sizes determined by the ‘--tilesize’ option). This will
allow independent analysis of two adjacent pixels from different
channels if necessary. If the image is processed or the detector only
has one amplifier, you can set the number of channels in both dimensions
to 1.
The final tessellation can be inspected on the image with the
‘--checktiles’ option that is available to all programs which use
tessellation for localized operations. When this option is called, a
FITS file with a ‘_tiled.fits’ suffix will be created along with the
outputs, see *note Automatic output::. Each pixel in this image has the
number of the tile that covers it. If the number of channels in any
dimension is larger than unity, you will notice that the tile IDs are
defined such that the first channel is covered first, then the second
and so on. For the full list of processing-related common options
(including tessellation options), please see *note Processing options::.
File: gnuastro.info, Node: Automatic output, Next: Output FITS files, Prev: Tessellation, Up: Common program behavior
4.9 Automatic output
====================
All the programs in Gnuastro are designed such that specifying an output
file or directory (based on the program context) is optional. When no
output name is explicitly given (with ‘--output’, see *note Input output
options::), the programs will automatically set an output name based on
the input name(s) and what the program does. For example, when you are
using ConvertType to save a FITS image named ‘dataset.fits’ to a JPEG
image and do not specify a name for it, the JPEG output file will be
named ‘dataset.jpg’. When the input is from the standard input (for
example, a pipe, see *note Standard input::), and ‘--output’ is not
given, the output name will be the program's name (for example,
‘converttype.jpg’).
Another very important part of the automatic output generation is
that all the directory information of the input file name is stripped
off of it. This feature can be disabled with the ‘--keepinputdir’
option, see *note Input output options::. It is the default because
astronomical data are usually very large and carefully organized with
specific file names. In some cases, the user might not have write
permissions in those directories(1).
Let's assume that we are working on a report and want to process the
FITS images from two projects (ABC and DEF), which are stored in the
sub-directories named ‘ABCproject/’ and ‘DEFproject/’ of our top data
directory (‘/mnt/data’). The following shell commands show how one
image from the former is first converted to a JPEG image through
ConvertType and then the objects from an image in the latter project are
detected using NoiseChisel. The text after each ‘#’ sign is a comment
(not typed!).
$ pwd # Current location
/home/usrname/research/report
$ ls # List directory contents
ABC01.jpg
$ ls /mnt/data/ABCproject # Archive 1
ABC01.fits ABC02.fits ABC03.fits
$ ls /mnt/data/DEFproject # Archive 2
DEF01.fits DEF02.fits DEF03.fits
$ astconvertt /mnt/data/ABCproject/ABC02.fits --output=jpg # Prog 1
$ ls
ABC01.jpg ABC02.jpg
$ astnoisechisel /mnt/data/DEFproject/DEF01.fits # Prog 2
$ ls
ABC01.jpg ABC02.jpg DEF01_detected.fits
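The naming in the example above can be imitated with standard shell
tools. The sketch below is not how Gnuastro is implemented: the
function name ‘auto_output_name’ is hypothetical, and it only covers
inputs with a ‘.fits’ suffix.

```shell
# Hypothetical re-creation of the automatic naming shown above (not
# Gnuastro's code).  Arguments: input file name and the suffix that
# the program adds (for example '.jpg' or '_detected.fits').
auto_output_name() {
    # 'basename' removes the directory part and the '.fits' suffix.
    base=$(basename "$1" .fits)
    printf '%s%s\n' "$base" "$2"
}

auto_output_name /mnt/data/ABCproject/ABC02.fits .jpg
auto_output_name /mnt/data/DEFproject/DEF01.fits _detected.fits
```

The two calls print ‘ABC02.jpg’ and ‘DEF01_detected.fits’: the same
names that appeared in the running directory above, with no trace of
‘/mnt/data’.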
---------- Footnotes ----------
(1) In fact, even if the data is stored on your own computer, it is
advised to only grant write permissions to the super user or root. This
way, you will not accidentally delete or modify your valuable data!
File: gnuastro.info, Node: Output FITS files, Next: Numeric locale, Prev: Automatic output, Up: Common program behavior
4.10 Output FITS files
======================
The output of many of Gnuastro's programs is (or can be) a FITS file.
The FITS format has many useful features for storing scientific datasets
(cubes, images and tables) along with robust features for
archivability. For more on this standard, please see *note Fits::.
As a community convention described in *note Fits::, the first
extension of all FITS files produced by Gnuastro's programs only
contains the meta-data that is intended for the file's extension(s).
For a Gnuastro program, this generic meta-data (that is stored as FITS
keyword records) is its configuration when it produced this dataset:
file name(s) of input(s) and option names, values and comments. You can
use the ‘--outfitsnoconfig’ option to stop the programs from writing
these keywords into the first extension of their output.
When the configuration is too trivial (only the input file name, as
in the *note Table:: program, for example) no meta-data is written in
this extension. FITS keywords have the following limitations with
regard to generic option names and values:
• If a keyword (option name) is longer than 8 characters, the first
word in the record (80 character line) is ‘HIERARCH’ which is
followed by the keyword name.
• Values can be at most 75 characters, but for strings, this changes
to 73 (because of the two extra <'> characters that are necessary).
However, if the value is a file name, containing slash (</>)
characters to separate directories, Gnuastro will break the value
into multiple keywords.
• Keyword names ignore case, so they are all written in capital
letters. Therefore, if you want to use Grep to inspect these
keywords, use the ‘-i’ option, like the example below.
$ astfits image_detected.fits -h0 | grep -i snquant
The keywords above are classified (separated by an empty line and
title) as a group titled "ProgramName configuration". This meta-data
extension also contains a final group of keywords to keep the basic date
and version information of Gnuastro, its dependencies and the pipeline
that is using Gnuastro (if it is under version control); they are listed
below.
‘DATE’
The creation time of the FITS file. This date is written directly
by CFITSIO and is in UT format.
While the date can be useful metadata in most scenarios, it does
have a caveat: when everything else in your output is the same
between multiple runs, the date will be different! If exact
reproducibility is important for you, this can be annoying! To
stop any Gnuastro program from writing the ‘DATE’ keyword, you can
use the ‘--outfitsnodate’ option (see *note Input output options::).
‘DATEUTC’
If the date in the ‘DATE’ keyword is in UTC
(https://en.wikipedia.org/wiki/Coordinated_Universal_Time), this
keyword will have a value of 1; otherwise, it will have a value of
0. If ‘DATE’ is not written, this is also ignored.
‘COMMIT’
Git's commit description from the running directory of Gnuastro's
programs. If the running directory is not version controlled or
‘libgit2’ is not installed (see *note Optional dependencies::) then
this keyword will not be present. The printed value is equivalent
to the output of the following command:
git describe --dirty --always
If the running directory contains non-committed work, then the
stored value will have a '‘-dirty’' suffix. This can be very
helpful to let you know that the data is not ready to be shared
with collaborators or submitted to a journal. You should only
share results that are produced after all your work is committed
(safely stored in the version controlled history and thus
reproducible).
At first sight, version control appears to be mainly a tool for
software developers. However, progress in scientific research is
almost identical to progress in software development: first you
have a rough idea that starts with a handful of easy steps. But as
the first results appear to be promising, you will have to extend,
or generalize, it to make it more robust and work in all the
situations your research covers, not just your first test samples.
Slowly you will find wrong assumptions or bad implementations that
need to be fixed ('bugs' in software development parlance).
Finally, when you submit the research to your collaborators or a
journal, many comments and suggestions will come in, and you have
to address them.
Software developers have created version control systems precisely
for this kind of activity. Each significant moment in the
project's history is called a "commit", see *note Version
controlled source::. A snapshot of the project in each "commit" is
safely stored away, so you can revert back to it at a later time,
or check changes/progress. This way, you can be sure that your
work is reproducible and track the progress and history. With
version control, experimentation in the project's analysis is
greatly facilitated, since you can easily revert back if a
brainstorm test procedure fails.
One important feature of version control is that the research
result (FITS image, table, report or paper) can be stamped with the
unique commit information that produced it. This information will
enable you to exactly reproduce that same result later, even if you
have made changes/progress. For one example of a research paper's
reproduction pipeline, please see the reproduction pipeline
(https://gitlab.com/makhlaghi/NoiseChisel-paper) of Akhlaghi and
Ichikawa 2015 (https://arxiv.org/abs/1505.01664) describing *note
NoiseChisel::.
In case you don't want the ‘COMMIT’ keyword in the first extension
of your output FITS file, you can use the ‘--outfitsnocommit’
option (see *note Input output options::).
‘CFITSIO’
The version of CFITSIO used (see *note CFITSIO::). This can be
disabled with ‘--outfitsnoversions’ (see *note Input output
options::).
‘WCSLIB’
The version of WCSLIB used (see *note WCSLIB::). Note that older
versions of WCSLIB do not report the version internally. So this
is only available if you are using more recent WCSLIB versions.
This can be disabled with ‘--outfitsnoversions’ (see *note Input
output options::).
‘GSL’
The version of GNU Scientific Library that was used, see *note GNU
Scientific Library::. This can be disabled with
‘--outfitsnoversions’ (see *note Input output options::).
‘GNUASTRO’
The version of Gnuastro used (see *note Version numbering::). This
can be disabled with ‘--outfitsnoversions’ (see *note Input output
options::).
File: gnuastro.info, Node: Numeric locale, Prev: Output FITS files, Up: Common program behavior
4.11 Numeric locale
===================
If your system locale
(https://en.wikipedia.org/wiki/Locale_(computer_software)) is not
English, it may happen that the '.' is not used as the decimal
separator of basic command-line tools for input or output. For example,
in Spanish and some other languages the decimal separator (symbol used
to separate the integer and fractional part of a number), is a comma.
Therefore in such systems, some programs may print 0.5 as '‘0,5’'
(instead of '‘0.5’'). This mainly happens because some core operating
system tools like ‘awk’ or ‘seq’ depend on the locale. This can cause
problems for other programs (like those in Gnuastro that expect a '<.>'
as the decimal separator).
To see the effect, please try the commands below. The first one will
print 0.5 in your default locale's format. The second set will use
the Spanish locale for printing numbers (which will put a comma between
the 0 and the 5). The third will use the English (US) locale for
printing numbers (which will put a point between the 0 and the 5).
$ seq 0.5 1
$ export LC_NUMERIC=es_ES.utf8
$ seq 0.5 1
$ export LC_NUMERIC=en_US.utf8
$ seq 0.5 1
With the simple command below, you can check your current locale
environment variables for specifying the formats of various things like
date, time, monetary, telephone, numbers, etc. You can change any of
these, by simply giving different values to the respective variable like
above. For a more complete explanation on each variable, see
<https://www.baeldung.com/linux/locale-environment-variables>.
$ locale
To avoid these kinds of locale-specific problems (for example,
another program not being able to read '‘0,5’' as half of unity), you
can change the locale by giving the value of ‘C’ to the ‘LC_NUMERIC’
environment variable (or the lower-level/generic ‘LC_ALL’). You will
notice that ‘C’ is not a human-language and country identifier like
‘en_US’, it is the programming locale, which is well recognized by
programmers in all countries and is available on all Unix-like operating
systems (others may not be pre-defined and may need installation). You
can set the ‘LC_NUMERIC’ only for a single command (the first one below:
simply defining the variable in the same line), or all commands within
the running session (the second command below, or "exporting" it to all
subsequent commands):
## Change the numeric locale, only for this 'seq' command.
$ LC_NUMERIC=C seq 0.5 1
## Change the locale to the standard, for all commands after it.
$ export LC_NUMERIC=C
If you want to change it generally for all future sessions, you can
put the second command in your shell's startup file. For more on
startup files, please see *note Installation directory::.
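In a script, the same idea can be applied once at the top so that every
later command prints numbers with a point. The lines below are a
minimal sketch: they rely on the POSIX rule that ‘LC_ALL’, when set,
takes precedence over ‘LC_NUMERIC’ (hence the ‘unset’), and the variable
name ‘half’ is only for illustration.

```shell
# 'LC_ALL' overrides 'LC_NUMERIC' when it is set, so clear it first.
unset LC_ALL

# Use the C numeric locale for all commands started from this script.
LC_NUMERIC=C
export LC_NUMERIC

# Any locale-aware tool will now use '.' as the decimal separator.
half=$(awk 'BEGIN { printf "%.1f", 1/2 }')
echo "$half"
```

Unlike the startup-file approach, this only affects the script itself
and leaves the locale of your interactive session untouched.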
File: gnuastro.info, Node: Data containers, Next: Data manipulation, Prev: Common program behavior, Up: Top
5 Data containers
*****************
The most low-level and basic property of a dataset is how it is stored.
To process, archive and transmit the data, you need a container to store
it first. From the start of the computer age, different formats have
been defined to store data, optimized for particular applications. One
format/container can never be useful for all applications: the storage
defines the application and vice-versa. In astronomy, the Flexible
Image Transport System (FITS) standard has become the most common format
of data storage and transmission. It has many useful features, for
example, multiple sub-containers (also known as extensions or header
data units, HDUs) within one file, or support for tables as well as
images. Each HDU can store an independent dataset and its corresponding
meta-data. Therefore, Gnuastro has one program (see *note Fits::)
specifically designed to manipulate FITS HDUs and the meta-data (header
keywords) in each HDU.
Your astronomical research does not just involve data analysis (where
the FITS format is very useful). For example, you may want to show
your raw and processed FITS images or spectra as figures within slides,
reports, or papers. The FITS format is not defined for such
applications. Thus, Gnuastro also comes with the ConvertType program
(see *note ConvertType::) which can be used to convert a FITS image to
and from (where possible) other formats like plain text and JPEG (which
allow two way conversion), along with EPS and PDF (which can only be
created from FITS, not the other way round).
Finally, the FITS format is not just for images, it can also store
tables. Binary tables in particular can be very efficient in storing
catalogs that have more than a few tens of columns and rows. However,
unlike images (where all elements/pixels have one data type), tables
contain multiple columns and each column can have different properties:
independent data types (see *note Numeric data types::) and meta-data.
In practice, each column can be viewed as a separate container that is
grouped with others in the table. The only shared property of the
columns in a table is thus the number of elements they contain. To
allow easy inspection/manipulation of table columns, Gnuastro has the
Table program (see *note Table::). It can be used to select certain
table columns in a FITS table and see them as a human readable output on
the command-line, or to save them into another plain text or FITS table.
* Menu:
* Fits:: View and manipulate extensions and keywords.
* ConvertType:: Convert data to various formats.
* Table:: Read and Write FITS tables to plain text.
* Query:: Import data from external databases.
File: gnuastro.info, Node: Fits, Next: ConvertType, Prev: Data containers, Up: Data containers
5.1 Fits
========
The "Flexible Image Transport System", or FITS, is by far the most
common data container format in astronomy and in constant use since the
1970s. Archiving (future usage, simplicity) has been one of the primary
design principles of this format. In the last few decades it has proved
so useful and robust that the Vatican Library has also chosen FITS for
its "long-term digital preservation" project(1).
Although the full name of the standard invokes the idea that it is
only for images, it also contains complete and robust features for
tables. It started off in the 1970s and was formally published as a
standard in 1981. It was adopted by the International Astronomical Union
(IAU) in 1982, and an IAU working group to maintain its future was
defined in 1988. The FITS 2.0 and 3.0 standards were approved in 2000
and 2008 respectively, and the 4.0 draft has also been released
recently, please see the FITS standard document web page
(https://fits.gsfc.nasa.gov/fits_standard.html) for the full text of all
versions. Also see the FITS 3.0 standard paper
(https://doi.org/10.1051/0004-6361/201015362) for a nice introduction
and history along with the full standard.
Many common image formats (for example, JPEG) only have one
image/dataset per file. However, one great advantage of the FITS standard
is that it allows you to keep multiple datasets (images or tables along
with their separate meta-data) in one file. In the FITS standard, each
data + metadata is known as an extension, or more formally a header data
unit or HDU. The HDUs in a file can be completely independent: you can
have multiple images of different dimensions/sizes or tables as separate
extensions in one file. However, while the standard does not impose any
constraints on the relation between the datasets, it is strongly
encouraged to group data that are contextually related with each other
in one file. For example, an image and the table/catalog of objects and
their measured properties in that image. Other examples can be images
of one patch of sky in different colors (filters), or one raw telescope
image along with its calibration data (tables or images).
As discussed above, the extensions in a FITS file can be completely
independent. To keep some information (meta-data) about the group of
extensions in the FITS file, the community has adopted the following
convention: put no data in the first extension, so it is just meta-data.
This extension can thus be used to store meta-data regarding the whole
file (grouping of extensions). Subsequent extensions may contain data
along with their own separate meta-data. All of Gnuastro's programs
also follow this convention: the main output dataset(s) are placed in
the second (or later) extension(s). The first extension contains no
data: the program's configuration (input file name, along with all its
option values) is stored as its meta-data, see *note Output FITS
files::.
The meta-data contain information about the data, for example, which
region of the sky an image corresponds to, the units of the data, what
telescope, camera, and filter the data were taken with, its observation
date, or the software that produced it and its configuration. Without
the meta-data, the raw dataset is practically just a collection of
numbers and really hard to understand, or connect with the real world
(other datasets). It is thus strongly encouraged to supplement your
data (at any level of processing) with as much meta-data about your
processing/science as possible.
The meta-data of a FITS file is in ASCII format, which can be easily
viewed or edited with a text editor or on the command-line. Each
meta-data element (known as a keyword generally) is composed of a name,
value, units and comments (the last two are optional). For example,
below you can see three FITS meta-data keywords for specifying the world
coordinate system (WCS, or its location in the sky) of a dataset:
LATPOLE = -27.805089 / [deg] Native latitude of celestial pole
RADESYS = 'FK5' / Equatorial coordinate system
EQUINOX = 2000.0 / [yr] Equinox of equatorial coordinates
However, there are some limitations which discourage viewing/editing
the keywords with text editors. For example, there is a fixed length of
80 characters for each keyword (its name, value, units and comments) and
there are no new-line characters, so on a text editor all the keywords
are seen in one line. Also, the meta-data keywords are immediately
followed by the data, which are commonly in binary format: they will
show up as strange looking characters in a text editor and can
significantly slow down the processor.
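The fixed record length also gives a simple way to see why a raw header
looks like one long line, and how it can be broken up again. The sketch
below builds two simplified 80-character records (only the name and
value, padded with spaces; real records follow stricter column rules)
and then breaks the stream every 80 characters with ‘fold’. The
variable name ‘header’ is only for illustration.

```shell
# Two simplified records, each padded with spaces to 80 characters and
# joined without any new-line character (as in a FITS header).
header=$(printf '%-80s%-80s' "RADESYS = 'FK5'" "EQUINOX = 2000.0")

# Length of the stream: 2 records x 80 characters.
echo "${#header}"

# Break the stream into one record per line and trim trailing spaces.
printf '%s\n' "$header" | fold -w 80 | sed 's/ *$//'
```

This is only meant to illustrate the layout: for real files, the Fits
program described below is the safer way to view or edit keywords.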
Gnuastro's Fits program was designed to allow easy manipulation of
FITS extensions and meta-data keywords on the command-line while
conforming fully with the FITS standard. For example, you can copy or
cut (copy and remove) HDUs/extensions from one FITS file to another, or
completely delete them. It also has features to delete, add, or edit
meta-data keywords within one HDU.
* Menu:
* Invoking astfits:: Arguments and options to Fits.
---------- Footnotes ----------
(1) <https://www.vaticanlibrary.va/home.php?pag=progettodigit>
File: gnuastro.info, Node: Invoking astfits, Prev: Fits, Up: Fits
5.1.1 Invoking Fits
-------------------
Fits can print or manipulate the HDUs (extensions) of a FITS file, or the
meta-data keywords in a given HDU. The executable name is ‘astfits’ with
the following general template:
$ astfits [OPTION...] ASTRdata
One line examples:
## View general information about every extension:
$ astfits image.fits
## Print the header keywords in the second HDU (counting from 0):
$ astfits image.fits -h1
## Only print header keywords that contain `NAXIS':
$ astfits image.fits -h1 | grep NAXIS
## Only print the WCS standard PC matrix elements
$ astfits image.fits -h1 | grep 'PC._.'
## Copy a HDU from input.fits to out.fits:
$ astfits input.fits --copy=hdu-name --output=out.fits
## Update the OLDKEY keyword value to 153.034:
$ astfits image.fits -h1 --update=OLDKEY,153.034,"Old keyword comment"
## Delete one COMMENT keyword and add a new one:
$ astfits image.fits -h1 --delete=COMMENT --comment="Anything you like ;-)."
## Write two new keywords with different values and comments:
$ astfits image.fits -h1 --write=MYKEY1,20.00,"An example keyword" --write=MYKEY2,fd
## Inspect individual pixel area taken based on its WCS (in degree^2).
## Then convert the area to arcsec^2 with the Arithmetic program.
$ astfits input.fits --pixelareaonwcs -o pixarea.fits
$ astarithmetic pixarea.fits 3600 3600 x x -o pixarea_arcsec2.fits
When no action is requested (and only a file name is given), Fits
will print a list of information about the extension(s) in the file.
This information includes the HDU number, HDU name (‘EXTNAME’ keyword),
type of data (see *note Numeric data types::), and the number of data
elements it contains (size along each dimension for images and table
rows and columns). Optionally, a comment column is printed for special
situations (like a 2D HEALPix grid that is usually stored as a 1D
dataset/table). You can use this to get a general idea of the contents
of the FITS file and what HDU to use for further processing, either with
the Fits program or any other Gnuastro program.
Here is one example of information about a FITS file with four
extensions: the first extension has no data, it is a purely meta-data
HDU (commonly used to keep meta-data about the whole file, or grouping
of extensions, see *note Fits::). The second extension is an image with
name ‘IMAGE’ and single precision floating point type (‘float32’, see
*note Numeric data types::), it has 4287 pixels along its first
(horizontal) axis and 4286 pixels along its second (vertical) axis. The
third extension is also an image with name ‘MASK’. It is in 2-byte
integer format (‘int16’) which is commonly used to keep information
about pixels (for example, to identify which ones were saturated, or
which ones had cosmic rays and so on), note how it has the same size as
the ‘IMAGE’ extension. The fourth extension is a binary table called
‘CATALOG’ which has 12371 rows and 5 columns (it probably contains
information about the sources in the image).
GNU Astronomy Utilities X.X
Run on Day Month DD HH:MM:SS YYYY
-----
HDU (extension) information: `image.fits'.
Column 1: Index (counting from 0).
Column 2: Name (`EXTNAME' in FITS standard).
Column 3: Image data type or `table' format (ASCII or binary).
Column 4: Size of data in HDU.
-----
0 n/a uint8 0
1 IMAGE float32 4287x4286
2 MASK int16 4287x4286
3 CATALOG table_binary 12371x5
If a specific HDU is identified on the command-line with the ‘--hdu’
(or ‘-h’) option and no operation is requested, then the full list of
header keywords in that HDU will be printed (as if the ‘--printallkeys’
option was called, see below). It is important to remember that this only
occurs when ‘--hdu’ is given on the command-line. The ‘--hdu’ value
given in a configuration file will only be used when a specific
operation on keywords is requested. Therefore as described in the
paragraphs above, when no explicit call to the ‘--hdu’ option is made on
the command-line and no operation is requested (on the command-line or
configuration files), the basic information of each HDU/extension is
printed.
The operating mode and input/output options to Fits are similar to
the other programs and fully described in *note Common options::. The
options particular to Fits can be divided into three groups: 1) those
related to modifying HDUs or extensions (see *note HDU information and
manipulation::), 2) those related to viewing/modifying meta-data
keywords (see *note Keyword inspection and manipulation::), and 3) those
related to creating meta-images where each pixel shows values for a
specific property of the image (see *note Pixel information images::).
These three classes of options cannot be called together in one run: in
any instance of Fits, you can either work on the extensions, work on the
meta-data keywords, or create meta-images where each pixel shows a
particular piece of information about the image itself.
* Menu:
* HDU information and manipulation:: Learn about the HDUs and move them.
* Keyword inspection and manipulation:: Manipulate metadata keywords in a HDU.
* Pixel information images:: Pixel values contain information on the pixels.
File: gnuastro.info, Node: HDU information and manipulation, Next: Keyword inspection and manipulation, Prev: Invoking astfits, Up: Invoking astfits
5.1.1.1 HDU information and manipulation
........................................
Each FITS file header data unit, or HDU (also known as an extension) is
an independent dataset (data + meta-data). Multiple HDUs can be stored
in one FITS file, see *note Fits::. The general HDU-related options to
the Fits program are listed below as two general classes: the first
group below focuses on HDU information while the latter focuses on
manipulating (moving or deleting) the HDUs.
The options below print information about the given HDU on the
command-line. Thus they cannot be called together in one command (each
has its own independent output).
‘-n’
‘--numhdus’
Print the number of extensions/HDUs in the given file. Note that
this option must be called alone and will only print a single
number. It is thus useful in scripts, for example, when you need
to check the number of extensions in a FITS file.
For a complete list of basic meta-data on the extensions in a FITS
file, do not use any of the options in this section or in *note
Keyword inspection and manipulation::. For more, see *note
Invoking astfits::.
‘--hastablehdu’
Print ‘1’ (on standard output) if at least one table HDU (ASCII or
binary) exists in the FITS file. Otherwise (when no table HDU
exists in the file), print ‘0’.
‘--listtablehdus’
Print the names or numbers (when a name does not exist, counting
from zero) of HDUs that contain a table (ASCII or Binary) on
standard output, one per line. Otherwise (when no table HDU exists
in the file) nothing will be printed.
‘--hasimagehdu’
Print ‘1’ (on standard output) if at least one image HDU exists in
the FITS file. Otherwise (when no image HDU exists in the file),
print ‘0’.
In the FITS standard, any array with any dimensions is called an
"image", therefore this option includes 1, 3 and 4 dimensional
arrays too. However, an image HDU with zero dimensions (which is
usually the first extension and only contains metadata) is not
counted here.
‘--listimagehdus’
Print the names or numbers (when a name does not exist, counting
from zero) of HDUs that contain an image on standard output, one
per line. Otherwise (when no image HDU exists in the file) nothing
will be printed.
In the FITS standard, any array with any dimensions is called an
"image", therefore this option includes 1, 3 and 4 dimensional
arrays too. However, an image HDU with zero dimensions (which is
usually the first extension and only contains metadata) is not
counted here.
‘--listallhdus’
Print the names or numbers (when a name does not exist, counting
from zero) of all HDUs within the input file on the standard
output, one per line.
‘--pixelscale’
Print the HDU's pixel-scale (change in world coordinate for one
pixel along each dimension) and pixel area or voxel volume.
Without the ‘--quiet’ option, the output of ‘--pixelscale’ has
multiple lines and explanations, thus being more human-friendly.
It prints the file/HDU name, number of dimensions, and the units
along with the actual pixel scales. Also, when any of the units
are in degrees, the pixel scales and area/volume are also printed
in units of arc-seconds. For 3D datasets, the pixel area (on each
2D slice of the 3D cube) is printed as well as the voxel volume.
If you only want the pixel area of a 2D image in units of
arcsec^2 you can use ‘--pixelareaarcsec2’ described below.
However, in scripts (that are to be run automatically), this
human-friendly format is annoying, so when called with the
‘--quiet’ option, only the pixel-scale value(s) along each
dimension is(are) printed in one line. These numbers are followed
by the pixel area (in the raw WCS units). For 3D datasets, this
will be area on each 2D slice. Finally, for 3D datasets, a final
number (the voxel volume) is printed. As a summary, in ‘--quiet’
mode, for 2D datasets three numbers are printed and for 3D
datasets, 5 numbers are printed. If the dataset has more than 3
dimensions, only the pixel-scale values are printed (no area or
volume will be printed).
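In a script, the single line printed by ‘--pixelscale --quiet’ for a 2D
image can be split into its three numbers with the shell's ‘read’. The
sketch below uses a made-up line in place of real output (with Gnuastro
installed it would come from the command shown in the comment, where
‘image.fits’ and ‘-h1’ are placeholders), and it assumes that the WCS
units are degrees. All variable names are hypothetical.

```shell
# Made-up example line; in practice something like:
#   line=$(astfits image.fits -h1 --pixelscale --quiet)
line='1.1e-05 1.1e-05 1.21e-10'

# Two pixel scales followed by the pixel area (all in WCS units).
read -r scale1 scale2 area <<EOF
$line
EOF

# Convert the first pixel scale from degrees to arc-seconds.
scale1_arcsec=$(LC_ALL=C awk -v s="$scale1" 'BEGIN { printf "%.4f", s * 3600 }')
echo "$scale1_arcsec"
```

For a 3D dataset the same ‘read’ would need five variable names, in the
order described above.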
‘--pixelareaarcsec2’
Print the HDU's pixel area in units of arcsec^2. This option
only works on 2D images, that have WCS coordinates in units of
degrees. For lower-level information about the pixel scale in each
dimension, see ‘--pixelscale’ (described above).
‘--skycoverage’
Print the rectangular area (or 3D cube) covered by the given
image/datacube HDU over the Sky in the WCS units. The covered area
is reported in two ways: 1) the center and full width in each
dimension, 2) the minimum and maximum sky coordinates in each
dimension. This option is thus useful when you want to get a
general feeling of a new image/dataset, or prepare the inputs to
query external databases in the region of the image (for example,
with *note Query::).
If run without the ‘--quiet’ option, the values are given with a
human-friendly description. For example, here is the output of
this option on an image taken near the star Castor:
$ astfits castor.fits --skycoverage
Input file: castor.fits (hdu: 1)
Sky coverage by center and (full) width:
Center: 113.9149075 31.93759664
Width: 2.41762045 2.67945253
Sky coverage by range along dimensions:
RA 112.7235592 115.1411797
DEC 30.59262123 33.27207376
With the ‘--quiet’ option, the values are more machine-friendly
(easy to parse). It has two lines, where the first line contains
the center/width values and the second line shows the coordinate
ranges in each dimension.
$ astfits castor.fits --skycoverage --quiet
113.9149075 31.93759664 2.41762045 2.67945253
112.7235592 115.1411797 30.59262123 33.27207376
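The two-line format can be split into named shell variables with
standard tools. The sketch below uses the Castor values printed above
as a stored sample (so it runs without the image); the variable names
are arbitrary choices for this illustration.

```shell
# Sample '--quiet' output copied from the Castor example. With Gnuastro
# installed: coverage=$(astfits castor.fits --skycoverage --quiet)
coverage="113.9149075 31.93759664 2.41762045 2.67945253
112.7235592 115.1411797 30.59262123 33.27207376"

# First line: center and full width. Second line: range per dimension.
set -- $(printf '%s\n' "$coverage" | sed -n '1p')
center_ra=$1; center_dec=$2; width_ra=$3; width_dec=$4
set -- $(printf '%s\n' "$coverage" | sed -n '2p')
ra_min=$1; ra_max=$2; dec_min=$3; dec_max=$4

# Sanity check on this sample: the RA width matches max minus min.
awk -v a="$ra_min" -v b="$ra_max" -v w="$width_ra" \
    'BEGIN { d = (b - a) - w; if (d < 0) d = -d; exit !(d < 1e-6) }' \
  && echo "RA range agrees with the reported width"
```

The resulting variables can then be passed on to other programs, for
example as the center and width of a database query.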
Note that this is a simple rectangle (cube in 3D) definition, so if
the image is rotated in relation to the celestial coordinates a
general polygon is necessary to exactly describe the coverage.
Hence when there is rotation, the reported area will be larger than
the actual area containing data. You can visually see the area with
the ‘--pixelareaonwcs’ option of *note Fits::.
Currently this option only supports images that are less than 180
degrees in width (which is usually the case!). This requirement
has been necessary to account for images that cross the RA=0 hour
circle on the sky. Please get in touch with us at
<mailto:bug-gnuastro@gnu.org> if you have an image that is larger
than 180 degrees so we try to find a solution based on need.
‘--datasum’
Calculate and print the given HDU's "datasum" to stdout. The given
HDU is specified with the ‘--hdu’ (or ‘-h’) option. This number is
calculated by parsing all the bytes of the given HDU's data records
(excluding keywords). This option ignores any possibly existing
‘DATASUM’ keyword in the HDU. For more on ‘DATASUM’ in the FITS
standard, see *note Keyword inspection and manipulation:: (under
the ‘checksum’ component of ‘--write’).
You can use this option to confirm that the data in two different
HDUs (possibly with different keywords) is identical. Its
advantage over ‘--write=datasum’ (which writes the ‘DATASUM’
keyword into the given HDU) is that it does not require write
permissions.
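The comparison described above could be wrapped in a small shell
function like the sketch below. The name ‘same_data’ and the file
names in the usage comment are hypothetical; the function only assumes
that ‘--datasum’ prints the same text for identical data.

```shell
# same_data FILE1 HDU1 FILE2 HDU2   (hypothetical helper)
# Succeeds (exit status 0) when both HDUs report the same datasum,
# returns 1 when they differ, and 2 when Fits itself fails.
same_data() {
    sum1=$(astfits "$1" --hdu="$2" --datasum) || return 2
    sum2=$(astfits "$3" --hdu="$4" --datasum) || return 2
    [ "$sum1" = "$sum2" ]
}

# Example call (file names are placeholders):
#   same_data a.fits 1 b.fits 1 && echo "identical data"
```

Since neither call writes anything, the function can also be used on
read-only files.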
‘--datasum-encoded’
Similar to ‘--datasum’, except that the output will be an encoded
string of numbers and small-caps alphabetic characters. This is
the same encoding algorithm that is used for the ‘CHECKSUM’
keyword, but applied to the value of the ‘DATASUM’ result. In some
scenarios, this string can be more useful than the raw integer.
The following options manipulate the HDUs of a FITS file (copying or
moving them to another FITS file, or deleting them). These options may
be called multiple
times in one run. If so, the extensions will be copied from the input
FITS file to the output FITS file in the given order (on the
command-line and also in configuration files, see *note Configuration
file precedence::). If the separate classes are called together in one
run of Fits, then first ‘--copy’ is run (on all specified HDUs),
followed by ‘--cut’ (again on all specified HDUs), and then ‘--remove’
(on all specified HDUs).
The ‘--copy’ and ‘--cut’ options need an output FITS file (specified
with the ‘--output’ option). If the output file exists, then the
specified HDU will be copied following the last extension of the output
file (the existing HDUs in it will be untouched). Thus, after Fits
finishes, the copied HDU will be the last HDU of the output file. If no
output file name is given, then automatic output will be used to store
the HDUs given to this option (see *note Automatic output::).
‘-C STR’
‘--copy=STR’
Copy the specified extension into the output file, see explanations
above.
‘-k STR’
‘--cut=STR’
Cut (copy to output, remove from input) the specified extension
into the output file, see explanations above.
‘-R STR’
‘--remove=STR’
Remove the specified HDU from the input file.
The first (zero-th) HDU cannot be removed with this option.
Consider using ‘--copy’ or ‘--cut’ in combination with the
‘--primaryimghdu’ option (described below), which has this name:
‘primaryimghdu’ to not have an empty zero-th HDU. From CFITSIO: "In
the case of deleting the primary array (the first HDU in the file)
then [it] will be replaced by a null primary array containing the
minimum set of required keywords and no data.". So in practice,
any existing data (array) and meta-data in the first extension will
be removed, but the number of extensions in the file will not
change. This is because of the unique position the first FITS
extension has in the FITS standard (for example, it cannot be used
to store tables).
‘--primaryimghdu’
Copy or cut an image HDU to the zero-th HDU/extension of a file that
does not yet exist. This option is thus irrelevant if the output
file already exists or the copied/cut extension is a FITS table.
For example, with the commands below, first we make sure that
‘out.fits’ does not exist, then we copy the first extension of
‘in.fits’ to the zero-th extension of ‘out.fits’.
$ rm -f out.fits
$ astfits in.fits --copy=1 --primaryimghdu --output=out.fits
If we had not used ‘--primaryimghdu’, then the zero-th extension of
‘out.fits’ would have no data, and its second extension would host
the copied image (just like any other output of Gnuastro).
File: gnuastro.info, Node: Keyword inspection and manipulation, Next: Pixel information images, Prev: HDU information and manipulation, Up: Invoking astfits
5.1.1.2 Keyword inspection and manipulation
...........................................
The meta-data in each header data unit, or HDU (also known as extension,
see *note Fits::) is stored as "keyword"s. Each keyword consists of a
name, value, unit, and comments. The Fits program (see *note Fits::)
options related to viewing and manipulating keywords in a FITS HDU are
described below.
First, let's review the ‘--keyvalue’ option which should be called
separately from the rest of the options described in this section.
Also, unlike the rest of the options in this section, with ‘--keyvalue’,
you can give more than one input file.
‘-l STR[,STR[,...]]’
‘--keyvalue=STR[,STR[,...]]’
Only print the value of the requested keyword(s): the ‘STR’s.
‘--keyvalue’ can be called multiple times, and each call can
contain multiple comma-separated keywords. If more than one file
is given, this option uses the same HDU/extension for all of them
(value to ‘--hdu’). For example, you can get the number of
dimensions of the three FITS files in the running directory, as
well as the length along each dimension, with this command:
$ astfits *.fits --keyvalue=NAXIS,NAXIS1 --keyvalue=NAXIS2
image-a.fits 2 774 672
image-b.fits 2 774 672
image-c.fits 2 387 336
If only one input is given, and the ‘--quiet’ option is activated,
the file name is not printed on the first column, only the values
of the requested keywords.
$ astfits image-a.fits --keyvalue=NAXIS,NAXIS1 \
--keyvalue=NAXIS2 --quiet
2 774 672
*Argument list too long:* if the list of input files is too long,
the shell is going to complain with the ‘Argument list too long’
error! To avoid this problem, you can put the list of files in a
plain-text file and give that plain-text file to the Fits program
through the ‘--arguments’ option discussed below.
The output is internally stored (and finally printed) as a table
(with one column per keyword). Therefore just like the Table
program, you can use ‘--colinfoinstdout’ to print the metadata like
the example below (also see *note Invoking asttable::). The
keyword metadata (comments and units) are extracted from the
comments and units of the keyword in the input files (first file
that has a comment or unit). Hence if the keyword does not have
units or comments in any of the input files, they will be empty.
For more on Gnuastro's plain-text metadata format, see *note
Gnuastro text table format::.
$ astfits *.fits --keyvalue=NAXIS,NAXIS1,NAXIS2 \
--colinfoinstdout
# Column 1: FILENAME [name,str10,] Name of input file.
# Column 2: NAXIS [ ,u8 ,] number of data axes
# Column 3: NAXIS1 [ ,u16 ,] length of data axis 1
# Column 4: NAXIS2 [ ,u16 ,] length of data axis 2
image-a.fits 2 774 672
image-b.fits 2 774 672
image-c.fits 2 387 336
Another advantage of a table output is that you can directly write
the table to a file. For example, if you add
‘--output=fileinfo.fits’, the information above will be printed
into a FITS table. You can also pipe it into *note Table:: to
select files based on certain properties, to sort them based on
another property, or any other operation that can be done with
Table (including *note Column arithmetic::). For example, with the
command below, you can select all the files that have a size larger
than 500 pixels in both dimensions.
$ astfits *.fits --keyvalue=NAXIS,NAXIS1,NAXIS2 \
--colinfoinstdout \
| asttable --range=NAXIS1,500,inf \
--range=NAXIS2,500,inf -cFILENAME
image-a.fits
image-b.fits
Note that ‘--colinfoinstdout’ is necessary to use column names when
piping to other programs (like ‘asttable’ above). Also, with the
‘-cFILENAME’ option, we are asking Table to only print the final
file names (we do not need the sizes any more).
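If Table is not at hand, a similar selection can be approximated with
‘awk’ on the plain output (without ‘--colinfoinstdout’, so columns are
addressed by position instead of by name). This is a swapped-in
technique, not what the command above does internally; the rows below
are the sample output shown earlier.

```shell
# Sample rows copied from the '--keyvalue' output above:
# file name, NAXIS, NAXIS1, NAXIS2.
table="image-a.fits 2 774 672
image-b.fits 2 774 672
image-c.fits 2 387 336"

# Keep rows where both axis lengths exceed 500; print only the name.
large=$(printf '%s\n' "$table" | awk '$3 > 500 && $4 > 500 { print $1 }')
printf '%s\n' "$large"
```

Addressing columns by position is more fragile than using names, so
the Table-based command remains the safer choice when the list of
requested keywords may change.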
The commands with multiple files above used ‘*.fits’, which is only
useful when all your FITS files are in the same directory.
However, in many cases, your FITS files will be scattered in
multiple sub-directories of a certain top-level directory, or you
may only want those with more particular file name patterns. A
more powerful way to list the input files to ‘--keyvalue’ is to use
the ‘find’ program in Unix-like operating systems. For example,
with the command below you can search all the FITS files in all the
sub-directories of ‘/TOP/DIR’.
$ astfits $(find /TOP/DIR/ -name "*.fits") --keyvalue=NAXIS2
‘--arguments=STR’
A plain-text file containing the list of input files that will be
used in ‘--keyvalue’. Each word (group of characters separated by
SPACE or new-line) is assumed to be the name of a separate input
file. This option is only relevant when no input files are given
as arguments on the command-line: if any arguments are given, this
option is ignored.
This is necessary when the list of input files is very long,
causing the shell to abort with an ‘Argument list too long’ error.
In such cases, you can put the list into a plain-text file and use
this option like below:
$ find /TOP/DIR/ -name "*.fits" > list.txt
$ astfits --arguments=list.txt --keyvalue=NAXIS1
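Because every SPACE-separated word in the list is taken as a file
name, it can help to check the list before giving it to Fits. The
sketch below builds the list with ‘find’ and refuses names containing
white space; ‘list_fits’ is a hypothetical helper name.

```shell
# list_fits TOPDIR LISTFILE   (hypothetical helper)
# Write every '*.fits' file under TOPDIR into LISTFILE, one per line.
# Returns 1 if any name contains white space, since '--arguments'
# would then split that name into several words.
list_fits() {
    find "$1" -type f -name '*.fits' > "$2" || return 2
    if grep -q '[[:space:]]' "$2"; then
        echo "list_fits: a name in $2 contains white space" >&2
        return 1
    fi
}

# Example (placeholder directory):
#   list_fits /TOP/DIR list.txt \
#     && astfits --arguments=list.txt --keyvalue=NAXIS1
```

Here ‘find’ writes the names itself, so no long argument list is ever
passed to a program.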
‘-O’
‘--colinfoinstdout’
Print column information (or metadata) above the column values when
writing keyword values to standard output with ‘--keyvalue’. You
can read this option as column-information-in-standard-output.
Below we will discuss the options that can be used to manipulate
keywords. To see the full list of keywords in a FITS HDU, you can use
the ‘--printallkeys’ option. If any of the keyword modification options
below are requested (for example, ‘--update’), the headers of the input
file/HDU will be changed first, then printed. Keyword modification is
done within the input file. Therefore, if you want to keep the original
FITS file or HDU intact, it is easiest to create a copy of the file/HDU
first and then run Fits on that (for copying a HDU to another file, see
*note HDU information and manipulation::). In the FITS standard,
keywords are always uppercase. So case does not matter in the input or
output keyword names you specify.
*‘CHECKSUM’ automatically updated, when present:* the keyword
modification options will change the contents of the HDU. Therefore, if
a ‘CHECKSUM’ is present in the HDU, after all the keyword modification
options have been completed, Fits will also update ‘CHECKSUM’ before
closing the file.
Most of the options can accept multiple instances in one command.
For example, you can add multiple keywords to delete by calling
‘--delete’ multiple times. Since repeated keywords are allowed, you can
even delete the same keyword multiple times. The action of such options
will start from the top-most keyword.
The precedence of operations is described below. Note that while
the order within each class of actions is preserved, the order of
individual actions is not. So irrespective of the order in which you
called ‘--delete’ and ‘--update’, all the delete operations will take
effect first, then the update operations.
1. ‘--delete’
2. ‘--rename’
3. ‘--update’
4. ‘--write’
5. ‘--asis’
6. ‘--history’
7. ‘--comment’
8. ‘--date’
9. ‘--printallkeys’
10. ‘--verify’
11. ‘--copykeys’
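To predict the order in which a long command will act, the list above
can be modeled with a short shell function. This is only an
illustration of the documented precedence (it is not Gnuastro's own
code); ‘fits_key_order’ is a hypothetical name and the sketch assumes
the long ‘--option=value’ form.

```shell
# fits_key_order OPTION...   (hypothetical helper, not part of Gnuastro)
# Print the given long-format keyword options, one per line, in the
# order of the list above. Options of the same class keep their
# command-line order.
fits_key_order() {
    {
        n=0
        for opt in "$@"; do
            case ${opt%%=*} in
                --delete)       rank=1 ;;
                --rename)       rank=2 ;;
                --update)       rank=3 ;;
                --write)        rank=4 ;;
                --asis)         rank=5 ;;
                --history)      rank=6 ;;
                --comment)      rank=7 ;;
                --date)         rank=8 ;;
                --printallkeys) rank=9 ;;
                --verify)       rank=10 ;;
                --copykeys)     rank=11 ;;
                *)              rank=99 ;;   # not a keyword option
            esac
            n=$((n + 1))
            printf '%s %s %s\n' "$rank" "$n" "$opt"
        done
    } | sort -k1,1n -k2,2n | cut -d' ' -f3-
}

fits_key_order --update=A,1 --delete=B --write=C,2 --delete=D
```

The call on the last line prints ‘--delete=B’, ‘--delete=D’,
‘--update=A,1’ and ‘--write=C,2’ (one per line): both deletions come
first, in the order they were given.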
All possible syntax errors will be reported before the keywords are
actually written. FITS errors during any of these actions will be
reported, but Fits will not stop until all the operations are complete.
If ‘--quitonerror’ is called, then Fits will immediately stop upon the
first error.
If you want to inspect only a certain set of header keywords, it is
easiest to pipe the output of the Fits program to GNU Grep. Grep is a
very powerful and advanced tool to search strings which is precisely
made for such situations. For example, if you only want to check the
size of an image FITS HDU, you can run:
$ astfits input.fits | grep NAXIS
*FITS STANDARD KEYWORDS:* Some header keywords are necessary for later
operations on a FITS file, for example, BITPIX or NAXIS, see the FITS
standard for their full list. If you modify (for example, remove or
rename) such keywords, the FITS file extension might not be usable any
more. Also be careful with the world coordinate system keywords: if you
modify or change their values, any future world coordinate system (like
RA and Dec) measurements on the image will also change.
The keyword related options to the Fits program are fully described
below.
‘-d STR’
‘--delete=STR’
Delete one instance of the ‘STR’ keyword from the FITS header.
Multiple instances of ‘--delete’ can be given (possibly even for
the same keyword, when it is repeated in the meta-data). All
keywords given will be removed from the headers in the same given
order. If the keyword does not exist, Fits will give a warning and
return with a non-zero value, but will not stop. To stop as soon
as an error occurs, run with ‘--quitonerror’.
‘-r STR,STR’
‘--rename=STR,STR’
Rename a keyword to a new name (for example,
‘--rename=OLDNAME,NEWNAME’). ‘STR’ contains both the existing and
new names, which should be separated by either a comma (<,>) or a
space character. Note that if you use a space character, you have
to put the value to this option within double quotation marks (<">)
so the space character is not interpreted as an option separator.
Multiple instances of ‘--rename’ can be given in one command. The
keywords will be renamed in the specified order. If the keyword
does not exist, Fits will give a warning and return with a non-zero
value, but will not stop. To stop as soon as an error occurs, run
with ‘--quitonerror’.
‘-u STR’
‘--update=STR’
Update a keyword, its value, its comments and its units in the
format described below. If there are multiple instances of the
keyword in the header, they will be changed from top to bottom
(with multiple ‘--update’ options).
The format of the values to this option can best be specified with
an example:
--update=KEYWORD,value,"comments for this keyword",unit
If there is a writing error, Fits will give a warning and return
with a non-zero value, but will not stop. To stop as soon as an
error occurs, run with ‘--quitonerror’.
The value can be any numerical or string value(1). Other than the
‘KEYWORD’, all the other values are optional. To leave a given
token empty, follow the preceding comma (<,>) immediately with the
next. If any space character is present around the commas, it will
be considered part of the respective token. So if more than one
token has space characters within it, the safest method to specify
a value to this option is to put double quotation marks around each
individual token that needs it. Note that without double quotation
marks, space characters will be seen as option separators and can
lead to undefined behavior.
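The effect of the quotation marks can be checked in the shell itself,
without running Fits. In the sketch below, ‘count_args’ is a
hypothetical helper that only reports how many separate arguments it
received.

```shell
# count_args prints how many separate arguments the shell passed to it.
count_args() { echo "$#"; }

# Quoted comment: the whole option reaches the program as one argument.
quoted=$(count_args --update=KEYWORD,value,"comments for this keyword",unit)

# Unquoted comment: every space starts a new argument.
unquoted=$(count_args --update=KEYWORD,value,comments for this keyword,unit)

echo "quoted: $quoted argument(s), unquoted: $unquoted argument(s)"
```

In the unquoted case, the words after the first space are no longer
part of the value given to ‘--update’.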
‘-w STR’
‘--write=STR’
Write a keyword to the header. For the possible value input
formats, comments and units for the keyword, see the ‘--update’
option above. The special names (first string) below will cause a
special behavior:
‘/’
Write a "title" to the list of keywords. A title consists of
one blank line and another which is blank for several spaces
and starts with a slash (</>). The second string given to
this option is the "title" or string printed after the slash.
For example, with the command below you can add a "title" of
'My keywords' after the existing keywords and add the
subsequent ‘K1’ and ‘K2’ keywords under it (note that keyword
names are not case sensitive).
$ astfits test.fits -h1 --write=/,"My keywords" \
--write=k1,1.23,"My first keyword" \
--write=k2,4.56,"My second keyword"
$ astfits test.fits -h1
[[[ ... truncated ... ]]]
/ My keywords
K1 = 1.23 / My first keyword
K2 = 4.56 / My second keyword
END
Adding a "title" before each contextually separate group of
header keywords greatly helps in readability and visual
inspection of the keywords. So generally, when you want to
add new FITS keywords, it is good practice to also add a title
before them.
The reason you need to use </> as the keyword name for setting
a title is that </> is the first non-white character.
The title(s) is(are) written into the FITS header in the same order
that ‘--write’ is called. Therefore in one run of the Fits
program, you can specify many different titles (with their own
keywords under them). For example, the command below
builds on the previous example and adds another group of
keywords named ‘A1’ and ‘A2’.
$ astfits test.fits -h1 --write=/,"My keywords" \
--write=k1,1.23,"My first keyword" \
--write=k2,4.56,"My second keyword" \
--write=/,"My second group of keywords" \
--write=a1,7.89,"First keyword" \
--write=a2,0.12,"Second keyword"
‘checksum’
When nothing is given afterwards, the header integrity
keywords ‘DATASUM’ and ‘CHECKSUM’ will be calculated and
written/updated. The calculation and writing is done fully by
CFITSIO, therefore they comply with the FITS standard 4.0(2)
that defines these keywords (its Appendix J).
If a value is given (e.g., ‘--write=checksum,MyOwnCheckSum’),
then CFITSIO will not be called to calculate these two
keywords and the value (as well as possible comment and unit)
will be written just like any other keyword. This is
generally not recommended since ‘CHECKSUM’ is a reserved FITS
standard keyword. If you want to calculate the checksum with
another hashing standard manually and write it into the
header, it is recommended to use another keyword name.
In the FITS standard, ‘CHECKSUM’ depends on the HDU's data
_and_ header keywords, it will therefore not be valid if you
make any further changes to the header after writing the
‘CHECKSUM’ keyword. This includes any further keyword
modification options in the same call to the Fits program.
However, ‘DATASUM’ only depends on the data section of the
HDU/extension, so it is not changed when you add, remove or
update the header keywords. Therefore, it is recommended to
write these keywords as the last keywords that are
written/modified in the extension. You can use the ‘--verify’
option (described below) to verify the values of these two
keywords.
‘datasum’
Similar to ‘checksum’, but only write the ‘DATASUM’ keyword
(that does not depend on the header keywords, only the data).
‘-a STR’
‘--asis=STR’
Write the given ‘STR’ _exactly_ as it is, into the given FITS file
header with no modifications. If the contents of ‘STR’ do not
conform to the FITS standard for keywords, then it may (most
probably: it will!) corrupt your file and you may not be able to
open it any more. So please be *very careful* with this option
(it is your responsibility to make sure that the string conforms with
the FITS standard for keywords).
If you want to define the keyword from scratch, it is best to use
the ‘--write’ option (see above) and let CFITSIO worry about
complying with the FITS standard. Also, if you want to copy keywords
from one FITS file to another, you can use ‘--copykeys’ that is
described below. Through these high-level options, you don't
have to worry about low-level issues.
One common usage of ‘--asis’ occurs when you are given the contents
of a FITS header (many keywords) as a plain-text file (so the
format of each keyword line conforms with the FITS standard, just
the file is plain-text, and you have one keyword per line when you
open it in a plain-text editor). In that case, Gnuastro's Fits
program won't be able to parse it (it doesn't conform to the FITS
standard, which doesn't have a new-line character!). With the
command below, you can insert those headers in ‘headers.txt’ into
‘img.fits’ (its HDU number 1, the default; you can change the HDU
to modify with ‘--hdu’).
$ cat headers.txt \
| while IFS= read -r line; do \
astfits img.fits --asis="$line"; \
done
*Don't forget a title:* Since the newly added headers in the
example above weren't originally in the file, they are probably
some form of high-level metadata. The raw example above will just
append the new keywords after the last one, making the header hard
for humans to read (it is not clear what this new group of keywords
signifies, where it starts, and where it ends).
To help the human readability of the header, add a title for this
group of keywords before writing them. To do that, run the
following command before the ‘cat ...’ command above (replace
‘Imported keys’ with any title that best describes this group of
new keywords based on their context):
$ astfits img.fits --write=/,"Imported keys"
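Since ‘--asis’ writes whatever it is given, a script may want to
filter the plain-text file first. The sketch below skips empty lines
and lines longer than the 80 characters of a FITS keyword record
before calling Fits; ‘import_cards’ is a hypothetical helper name and
the length test is only a basic sanity check, not a full validation.

```shell
# import_cards HEADERFILE FITSFILE   (hypothetical helper)
# Pass each usable line of HEADERFILE to 'astfits --asis'.
import_cards() {
    while IFS= read -r card; do
        # Nothing to write for an empty line.
        [ -n "$card" ] || continue

        # A FITS keyword record cannot be longer than 80 characters.
        if [ "${#card}" -gt 80 ]; then
            echo "import_cards: skipping over-long line: $card" >&2
            continue
        fi

        astfits "$2" --asis="$card"
    done < "$1"
}

# Example (placeholder file names):
#   import_cards headers.txt img.fits
```

Skipped lines are reported on the standard error, so they can be
inspected and fixed by hand afterwards.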
‘-H STR’
‘--history STR’
Add a ‘HISTORY’ keyword to the header with the given value. A new
‘HISTORY’ keyword will be created for every instance of this
option. If the string given to this option is longer than 70
characters, it will be separated into multiple keyword cards. If
there is an error, Fits will give a warning and return with a
non-zero value, but will not stop. To stop as soon as an error
occurs, run with ‘--quitonerror’.
‘-c STR’
‘--comment STR’
Add a ‘COMMENT’ keyword to the header with the given value.
Similar to the explanation for ‘--history’ above.
‘-t’
‘--date’
Put the current date and time in the header. If the ‘DATE’ keyword
already exists in the header, it will be updated. If there is a
writing error, Fits will give a warning and return with a non-zero
value, but will not stop. To stop as soon as an error occurs, run
with ‘--quitonerror’.
‘-p’
‘--printallkeys’
Print the full metadata (keywords, values, units and comments) in
the specified FITS extension (HDU). If this option is called along
with any of the other keyword editing commands, as described above,
all other editing commands take precedence over this. Therefore, it
will print the final keywords after all the editing has been done.
‘--printkeynames’
Print only the keyword names of the specified FITS extension (HDU),
one line per name. This option must be called alone.
‘-v’
‘--verify’
Verify the ‘DATASUM’ and ‘CHECKSUM’ data integrity keywords of the
FITS standard. See the description under the ‘checksum’ (under
‘--write’, above) for more on these keywords.
This option will print ‘Verified’ for both keywords if they can be
verified. Otherwise, if they do not exist in the given
HDU/extension, it will print ‘NOT-PRESENT’, and if they cannot be
verified it will print ‘INCORRECT’. In the latter case (when the
keyword values exist but cannot be verified), the Fits program will
also return with a failure.
By default this option will also print a short description of the
‘DATASUM’ and ‘CHECKSUM’ keywords. You can suppress this extra
information with the ‘--quiet’ option.
‘--copykeys=INT:INT/STR,STR[,STR]’
Copy the desired set of the input's keyword records to the
output (specified with the ‘--output’ and ‘--outhdu’ for the
filename and HDU/extension respectively). The keywords to copy can
be given either as a range (in the format of ‘INT:INT’, inclusive)
or a list of keyword names as comma-separated strings (‘STR,STR’),
the list can have any number of keyword names. More details and
examples of the two forms are given below:
Range
The given string to this option must be two integers separated
by a colon (<:>). The first integer must be positive
(counting of the keyword records starts from 1). The second
integer may be negative (zero is not acceptable) or an integer
larger than the first.
A negative second integer means counting from the end. So
‘-1’ is the last copy-able keyword (not including the ‘END’
keyword).
To see the header keywords of the input with a number before
them, you can pipe the output of the Fits program (when it
prints all the keywords in an extension) into the ‘cat’
program like below:
$ astfits input.fits -h1 | cat -n
List of names
The given string to this option must be a comma separated list
of keyword names. For example, see the command below:
$ astfits input.fits -h1 --copykeys=KEY1,KEY2 \
--output=output.fits --outhdu=1
Please consider the notes below when copying keywords with
names:
• If the number of characters in the name is more than 8,
CFITSIO will place a ‘HIERARCH’ before it. In this case
simply give the name and do not give the ‘HIERARCH’
(which is a constant and not considered part of the
keyword name).
• If your keyword name is composed only of digits, do not
give it as the first name given to ‘--copykeys’.
Otherwise, it will be confused with the range format
above. You can safely give an only-digit keyword name as
the second, or third requested keywords.
• If the keyword is repeated more than once in the header,
currently only the first instance will be copied. In
other words, even if you call ‘--copykeys’ multiple times
with the same keyword name, its first instance will be
copied. If you need to copy multiple instances of the
same keyword, please get in touch with us at
‘bug-gnuastro@gnu.org’.
‘--outhdu’
The HDU/extension to write the output keywords of ‘--copykeys’.
‘-Q’
‘--quitonerror’
Quit if any of the operations above are not successful. By default
if an error occurs, Fits will warn the user of the faulty keyword
and continue with the rest of actions.
‘-s STR’
‘--datetosec STR’
Interpret the value of the given keyword in the FITS date format
(most generally: ‘YYYY-MM-DDThh:mm:ss.ddd...’) and return the
corresponding Unix epoch time (number of seconds that have passed
since 00:00:00 Thursday, January 1st, 1970). The
‘Thh:mm:ss.ddd...’ section (specifying the time of day), and also
the ‘.ddd...’ (specifying the fraction of a second) are optional.
The value to this option must be the FITS keyword name that
contains the requested date, for example, ‘--datetosec=DATE-OBS’.
This option can also interpret the older FITS date format
(‘DD/MM/YYThh:mm:ss.ddd...’) where only two characters are given to
the year. In this case (following the GNU C Library), this option
will make the following assumption: values 69 to 99 correspond to
the years 1969 to 1999, and values 0 to 68 to the years 2000 to
2068.
This is a very useful option for operations on the FITS date
values, for example, sorting FITS files by their dates, or finding
the time difference between two FITS files. The advantage of
working with the Unix epoch time is that you do not have to worry
about calendar details (for example, the number of days in
different months, or leap years).
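The arithmetic that epoch seconds make possible is sketched below.
The two dates are made-up values and the ‘date’ program is used only
as a stand-in to obtain the seconds (in a real script they would come
from ‘--datetosec’); the ‘-d’ option used here is an extension found
in GNU ‘date’, and the strings are interpreted as UTC.

```shell
# Two hypothetical dates in the FITS date format.
d1="2000-01-01T00:00:00"
d2="2000-01-02T12:00:00"

# Replace the 'T' separator with a space before giving the string to
# 'date', and use '-u' so that the result is independent of the local
# time zone.
t1=$(date -u -d "$(echo "$d1" | tr 'T' ' ')" +%s)
t2=$(date -u -d "$(echo "$d2" | tr 'T' ' ')" +%s)

# With both dates as seconds, the difference is a plain subtraction.
diff_sec=$((t2 - t1))
echo "$diff_sec seconds between the two dates"
```

Sorting files by date works in the same way: once each file has a
single integer, a numeric sort is enough.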
‘--wcscoordsys=STR’
Convert the coordinate system of the image's world coordinate
system (WCS) to the given coordinate system (‘STR’) and write it
into the file given to ‘--output’ (or an automatically named file
if no ‘--output’ has been given).
For example, with the command below, ‘img-eq.fits’ will have an
identical dataset (pixel values) as ‘image.fits’. However, the WCS
coordinate system of ‘img-eq.fits’ will be the equatorial
coordinate system in the Julian calendar epoch 2000 (which is the
most common epoch used today). Fits will automatically extract the
current coordinate system of ‘image.fits’ and as long as it is one
of the recognized coordinate systems listed below, it will do the
conversion.
$ astfits image.fits --wcscoordsys=eq-j2000 \
--output=img-eq.fits
The currently recognized coordinate systems are listed below (the
most common one today is ‘eq-j2000’):
‘eq-j2000’
2000.0 (Julian-year) equatorial coordinates. This is also
known as FK5 (short for "Fundamental Katalog No 5" which was
the source of the star coordinates used to define it).
This coordinate system is based on the motion of the Sun and
has epochs when the mean equator was used (for example
‘eq-b1950’ below). Furthermore, the definition of year is
different: either the Besselian year in 1950.0, or the Julian
year in 2000. For more on their difference and links for
further reading about epochs in astronomy, please see the
description in Wikipedia
(https://en.wikipedia.org/wiki/Epoch_(astronomy)).
Because of these difficulties, the equatorial J2000.0
coordinate system has been deprecated by the IAU in favor of the
International Celestial Reference System (ICRS) but is still
used extensively. ICRS is defined based on extra-galactic
quasars, so it does not depend on the dynamics of the solar
system any more. But to enable historical continuity, ICRS
has been defined to be equivalent to the equatorial J2000.0
within the accepted error bars of the latter (tens of
milli-arcseconds). This is why the move to
ICRS has been relatively slow.
‘eq-b1950’
1950.0 (Besselian-year) equatorial coordinates.
‘ec-j2000’
2000.0 (Julian-year) ecliptic coordinates.
‘ec-b1950’
1950.0 (Besselian-year) ecliptic coordinates.
‘galactic’
Galactic coordinates.
‘supergalactic’
Supergalactic coordinates.
‘--wcsdistortion=STR’
If the argument has a WCS distortion, the output (file given with
the ‘--output’ option) will have the distortion given to this
option (for example, ‘SIP’, ‘TPV’). The output will be a new file
(with a copy of the image, and the new WCS), so if it already
exists, the file will be deleted (unless you use the ‘--dontdelete’
option, see *note Input output options::).
With this option, the Fits program will read the minimal set of
keywords from the input HDU and the HDU data. It will then write
them into the file given to the ‘--output’ option but with a newly
created set of WCS-related keywords corresponding to the desired
distortion standard.
If no ‘--output’ file is specified, an automatically generated
output name will be used which is composed of the input's name but
with the ‘-DDD.fits’ suffix (see *note Automatic output::), where
‘DDD’ is the value given to this option (desired output
distortion).
Note that all possible conversions between all standards are not
yet supported. If the requested conversion is not supported, an
informative error message will be printed. If this happens, please
let us know and we will try our best to add the respective
conversions.
For example, with the command below, you can be sure that if
‘in.fits’ has a distortion in its WCS, the distortion of ‘out.fits’
will be in the SIP standard.
$ astfits in.fits --wcsdistortion=SIP --output=out.fits
---------- Footnotes ----------
(1) Some tricky situations arise with values like '‘87095e5’', if
this was intended to be a number it will be kept in the header as
‘8709500000’ and there is no problem. But this can also be a shortened
Git commit hash. In the latter case, it should be treated as a string
and stored as it is written. Commit hashes are very important in
keeping the history of a file during your research and such values might
arise without you noticing them in your reproduction pipeline. One
solution is to use ‘git describe’ instead of the short hash alone. A
less recommended solution is to add a space after the commit hash and
Fits will write the value as '‘87095e5 ’' in the header. If you later
compare the strings on the shell, the space character will be ignored by
the shell in the latter solution and there will be no problem.
(2) <https://fits.gsfc.nasa.gov/standard40/fits_standard40aa-le.pdf>
File: gnuastro.info, Node: Pixel information images, Prev: Keyword inspection and manipulation, Up: Invoking astfits
5.1.1.3 Pixel information images
................................
In *note Keyword inspection and manipulation:: options like
‘--pixelscale’ were introduced for information on the pixels from the
keywords. But that only provides a single value for all the pixels!
This will not be sufficient in some scenarios; for example due to
distortion, different regions of the image will have different pixel
areas when projected onto the sky.
The options in this section provide such "meta" images: images where
the pixel values are information about the pixel itself. Such images
can be useful in understanding the underlying pixel grid with the same
tools that you use to study the astronomical objects within the image (like
*note SAO DS9::). After all, nothing beats visual inspection with tools
you are familiar with.
‘--pixelareaonwcs’
Create a meta-image where each pixel's value shows its area in the
WCS units (usually degrees squared). The output is therefore the
same size as the input.
This option uses the same "pixel mixing" or "area resampling"
concept that is described in *note Resampling:: (as part of the
Warp program). Similar to Warp, its sampling can be tuned with the
‘--edgesampling’ that is described below.
One scenario where this option becomes handy is when you are
debugging aligned images using the Warp program (see *note Warp::).
You may observe gradients after warping and can check if they are
caused by the distortion of the instrument or not. Such gradients
can happen due to distortions because the detector's pixels are
measuring photons from different areas on the sky (or the type of
projection you're seeing). This effect is more pronounced in
images covering larger portions of the sky, for instance, the TESS
images(1).
Here is an example usage of the ‘--pixelareaonwcs’ option:
# Check the area each 'input.fits' pixel takes in sky
$ astfits input.fits -h1 --pixelareaonwcs -o pixarea.fits
# Convert each pixel's area to arcsec^2
$ astarithmetic pixarea.fits 3600 3600 x x \
--output=pixarea_arcsec2.fits
# Compare area relative to the actual reported pixel scale
$ pixarea=$(astfits input.fits --pixelscale -q \
| awk '{print $3}')
$ astarithmetic pixarea.fits $pixarea / -o pixarea_rel.fits
‘--edgesampling=INT’
Extra sampling along the pixel edges for ‘--pixelareaonwcs’. The
default value is 0, meaning that only the pixel vertices are used.
Values greater than zero improve the accuracy at the expense of
greater time and memory consumption. With that said, the default
value of zero usually has a good precision unless the given image
has extreme distortions that produce irregular pixel shapes. For
more, see *note Align pixels with WCS considering distortions::.
*Caution:* This option does not "oversample" the output image!
Rather, it makes Warp use more points to calculate the _input_
pixel area. To oversample the output image, set a reasonable
‘--cdelt’ value.
---------- Footnotes ----------
(1) <https://www.nasa.gov/tess-transiting-exoplanet-survey-satellite>
File: gnuastro.info, Node: ConvertType, Next: Table, Prev: Fits, Up: Data containers
5.2 ConvertType
===============
The FITS format used in astronomy was defined mainly for archiving,
transmission, and processing. In other situations, the data might be
useful in other formats. For example, when you are writing a paper or
report, or if you are making slides for a talk, you cannot use a FITS
image. Other image formats should be used. In other cases you might
want your pixel values in a table format as plain text for input to
other programs that do not recognize FITS. ConvertType is created for
such situations. The various types will increase with future updates
and based on need.
The conversion is not only one way (from FITS to other formats), but
two ways (except the EPS and PDF formats(1)). So you can also convert a
JPEG image or text file into a FITS image. Basically, other than
EPS/PDF, you can use any of the recognized formats as different color
channel inputs to get any of the recognized outputs.
Before explaining the options and arguments (in *note Invoking
astconvertt::), we will start with a short discussion on the difference
between raster and vector graphics in *note Raster and Vector
graphics::. In ConvertType, vector graphics are used to add markers
over your originally rasterized data, producing high quality images,
ready to be used in your exciting papers. We will continue with a
description of the recognized file types in *note Recognized file
formats::, followed by a short introduction to digital color in *note
Color::. A tutorial on how to add markers over an image is then given
in *note Marking objects for publication:: and we conclude with a LaTeX
based solution to add coordinates over an image.
* Menu:
* Raster and Vector graphics:: Images coming from nature, and the abstract.
* Recognized file formats:: Recognized file formats
* Color:: Some explanations on color.
* Annotations for figure in paper:: Adding coordinates or physical scale.
* Invoking astconvertt:: Options and arguments to ConvertType.
---------- Footnotes ----------
(1) Because EPS and PDF are vector, not raster/pixelated formats.
File: gnuastro.info, Node: Raster and Vector graphics, Next: Recognized file formats, Prev: ConvertType, Up: ConvertType
5.2.1 Raster and Vector graphics
--------------------------------
Images that are produced by hardware (for example, the camera in your
phone, or the camera connected to a telescope) provide pixelated data.
Such data are therefore stored in a Raster graphics
(https://en.wikipedia.org/wiki/Raster_graphics) format which has
discrete, independent, equally spaced data elements. For example, this
is the format used by FITS (see *note Fits::), JPEG, TIFF, PNG and other
image formats.
On the other hand, when something is generated by the computer (for
example, a diagram, plot or even adding a cross over a camera image to
highlight something there), there is no "observation" or connection with
nature! Everything is abstract! For such things, it is much easier to
draw a mathematical line (with infinite resolution). Therefore, no
matter how much you zoom-in, it will never get pixelated. This is the
realm of Vector graphics
(https://en.wikipedia.org/wiki/Vector_graphics). If you open the
Gnuastro manual in PDF format
(https://www.gnu.org/software/gnuastro/manual/gnuastro.pdf), you can see
such graphics, for example, in *note Circles and the complex plane:: or
*note Distance on a 2D curved space::. The most common vector graphics
formats are PDF for document sharing and SVG for web-based
applications.
The pixels of a raster image can be shown as vector-based squares
with different shades, so vector graphics can generally also support
raster graphics. This is very useful when you want to add some graphics
over an image to help your discussion (for example a ‘+’ over your
object of interest). However, vector graphics is not optimized for
rasterized data (which are usually also noisy!), and can either not
display nicely, or result in much larger file volume (in bytes).
Therefore, if it is not necessary to add any marks over a FITS image,
for example, it may be better to store it in a rasterized format.
The distinction between the vector and raster graphics is also the
primary theme behind Gnuastro's logo, see *note Logo of Gnuastro::.
File: gnuastro.info, Node: Recognized file formats, Next: Color, Prev: Raster and Vector graphics, Up: ConvertType
5.2.2 Recognized file formats
-----------------------------
The various standards and the file name extensions recognized by
ConvertType are listed below. For a review on the difference between
Raster and Vector graphics, see *note Raster and Vector graphics::. For
a review on the concept of color and channels, see *note Color::.
Currently, except for the FITS format, Gnuastro uses the file name's
suffix to identify the format, so if the file's name does not end with
one of the suffixes mentioned below, it will not be recognized.
FITS or IMH
Astronomical data are commonly stored in the FITS format (or the
older IRAF ‘.imh’ data format). A list of file name suffixes which
indicate that the file is in this format is given in *note
Arguments::. FITS is a raster graphics format.
Each image extension of a FITS file only has one value per
pixel/element. Therefore, when used as input, each input FITS
image contributes as one color channel. If you want multiple
extensions in one FITS file for different color channels, you have
to repeat the file name multiple times and use the ‘--hdu’,
‘--hdu2’, ‘--hdu3’ or ‘--hdu4’ options to specify the different
extensions.
JPEG
The JPEG standard was created by the Joint Photographic Experts
Group. It is currently one of the most commonly used image
formats. Its major advantage is the compression algorithm that is
defined by the standard. Like the FITS standard, this is a raster
graphics format, which means that it is pixelated.
A JPEG file can have 1 (for gray-scale), 3 (for RGB) or 4 (for
CMYK) color channels. If you only want to convert one JPEG image
into other formats, there is no problem. However, if you want to
use it in combination with other input files, make sure that the
final number of color channels does not exceed four. If it does,
then ConvertType will abort and notify you.
The file name endings that are recognized as a JPEG file for input
are: ‘.jpg’, ‘.JPG’, ‘.jpeg’, ‘.JPEG’, ‘.jpe’, ‘.jif’, ‘.jfif’ and
‘.jfi’.
TIFF
TIFF (or Tagged Image File Format) was originally designed as a
common format for scanners in the early 90s and since then it has
grown to become very general. In many aspects, the TIFF standard
is similar to the FITS image standard: it can allow data of many
types (see *note Numeric data types::), and also allows multiple
images to be stored in a single file (like a FITS extension: each
image in the file is called a 'directory' in the TIFF standard).
However, unlike FITS, it can only store images, it has no
constructs for tables. Also unlike FITS, each 'directory' of a
TIFF file can have a multi-channel (e.g., RGB) image. Another
(inconvenient) difference with the FITS standard is that keyword
names are stored as numbers, not human-readable text.
However, outside of astronomy, because of its support of different
numeric data types, many fields use TIFF images for accurate (for
example, 16-bit integer or floating point) imaging data.
EPS
The Encapsulated PostScript (EPS) format is essentially a one page
PostScript file which has a specified size. PostScript is used to
store a full document like this whole Gnuastro book. PostScript
therefore also includes non-image data, for example, lines and
texts. It is a fully functional programming language to describe a
document. A PostScript file is a plain text file that can be
edited like any program source with any plain-text editor.
Therefore in ConvertType, EPS is only an output format and cannot
be used as input. Contrary to the FITS or JPEG formats, PostScript
is not a raster format, but is categorized as vector graphics.
With these features in mind, you can see that when you are
compiling a document with TeX or LaTeX, using an EPS file is much
more low level than a JPEG and thus you have much greater control
and therefore quality. Since it also includes vector graphic lines
we also use such lines to make a thin border around the image to
make its appearance in the document much better. Furthermore,
through EPS, you can add marks over the image in many shapes and
colors. No matter the resolution of the display or printer, these
lines will always be clear and not pixelated. However, this can be
done better with tools within TeX or LaTeX such as PGF/Tikz(1).
If the final input image (possibly after all operations on the flux
explained below) is a binary image or only has two colors of black
and white (in segmentation maps for example), then PostScript has
another great advantage compared to other formats. It allows for
1-bit pixels (pixels with a value of 0 or 1), which can decrease the
output file size by a factor of 8. So if a gray-scale image is binary,
ConvertType will exploit this property in the EPS and PDF (see
below) outputs.
The standard suffixes for an EPS file are ‘.eps’, ‘.EPS’, ‘.epsf’
and ‘.epsi’. The EPS outputs of ConvertType have the ‘.eps’
suffix.
PDF
The Portable Document Format (PDF) is currently the most common
format for documents. It is a vector graphics format, allowing
abstract constructs like marks or borders.
The PDF format is based on PostScript, so it shares all the
features mentioned above for EPS. To be able to display its
programmed content or print, a PostScript file needs to pass
through a processor or compiler. A PDF file can be thought of as
the processed output of the PostScript compiler. PostScript, EPS
and PDF were created and are registered by Adobe Systems.
As explained under EPS above, a PDF document is a static document
description format; viewing its result is therefore much faster and
more efficient than PostScript. To create a PDF output,
ConvertType will make an EPS file and convert that to PDF using GPL
Ghostscript. The suffixes recognized for a PDF file are: ‘.pdf’,
‘.PDF’. If GPL Ghostscript cannot be run on the PostScript file,
the EPS will remain and a warning will be printed (see *note
Optional dependencies::).
‘blank’
This is not actually a file type, but it can be used to fill one
color channel with a blank value. If this argument is given for
any color channel, that channel will not be used in the output.
Plain text
The value of each pixel in a 2D image can be written as a 2D matrix
in a plain-text file. Therefore, for the purpose of ConvertType,
plain-text files are a single-channel raster graphics file format.
Plain text files have the advantage that they can be viewed with
any text editor or on the command-line. Most programs also support
input as plain text files. As input, each plain text file is
considered to contain one color channel.
In ConvertType, the recognized extensions for plain text files are
‘.txt’ and ‘.dat’. As described in *note Invoking astconvertt::,
if you just give these extensions (and not a full filename) as
output, then automatic output will be performed to determine the
final output name (see *note Automatic output::). Besides these,
when the format of a file cannot be recognized from its name,
ConvertType will fall back to plain text mode. So you can use any
name (even without an extension) for a plain text input or output.
Just note that when the suffix is not recognized, automatic output
will not be performed.
The basic input/output on plain text images is very similar to how
tables are read/written as described in *note Gnuastro text table
format::. Simply put, the restrictions are very loose, and there
is a convention to define a name, units, data type (see *note
Numeric data types::), and comments for the data in a commented
line. The only difference is that as a table, a text file can
contain many datasets (columns), but as a 2D image, it can only
contain one dataset. As a result, only one information comment
line is necessary for a 2D image, and instead of the starting '‘#
Column N’' (‘N’ is the column number), the information line for a
2D image must start with '‘# Image 1’'. When ConvertType is asked
to output to plain text file, this information comment line is
written before the image pixel values.
When converting an image to plain text, consider the fact that if
the image is large, the number of columns in each line will become
very large, possibly making it very hard to open in some text
editors.
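As a minimal sketch of the layout described above, the lines below
write a hypothetical 3 by 3 image into ‘demo-img.txt’ and check its
shape with standard tools. Only the leading ‘# Image 1’ comes from the
description above; the name, units and type that follow it are assumed
to mirror the ‘# Column N’ convention of the text table format. The
conversion to FITS is only attempted when ConvertType is installed.

```shell
# Write a hypothetical 3x3 image as plain text. The information line
# starts with '# Image 1'; its remaining fields are an assumption that
# follows the '# Column N' convention of Gnuastro's text tables.
cat > demo-img.txt <<'EOF'
# Image 1: demo [counts, f32] Hypothetical 3x3 image
1 2 3
4 5 6
7 8 9
EOF

# Count the data rows, and the columns of the first data row.
rows=$(grep -vc '^#' demo-img.txt)
cols=$(awk '!/^#/ {print NF; exit}' demo-img.txt)
echo "$cols x $rows pixels"

# Convert to FITS only when ConvertType is installed.
if command -v astconvertt >/dev/null 2>&1; then
    astconvertt demo-img.txt --output=demo-img.fits
fi
```

Each row of the text file becomes one row of pixels, which is why the
number of columns per line grows with the width of the image.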
Standard output (command-line)
This is very similar to the plain text output, but instead of
creating a file to keep the printed values, they are printed on the
command-line. This can be very useful when you want to redirect
the results directly to another program in one command with no
intermediate file. The only difference is that only the pixel
values are printed (with no information comment line). To print to
the standard output, set the output name to '‘stdout’'.
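For example, the pixel values can be piped straight into AWK. The
sketch below is only illustrative: ‘sum_pixels’ is a hypothetical
helper function, ‘image.fits’ is a hypothetical input, and the ‘printf’
line stands in for ConvertType's output so that the pipeline can be
tried without any FITS file.

```shell
# Hypothetical helper: sum all pixel values arriving on standard
# input (one image row per line, values separated by white space).
sum_pixels() {
    awk '{for (i = 1; i <= NF; i++) s += $i} END {print s + 0}'
}

# Stand-in for ConvertType's output: a hypothetical 2x2 image.
total=$(printf '1 2\n3 4\n' | sum_pixels)
echo "$total"

# With a real image, no intermediate file is needed.
if command -v astconvertt >/dev/null 2>&1 && [ -f image.fits ]; then
    astconvertt image.fits --output=stdout | sum_pixels
fi
```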
---------- Footnotes ----------
(1) <http://sourceforge.net/projects/pgf/>
File: gnuastro.info, Node: Color, Next: Annotations for figure in paper, Prev: Recognized file formats, Up: ConvertType
5.2.3 Color
-----------
Color is generally defined after mixing various data "channels". The
values for each channel usually come from a filter that is placed in the
optical path. Filters only allow a certain window of the spectrum to
pass (for example, the SDSS _r_ filter only allows light from about 5500
to 7000 Angstroms). In digital monitors or common digital cameras, a
different set of filters is used: Red, Green and Blue (commonly known
as RGB) that are more optimized to the eye's perception. On the other
hand, when printing on paper, standard printers use the cyan, magenta,
yellow and key (CMYK, key=black) color space.
* Menu:
* Pixel colors:: Multiple filters in each pixel.
* Colormaps for single-channel pixels:: Better display of single-filter images.
* Vector graphics colors::
File: gnuastro.info, Node: Pixel colors, Next: Colormaps for single-channel pixels, Prev: Color, Up: Color
5.2.3.1 Pixel colors
....................
As discussed in *note Color::, for each displayed/printed pixel of a
color image, the dataset/image has three or four values. To store/show
the three values for each pixel, cameras and monitors allocate a certain
fraction of each pixel's area to red, green and blue filters. These
three filters are thus built into the hardware at the pixel level.
However, because measurement accuracy is very important in scientific
instruments, and we want to do measurements (take images) with
various/custom filters (without having to order a new expensive
detector!), scientific detectors use the full area of the pixel to store
one value for it in a single/mono channel dataset. To make measurements
in different filters, we just place a filter in the light path before
the detector. Therefore, the FITS format that is used to store
astronomical datasets is inherently a mono-channel format (see *note
Recognized file formats:: or *note Fits::).
When a subject has been imaged in multiple filters, you can feed each
different filter into the red, green and blue channels of your monitor
and obtain a false-colored visualization. The reason we say
"false-color" (or pseudo color) is that generally, the three data
channels you provide are not from the same Red, Green and Blue filters
of your monitor! So the observed color on your monitor does not
correspond to the physical "color" that you would have seen if you looked
at the object by eye. Nevertheless, it is good (and sometimes
necessary) for visualization (of special features).
In ConvertType, you can do this by giving each separate
single-channel dataset (for example, in the FITS image format) as an
argument (in the proper order), then asking for the output in a format
that supports multi-channel datasets (for example, see the command
below, or *note ConvertType input and output::).
$ astconvertt r.fits g.fits b.fits --output=color.jpg
File: gnuastro.info, Node: Colormaps for single-channel pixels, Next: Vector graphics colors, Prev: Pixel colors, Up: Color
5.2.3.2 Colormaps for single-channel pixels
...........................................
As discussed in *note Pixel colors::, color is not defined when a
dataset/image contains a single value for each pixel. However, we
interact with scientific datasets through monitors or printers. They
allow multiple channels (independent values) per pixel and produce color
with them (on monitors, this is usually with three channels: Red, Green
and Blue). As a result, there is a lot of freedom in visualizing a
single-channel dataset.
The mapping of single-channel values to multi-channel colors is
called a "color map". Since more information can be put in
multiple channels, this usually results in better visualizing the
dynamic range of your single-channel data. In ConvertType, you can use
the ‘--colormap’ option to choose between different mappings of
mono-channel inputs, see *note Invoking astconvertt::. Below, we will
review two of the basic color maps, please see the description of
‘--colormap’ in *note Invoking astconvertt:: for the full list.
• The most basic colormap is shades of black (because of its strong
contrast with white). This scheme is called Grayscale
(https://en.wikipedia.org/wiki/Grayscale). But ultimately, the
black is just one color, so with Grayscale, you are not using the
full dynamic range of the three-channel monitor effectively. To
help in visualization, more complex mappings can be defined.
• A slightly more complex color map can be defined when you scale the
values to a range of 0 to 360, and use it as the "Hue" term of
the Hue-Saturation-Value
(https://en.wikipedia.org/wiki/HSL_and_HSV) (HSV) color space
(while fixing the "Saturation" and "Value" terms). The increased
usage of the monitor's 3-channel color space is indeed better, but
the resulting images can be un-"natural" to the eye.
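The hue scaling of the second item above can be sketched with AWK.
This is only an illustration that assumes a linear scaling between the
minimum and maximum of a few hypothetical pixel values; it is not
necessarily how ConvertType computes the hue internally. The final
command (only run when ConvertType and a hypothetical ‘image.fits’ are
present) requests the ‘hsv’ color map through ‘--colormap’.

```shell
# Scale hypothetical pixel values linearly to a hue of 0 to 360
# degrees (assumption: minimum maps to 0 and maximum to 360).
hues=$(printf '%s\n' 10 20 30 50 | awk '
    NR == 1 { min = $1; max = $1 }
    { v[NR] = $1
      if ($1 < min) min = $1
      if ($1 > max) max = $1 }
    END { for (i = 1; i <= NR; i++)
              printf "%s%.0f", (i > 1 ? " " : ""),
                     360 * (v[i] - min) / (max - min) }')
echo "$hues"

# Ask ConvertType for the HSV color map on a real image.
if command -v astconvertt >/dev/null 2>&1 && [ -f image.fits ]; then
    astconvertt image.fits --colormap=hsv --output=hsv.jpg
fi
```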
Since grayscale is a commonly used mapping of single-valued datasets,
we will continue with a closer look at how it is stored. One way to
represent a gray-scale image in different color spaces is to use the
same proportions of the primary colors in each pixel. This is the
common way most FITS image viewers work: for each pixel, they fill all
the channels with the single value. While this is necessary for
displaying a dataset, there are downsides when storing/saving this type
of grayscale visualization (for example, in a paper).
• Three (for RGB) or four (for CMYK) values have to be stored for
every pixel; this makes the output file very heavy (in terms of
bytes).
• If printing, the printing errors of each color channel can make the
printed image slightly more blurred than it actually is.
To solve both these problems when storing grayscale visualization,
the best way is to save a single-channel dataset into the black channel
of the CMYK color space. The JPEG standard is the only common standard
that accepts CMYK color space.
The JPEG and EPS standards set two sizes for the number of bits in
each channel: 8-bit and 12-bit. The former is by far the most common
and is what is used in ConvertType. Therefore, each channel should have
values between 0 and 2^8-1=255. From this we see how each pixel in a
gray-scale image is one byte (8 bits) long; in an RGB image, it is 3
bytes long and in CMYK it is 4 bytes long. But thanks to the JPEG
compression algorithms, when all the pixels of one channel have the same
value, that channel is compressed to one pixel. Therefore a Grayscale
image and a CMYK image that has only the K-channel filled are
approximately the same file size.
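The byte counts above can be checked with simple shell arithmetic.
This sketch assumes a hypothetical image of 1000 by 1000 pixels with
8-bit channels, and it ignores compression and file headers, so the
numbers are upper limits on the pixel data alone.

```shell
# Hypothetical image size (pixels along each dimension).
width=1000
height=1000
npix=$((width * height))

# One byte (8 bits) per channel per pixel, before any compression.
gray_bytes=$((npix * 1))
rgb_bytes=$((npix * 3))
cmyk_bytes=$((npix * 4))

echo "Grayscale: $gray_bytes bytes"
echo "RGB:       $rgb_bytes bytes"
echo "CMYK:      $cmyk_bytes bytes"
```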
File: gnuastro.info, Node: Vector graphics colors, Prev: Colormaps for single-channel pixels, Up: Color
5.2.3.3 Vector graphics colors
..............................
When creating vector graphics, ConvertType recognizes the extended web
colors (https://en.wikipedia.org/wiki/Web_colors#Extended_colors) that
are the result of merging the colors in the HTML 4.01, CSS 2.0, SVG 1.0
and CSS3 standards. They are all shown with their standard name in
*note Figure 5.1: colornames. The names are not case sensitive so you
can use them in any form (for example, ‘turquoise’ is the same as
‘Turquoise’ or ‘TURQUOISE’).
On the command-line, you can also get the list of colors with the
‘--listcolors’ option to ConvertType, like below. In particular, if your
terminal is 24-bit or "true color", in the last column, you will see
each color. This greatly helps in selecting the best color for your
purpose easily on the command-line (without taking your hands off the
keyboard and getting distracted).
$ astconvertt --listcolors
[image src="gnuastro-figures/color-names.png" text="../gnuastro-figures//color-names.eps"