This is gnuastro.info, produced by makeinfo version 7.1 from
gnuastro.texi.

This book documents version 0.22 of the GNU Astronomy Utilities
(Gnuastro). Gnuastro provides various programs and libraries for
astronomical data manipulation and analysis.

Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License,
Version 1.3 or any later version published by the Free Software
Foundation; with no Invariant Sections, no Front-Cover Texts, and
no Back-Cover Texts. A copy of the license is included in the
section entitled "GNU Free Documentation License".
INFO-DIR-SECTION Astronomy
START-INFO-DIR-ENTRY
* Gnuastro: (gnuastro). GNU Astronomy Utilities.
* libgnuastro: (gnuastro)Gnuastro library. Full Gnuastro library doc.

* help-gnuastro: (gnuastro)help-gnuastro mailing list. Getting help.

* bug-gnuastro: (gnuastro)Report a bug. How to report bugs

* Arithmetic: (gnuastro)Arithmetic. Arithmetic operations on pixels.
* astarithmetic: (gnuastro)Invoking astarithmetic. Options to Arithmetic.

* BuildProgram: (gnuastro)BuildProgram. Compile and run programs using Gnuastro's library.
* astbuildprog: (gnuastro)Invoking astbuildprog. Options to BuildProgram.

* ConvertType: (gnuastro)ConvertType. Convert different file types.
* astconvertt: (gnuastro)Invoking astconvertt. Options to ConvertType.

* Convolve: (gnuastro)Convolve. Convolve an input file with kernel.
* astconvolve: (gnuastro)Invoking astconvolve. Options to Convolve.

* CosmicCalculator: (gnuastro)CosmicCalculator. For cosmological params.
* astcosmiccal: (gnuastro)Invoking astcosmiccal. Options to CosmicCalculator.

* Crop: (gnuastro)Crop. Crop region(s) from image(s).
* astcrop: (gnuastro)Invoking astcrop. Options to Crop.

* Fits: (gnuastro)Fits. View and manipulate FITS extensions and keywords.
* astfits: (gnuastro)Invoking astfits. Options to Fits.

* MakeCatalog: (gnuastro)MakeCatalog. Make a catalog from labeled image.
* astmkcatalog: (gnuastro)Invoking astmkcatalog. Options to MakeCatalog.

* MakeProfiles: (gnuastro)MakeProfiles. Make mock profiles.
* astmkprof: (gnuastro)Invoking astmkprof. Options to MakeProfiles.

* Match: (gnuastro)Match. Match two separate catalogs.
* astmatch: (gnuastro)Invoking astmatch. Options to Match.

* NoiseChisel: (gnuastro)NoiseChisel. Detect signal in noise.
* astnoisechisel: (gnuastro)Invoking astnoisechisel. Options to NoiseChisel.

* Segment: (gnuastro)Segment. Segment detections based on signal structure.
* astsegment: (gnuastro)Invoking astsegment. Options to Segment.

* Query: (gnuastro)Query. Access remote databases for downloading data.
* astquery: (gnuastro)Invoking astquery. Options to Query.

* Statistics: (gnuastro)Statistics. Get image Statistics.
* aststatistics: (gnuastro)Invoking aststatistics. Options to Statistics.

* Table: (gnuastro)Table. Read and write FITS binary or ASCII tables.
* asttable: (gnuastro)Invoking asttable. Options to Table.

* Warp: (gnuastro)Warp. Warp a dataset to a new grid.
* astwarp: (gnuastro)Invoking astwarp. Options to Warp.

* astscript: (gnuastro)Installed scripts. Gnuastro's installed scripts.
* astscript-ds9-region: (gnuastro)Invoking astscript-ds9-region. Options to this script
* astscript-fits-view: (gnuastro)Invoking astscript-fits-view. Options to this script
* astscript-pointing-simulate: (gnuastro)Invoking astscript-pointing-simulate. Options to this script
* astscript-psf-scale-factor: (gnuastro)Invoking astscript-psf-scale-factor. Options to this script
* astscript-psf-select-stars: (gnuastro)Invoking astscript-psf-select-stars. Options to this script
* astscript-psf-stamp: (gnuastro)Invoking astscript-psf-stamp. Options to this script
* astscript-psf-subtract: (gnuastro)Invoking astscript-psf-subtract. Options to this script
* astscript-psf-unite: (gnuastro)Invoking astscript-psf-unite. Options to this script
* astscript-radial-profile: (gnuastro)Invoking astscript-radial-profile. Options to this script
* astscript-sort-by-night: (gnuastro)Invoking astscript-sort-by-night. Options to this script
* astscript-zeropoint: (gnuastro)Invoking astscript-zeropoint. Options to this script
END-INFO-DIR-ENTRY

File: gnuastro.info, Node: Qsort functions, Next: K-d tree, Prev: Polygons, Up: Gnuastro library

12.3.18 Qsort functions (‘qsort.h’)
-----------------------------------

When sorting a dataset is necessary, the C programming language provides
the ‘qsort’ (Quick sort) function. ‘qsort’ is a generic function which
allows you to sort any kind of data structure (not just a single array
of numbers). To define "greater" and "smaller" (for sorting), ‘qsort’
needs another function, even for simple numerical types. The functions
introduced in this section are to be passed onto ‘qsort’.

Note that larger and smaller operators are not defined on NaN
elements. Therefore, if the input array is a floating point type, and
contains NaN values, the relevant functions of this section are going to
put the NaN elements at the end of the list (after the sorted non-NaN
elements), irrespective of the requested sorting order (increasing or
decreasing).

The first class of functions below (with ‘TYPE’ in their names) can
be used for sorting a simple numeric array. Just replace ‘TYPE’ with
the dataset's numeric datatype. The second set of functions can be used
to sort indices (leave the actual numbers untouched). To use the second
set of functions, a global variable or structure is also necessary, as
described below.

-- Global variable: gal_qsort_index_single
Pointer to an array (for example, ‘float *’ or ‘int *’) to use as a
reference in ‘gal_qsort_index_single_TYPE_d’ or
‘gal_qsort_index_single_TYPE_i’, see the explanation of these
functions for more. Note that if _more than one_ array is to be
sorted in a multi-threaded operation, these functions will not work
as expected. However, when all the threads just sort the indices
based on a _single array_, this global variable can safely be used
in a multi-threaded scenario.

-- Type (C struct): gal_qsort_index_multi
Structure to get the sorted indices of multiple datasets on
multiple threads with ‘gal_qsort_index_multi_d’ or
‘gal_qsort_index_multi_i’. Note that the ‘values’ array will not
be changed by these functions, it is only read. Therefore all the
‘values’ elements in the (to be sorted) array of
‘gal_qsort_index_multi’ must point to the same place.

struct gal_qsort_index_multi
{
float *values; /* Array of values (same in all). */
size_t index; /* Index of each element to be sorted. */
};

-- Function:
int
gal_qsort_TYPE_d (const void *a, const void *b)
When passed to ‘qsort’, this function will sort a ‘TYPE’ array in
decreasing order (first element will be the largest). Please
replace ‘TYPE’ (in the function name) with one of the *note Numeric
data types::, for example, ‘gal_qsort_int32_d’, or
‘gal_qsort_float64_d’.

-- Function:
int
gal_qsort_TYPE_i (const void *a, const void *b)
When passed to ‘qsort’, this function will sort a ‘TYPE’ array in
increasing order (first element will be the smallest). Please
replace ‘TYPE’ (in the function name) with one of the *note Numeric
data types::, for example, ‘gal_qsort_int32_i’, or
‘gal_qsort_float64_i’.

-- Function:
int
gal_qsort_index_single_TYPE_d (const void *a, const void *b)
When passed to ‘qsort’, this function will sort a ‘size_t’ array
based on decreasing values in the ‘gal_qsort_index_single’. The
global ‘gal_qsort_index_single’ pointer has a ‘void *’ pointer
which will be cast to the proper type based on this function: for
example ‘gal_qsort_index_single_uint16_d’ will cast the array to an
unsigned 16-bit integer type. The array that
‘gal_qsort_index_single’ points to will not be changed, it is only
read. For example, see this demo program:

#include <stdio.h>
#include <stdlib.h> /* qsort is defined in stdlib.h. */
#include <gnuastro/qsort.h>

int
main (void)
{
size_t s[4]={0, 1, 2, 3};
float f[4]={1.3,0.2,1.8,0.1};
gal_qsort_index_single=f;
qsort(s, 4, sizeof(size_t), gal_qsort_index_single_float32_d);
printf("%zu, %zu, %zu, %zu\n", s[0], s[1], s[2], s[3]);
return EXIT_SUCCESS;
}

The output will be: ‘2, 0, 1, 3’.

-- Function:
int
gal_qsort_index_single_TYPE_i (const void *a, const void *b)
Similar to ‘gal_qsort_index_single_TYPE_d’, but will sort the
indexes such that the values of ‘gal_qsort_index_single’ can be
parsed in increasing order.

-- Function:
int
gal_qsort_index_multi_d (const void *a, const void *b)
When passed to ‘qsort’ with an array of ‘gal_qsort_index_multi’,
this function will sort the array based on the values of the given
indices. The sorting will be ordered according to the ‘values’
pointer of ‘gal_qsort_index_multi’. Note that ‘values’ must point
to the same place in all the structures of the
‘gal_qsort_index_multi’ array.

This function is only useful when the indices of multiple arrays on
multiple threads are to be sorted. If your program is single
threaded, or all the indices belong to a single array (sorting
different sub-sets of indices in a single array on multiple
threads), it is recommended to use ‘gal_qsort_index_single_TYPE_d’.

-- Function:
int
gal_qsort_index_multi_i (const void *a, const void *b)
Similar to ‘gal_qsort_index_multi_d’, but the result will be sorted
in increasing order (first element will have the smallest value).

File: gnuastro.info, Node: K-d tree, Next: Permutations, Prev: Qsort functions, Up: Gnuastro library

12.3.19 K-d tree (‘kdtree.h’)
-----------------------------

A k-d tree is a space-partitioning binary search tree for organizing
points in a k-dimensional space. K-d trees are a very useful data structure
for multidimensional searches like range searches and nearest neighbor
searches. For a more formal and complete introduction see the Wikipedia
page (https://en.wikipedia.org/wiki/K-d_tree).

Each non-leaf node in a k-d tree divides the space into two parts,
known as half-spaces. To select the top/root node for partitioning, we
find the median of the points and make a hyperplane normal to the first
dimension. The points to the left of this space are represented by the
left subtree of that node and points to the right of the space are
represented by the right subtree. This is then repeated for all the
points in the input, thus associating a "left" and "right" branch for
each input point.

Gnuastro uses the standard algorithms of the k-d tree with one small
difference that makes it much more memory and CPU optimized. The set of
input points that define the tree nodes are given as a list of
Gnuastro's data container type, see *note List of gal_data_t::. Each
‘gal_data_t’ in the list represents the point's coordinate in one
dimension, and the first element in the list is the first dimension.
Hence the number of data values in each ‘gal_data_t’ (which must be
equal in all of them) represents the number of points. This is the same
format that Gnuastro's Table reading/writing functions read/write
columns in tables, see *note Table input output::.

The output k-d tree is a list of two ‘gal_data_t’s, representing the
input's row-number (or index, counting from 0) of the left and right
subtrees of each row. Each ‘gal_data_t’ thus has the same number of
rows (or points) as the input, but only containing integers with a type
of ‘uint32_t’ (unsigned 32-bit integer). If a node has no left, or
right subtree, then ‘GAL_BLANK_UINT32’ will be used. Below you can see
the simple tree for 2D points from Wikipedia. The input point
coordinates are represented as two input ‘gal_data_t’s (‘X’ and ‘Y’,
where ‘X->next=Y’ and ‘Y->next=NULL’). If you had three dimensional
points, you could define an extra ‘gal_data_t’ such that ‘Y->next=Z’ and
‘Z->next=NULL’. The output is always a list of two ‘gal_data_t’s, where
the first one contains the index of the left sub-tree in the input, and
the second one, the index of the right subtree. The index of the root
node (‘0’ in the case below(1)) is also returned as a single number.

     INDEX         INPUT              OUTPUT              K-D Tree
   (as guide)     X --> Y         LEFT --> RIGHT        (visualized)
   ----------     -------         --------------     ------------------
       0          5     4           1       2               (5,4)
       1          2     3         BLANK     4               /   \
       2          7     2           5       3           (2,3)    \
       3          9     6         BLANK   BLANK             \    (7,2)
       4          4     7         BLANK   BLANK           (4,7)  /   \
       5          8     1         BLANK   BLANK               (8,1) (9,6)

This format is therefore scalable to any number of dimensions: the
number of dimensions is determined from the number of nodes in the
input list of ‘gal_data_t’s (for example, using ‘gal_list_data_number’).
In Gnuastro's k-d tree implementation, there are thus no special
structures to keep every tree node (which would take extra memory and
would need to be moved around as the tree is being created). Everything
is done internally on the index of each point in the input dataset: the
only thing that is flipped/sorted during tree creation is the index to
the input row for any number of dimensions. As a result, Gnuastro's k-d
tree implementation is very memory and CPU efficient and its two output
columns can directly be written into a standard table (without having to
define any special binary format).

-- Function:
gal_data_t *
gal_kdtree_create (gal_data_t *coords_raw, size_t *root)
Create a k-d tree in a bottom-up manner (from leaves to the root).
This function returns two ‘gal_data_t’s connected as a list, see
description above. The first dataset contains the index of the left
subtree of each input node, and the second contains the index of
the right subtree. The index of
the root node is written into the memory that ‘root’ points to.
‘coords_raw’ is the list of the input points (one ‘gal_data_t’ per
dimension, see above). If the input dataset has no data
(‘coords_raw->size==0’), this function will return a ‘NULL’
pointer.

For example, assume you have the simple set of points below (from
the visualized example at the start of this section) in a
plain-text file called ‘coordinates.txt’:

$ cat coordinates.txt
5 4
2 3
7 2
9 6
4 7
8 1

With the program below, you can calculate the kd-tree, and write it
in a FITS file (while keeping the root index as a FITS keyword
inside of it).

#include <stdio.h>
#include <stdlib.h> /* For EXIT_SUCCESS. */
#include <gnuastro/fits.h>
#include <gnuastro/table.h>
#include <gnuastro/kdtree.h>

int
main (void)
{
gal_data_t *input, *kdtree;
char kdtreefile[]="kd-tree.fits";
char inputfile[]="coordinates.txt";

/* To write the root within the saved file. */
size_t root;
char *unit="index";
char *keyname="KDTROOT";
gal_fits_list_key_t *keylist=NULL;
char *comment="k-d tree root index (counting from 0).";

/* Read the input table. Note: this assumes the table only
* contains your input point coordinates (one column for each
* dimension). If it contains more columns with other properties
* for each point, you can specify which columns to read by
* name or number, see the documentation of 'gal_table_read'. */
input=gal_table_read(inputfile, "1", NULL, NULL,
GAL_TABLE_SEARCH_NAME, 0, -1, 0, NULL);

/* Construct a k-d tree. The index of root is stored in `root` */
kdtree=gal_kdtree_create(input, &root);

/* Write the k-d tree to a file and write root index and input
* name as FITS keywords ('gal_table_write' frees 'keylist').*/
gal_fits_key_list_title_add(&keylist, "k-d tree parameters", 0);
gal_fits_key_write_filename("KDTIN", inputfile, &keylist, 0, 1);
gal_fits_key_list_add_end(&keylist, GAL_TYPE_SIZE_T, keyname, 0,
&root, 0, comment, 0, unit, 0);
gal_table_write(kdtree, &keylist, NULL, GAL_TABLE_FORMAT_BFITS,
kdtreefile, "kdtree", 0, 1);

/* Clean up and return. */
gal_list_data_free(input);
gal_list_data_free(kdtree);
return EXIT_SUCCESS;
}

You can inspect the saved k-d tree FITS table with Gnuastro's *note
Table:: (first command below), and you can see the keywords
containing the root index with *note Fits:: (second command below):

asttable kd-tree.fits
astfits kd-tree.fits -h1

-- Function:
size_t
gal_kdtree_nearest_neighbour (gal_data_t *coords_raw,
gal_data_t *kdtree, size_t root, double *point, double
*least_dist)
Returns the index of the nearest input point to the query point
(‘point’, assumed to be an array with same number of elements as
‘gal_data_t’s in ‘coords_raw’). The distance between the query
point and its nearest neighbor is stored in the space that
‘least_dist’ points to. This search is efficient due to the
constant checking for the presence of possible best points in other
branches. If it is not possible for the other branch to have a
better nearest neighbor, that branch is not searched.

As an example, let's use the k-d tree that was created in the
example of ‘gal_kdtree_create’ (above) and find the nearest row to
a given coordinate (‘point’). This will be a very common scenario,
especially in large and multi-dimensional datasets where the k-d
tree creation can take long and you do not want to re-create the
k-d tree every time. In the ‘gal_kdtree_create’ example output, we
also wrote the k-d tree root index as a FITS keyword (‘KDTROOT’),
so after loading the two table data (input coordinates and k-d
tree), we will read the root from the FITS keyword. This is a very
simple example, but the scalability is clear: for example, it is
trivial to parallelize (see *note Library demo - multi-threaded
operation::).

#include <stdio.h>
#include <stdlib.h> /* For EXIT_SUCCESS. */
#include <gnuastro/fits.h>
#include <gnuastro/table.h>
#include <gnuastro/kdtree.h>

int
main (void)
{
/* INPUT: desired point. */
double point[2]={8.9,5.9};

/* Same as example in description of 'gal_kdtree_create'. */
gal_data_t *input, *kdtree;
char kdtreefile[]="kd-tree.fits";
char inputfile[]="coordinates.txt";

/* Processing variables of this function. */
char kdtreehdu[]="1";
double *in_x, *in_y, least_dist;
size_t root, nkeys=1, nearest_index;
gal_data_t *rkey, *keysll=gal_data_array_calloc(nkeys);

/* Read the input coordinates, see comments in example of
* 'gal_kdtree_create' for more. */
input=gal_table_read(inputfile, "1", NULL, NULL,
GAL_TABLE_SEARCH_NAME, 0, -1, 0, NULL);

/* Read the k-d tree contents (created before). */
kdtree=gal_table_read(kdtreefile, "1", NULL, NULL,
GAL_TABLE_SEARCH_NAME, 0, -1, 0, NULL);

/* Read the k-d tree root index from the header keyword.
* See example in description of 'gal_fits_key_read_from_ptr'.*/
keysll[0].name="KDTROOT";
keysll[0].type=GAL_TYPE_SIZE_T;
gal_fits_key_read(kdtreefile, kdtreehdu, keysll, 0, 0, NULL);
keysll[0].name=NULL; /* Since we did not allocate it. */
rkey=gal_data_copy_to_new_type(&keysll[0], GAL_TYPE_SIZE_T);
root=((size_t *)(rkey->array))[0];

/* Find the nearest neighbour of the point. */
nearest_index=gal_kdtree_nearest_neighbour(input, kdtree, root,
point, &least_dist);

/* Print the results. */
in_x=input->array;
in_y=input->next->array;
printf("(%g, %g): nearest is (%g, %g), with a distance of %g\n",
point[0], point[1], in_x[nearest_index],
in_y[nearest_index], least_dist);

/* Clean up and return. */
gal_data_free(rkey);
gal_list_data_free(input);
gal_list_data_free(kdtree);
gal_data_array_free(keysll, nkeys, 1);
return EXIT_SUCCESS;
}

---------- Footnotes ----------

(1) This example input table is the same as the example in Wikipedia
(as of December 2020). However, on the Wikipedia output, the root node
is (7,2), not (5,4). The difference is primarily because there are 6
rows and the median element of an even number of elements can vary by
integer calculation strategies. Here we use 0-based indexes for finding
median and round to the smaller integer.

File: gnuastro.info, Node: Permutations, Next: Matching, Prev: K-d tree, Up: Gnuastro library

12.3.20 Permutations (‘permutation.h’)
--------------------------------------

Permutation is the technical name for re-ordering of values. The need
for permutations occurs a lot during (mainly low-level) processing. To
do permutation, you must provide two inputs: an array of values (that
you want to re-order in place) and a permutation array which contains
the new index of each element (let's call it ‘perm’). The diagram below
shows the input array before and after the re-ordering.

permute: AFTER[ i ] = BEFORE[ perm[i] ] i = 0 .. N-1
inverse: AFTER[ perm[i] ] = BEFORE[ i ] i = 0 .. N-1

The functions here are a re-implementation of the GNU Scientific
Library's ‘gsl_permute’ function. The reason we did not use that
function was that it uses system-specific types (like ‘long’ and ‘int’)
which can have different widths on different systems, hence are not
easily convertible to Gnuastro's fixed width types (see *note Numeric
data types::). There is also a separate function for each type, heavily
using macros to allow a ‘base’ function to work on all the types. Thus
it is hard to read/understand. Hence, Gnuastro contains a re-write of
their steps in a new type-agnostic method which is a single function
that can work on any type.

As described in GSL's source code and manual, this implementation
comes from Donald Knuth's _Art of computer programming_ book, in the
"Sorting and Searching" chapter of Volume 3 (3rd ed). Exercise 10 of
Section 5.2 defines the problem and in the answers, Knuth describes the
solution. So if you are interested, please have a look there for more.

We are in contact with the GSL developers and in the future(1) we
will submit these implementations to GSL. If they are finally
incorporated there, we will delete this section in future versions.

-- Function:
void
gal_permutation_check (size_t *permutation, size_t size)
Print how ‘permutation’ will re-order an array that has ‘size’
elements, with one line for each element.

-- Function:
void
gal_permutation_apply (gal_data_t *input, size_t *permutation)
Apply ‘permutation’ on the ‘input’ dataset (can have any type), see
above for the definition of permutation.

-- Function:
void
gal_permutation_apply_inverse (gal_data_t *input, size_t
*permutation)
Apply the inverse of ‘permutation’ on the ‘input’ dataset (can have
any type), see above for the definition of permutation.

-- Function:
void
gal_permutation_transpose_2d (gal_data_t *input)
Transpose an input 2D matrix into a new dataset. If the input is
not a square, this function will change the ‘input->array’ element
to a newly allocated array (the old one will be freed internally).
Therefore, if you have already stored ‘input->array’ for other
usage _before_ calling this function and the input is not a square,
be sure to update the previously stored pointer.

---------- Footnotes ----------

(1) Gnuastro's Task 14497 (http://savannah.gnu.org/task/?14497). If
this task is still "postponed" when you are reading this and you are
interested to help, your contributions would be very welcome. Both
Gnuastro and GSL developers are very busy, hence both would appreciate
your help.

File: gnuastro.info, Node: Matching, Next: Statistical operations, Prev: Permutations, Up: Gnuastro library

12.3.21 Matching (‘match.h’)
----------------------------

Matching is often necessary when two measurements of the same points
have been done using different instruments (or hardware), different
software or different configurations of the same software. In other
words, you have two catalogs or tables, and each has N columns
containing the N-dimensional "coordinate" values of each point. Each
table can have other columns too, for example, one can have magnitudes
in one filter, and another can have morphology measurements.

The matching functions here will use the coordinate columns of the
two tables to find a permutation for each, and the total number of
matched rows ($N_{match}$). This will enable you to match by the
positions if you like. At a higher level, you can apply the permutation
to the magnitude or morphology columns to merge the catalogs over the
$N_{match}$ rows. The input and output data formats of the functions
are the same and are described below, before the actual functions. Each
function also has extra arguments due to the particular algorithm it
uses for the matching.

The two inputs of the functions (‘coord1’ and ‘coord2’) must be *note
List of gal_data_t::. Each ‘gal_data_t’ node in ‘coord1’ or ‘coord2’
should be a single dimensional dataset (column in a table) and all the
nodes (in each) must have the same number of elements (rows). In other
words, each column can be visualized as having the coordinates of each
point in its respective dimension. The number of dimensions is
determined by the number of ‘gal_data_t’ nodes in the two input lists
(which must be equal). The number of rows (or the number of elements in
each ‘gal_data_t’) in the columns of ‘coord1’ and ‘coord2’ can (and,
usually will!) be different. In summary, these functions will be happy
if you use ‘gal_table_read’ to read the two coordinate columns from a
file, see *note Table input output::.

The functions below return a singly-linked list of three 1D datasets
(see *note List of gal_data_t::), let's call the returned dataset ‘ret’.
The first two (‘ret’ and ‘ret->next’) are permutations. In other words,
the ‘array’ elements of both have a type of ‘size_t’, see *note
Permutations::. The third node (‘ret->next->next’) is the calculated
distance for that match and its array has a type of ‘double’. The
number of matches will be put in the space pointed by the ‘nummatched’
argument. If there is no match, the functions will return ‘NULL’.

The two permutations can be applied to the rows of the two inputs:
the first one (‘ret’) should be applied to the rows of the table
containing ‘coord1’ and the second one (‘ret->next’) to the table
containing ‘coord2’. After applying the returned permutations to the
inputs, the top ‘nummatched’ elements of both will match with each
other. The ordering of the rest of the elements is undefined (depends
on the matching function used). The third node is the distances between
the respective match (which may be elliptical distance, see discussion
of "aperture" below).

The functions will not simply return the nearest neighbor as a match.
This is because the nearest neighbor may be too far to be meaningful!
They will check the distance between the nearest neighbor of each point
and only return a match if it is within an acceptable N-dimensional
distance (or "aperture"). The matching aperture is defined by the
‘aperture’ array that is an input argument to the functions.

If several points of one catalog lie within this aperture of a point
in the other catalog, the nearest is defined as the match. In a 2D
situation (where the input lists have two nodes), for the most generic
case, ‘aperture’ must have three elements: the major axis length, axis
ratio and position angle (see *note Defining an ellipse and
ellipsoid::). If ‘aperture[1]==1’, the aperture will be a circle of
radius ‘aperture[0]’ and the third value will not be used. When the
aperture is an ellipse, distances between the points are also calculated
in the respective elliptical distances ($r_{el}$ in *note Defining an
ellipse and ellipsoid::).
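
As a sketch of the 2D case, the elliptical distance can be written out
as below. The notation is introduced here and is an assumption:
$\Delta_1$ and $\Delta_2$ are the coordinate differences between the
two points along the first and second dimension, and $a$, $q$ and
$\theta$ are ‘aperture[0]’, ‘aperture[1]’ and ‘aperture[2]’. Please
check the exact conventions (for example the sign and unit of the
position angle) in *note Defining an ellipse and ellipsoid::.

```latex
\begin{aligned}
\Delta_{1r} &= +\Delta_1\cos\theta + \Delta_2\sin\theta \\
\Delta_{2r} &= -\Delta_1\sin\theta + \Delta_2\cos\theta \\
r_{el}      &= \sqrt{\Delta_{1r}^2 + \left(\Delta_{2r}/q\right)^2},
\qquad \text{candidate match when } r_{el}\le a
\end{aligned}
```

With $q=1$ this reduces to $r_{el}=\sqrt{\Delta_1^2+\Delta_2^2}$ for
any $\theta$, which is the circle of radius ‘aperture[0]’ that was
mentioned above.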

*Output permutations ignore internal sorting*: the output
permutations will correspond to the initial inputs. Therefore, even
when ‘inplace!=0’ (and this function re-arranges the inputs in place),
the output permutation will correspond to original (possibly non-sorted)
inputs. The reason for this is that you rarely want to permute the
actual positional columns after the match. Usually, you also have other
columns (such as the magnitude and morphology) and you want to find how
they differ between the objects that match. Once you have the
permutations, they can be applied to those other columns (see *note
Permutations::) and the higher-level processing can continue. So if you
do not need the coordinate columns for the rest of your analysis, it is
better to set ‘inplace=1’.

-- Function:
gal_data_t *
gal_match_sort_based (gal_data_t *coord1, gal_data_t *coord2,
double *aperture, int sorted_by_first, int inplace, size_t
minmapsize, int quietmmap, size_t *nummatched)

Use a basic sort-based match to find the matching points of two
input coordinates. See the descriptions above on the format of the
inputs and outputs. To speed up the search, this function will
sort the input coordinates by their first column (first axis). If
_both_ are already sorted by their first column, you can avoid the
sorting step by giving a non-zero value to ‘sorted_by_first’.

When sorting is necessary and ‘inplace’ is non-zero, the actual
input columns will be sorted. Otherwise, an internal copy of the
inputs will be made, used (sorted) and later freed before
returning. Therefore, when ‘inplace==0’, inputs will remain
untouched, but this function will take more time and memory. If
internal allocation is necessary and the space is larger than
‘minmapsize’, the space will not be allocated in RAM, but in a
file, see description of ‘--minmapsize’ and ‘--quietmmap’ in *note
Processing options::.

-- Function:
gal_data_t *
gal_match_kdtree (gal_data_t *coord1, gal_data_t *coord2,
gal_data_t *coord1_kdtree, size_t kdtree_root, double
*aperture, size_t numthreads, size_t minmapsize, int
quietmmap, size_t *nummatched)

Use the k-d tree concept for finding matches between two catalogs,
optionally in parallel (on ‘numthreads’ threads). The k-d tree of
the first input (‘coord1_kdtree’), and its root index
(‘kdtree_root’), should be constructed before calling this
function; to do this, you can use ‘gal_kdtree_create’ of
*note K-d tree::. The desired ‘aperture’ array is the same as
‘gal_match_sort_based’ and described at the top of this section.
If ‘coord1_kdtree==NULL’, this function will return a ‘NULL’
pointer and write a value of ‘0’ in the space that ‘nummatched’
points to.

The final number of matches is returned in ‘nummatched’ and the
format of the returned dataset (three columns) is described above.
If internal allocation is necessary and the space is larger than
‘minmapsize’, the space will not be allocated in RAM, but in a
file, see description of ‘--minmapsize’ and ‘--quietmmap’ in *note
Processing options::.

File: gnuastro.info, Node: Statistical operations, Next: Fitting functions, Prev: Matching, Up: Gnuastro library

12.3.22 Statistical operations (‘statistics.h’)
-----------------------------------------------

After reading a dataset into memory from a file or fully simulating it
with another process, the most common processes that will be done on it
are statistical operations to let you quantify different aspects of the
data. The functions in this section are Gnuastro's current set of
tools for this job. All these functions can work on any numeric data
type natively (see *note Numeric data types::) and can also work on
tiles over a dataset. Hence the inputs and outputs are in Gnuastro's
*note Generic data container::.

-- Macro: GAL_STATISTICS_SIG_CLIP_MAX_CONVERGE
The maximum number of clips, when $\sigma$-clipping should be done
by convergence. If the clipping does not converge before making
this many clips, all $\sigma$-clipping outputs will be NaN.

-- Macro: GAL_STATISTICS_MODE_GOOD_SYM
The minimum acceptable symmetricity of the mode calculation. If
the symmetricity of the derived mode is less than this value, all
the returned values by ‘gal_statistics_mode’ will have a value of
NaN.

-- Macro: GAL_STATISTICS_BINS_INVALID
-- Macro: GAL_STATISTICS_BINS_REGULAR
-- Macro: GAL_STATISTICS_BINS_IRREGULAR
Macros used to identify the regularity of the bins when defining
bins.

-- Macro: GAL_STATISTICS_CLIP_OUTCOL_STD
-- Macro: GAL_STATISTICS_CLIP_OUTCOL_MAD
-- Macro: GAL_STATISTICS_CLIP_OUTCOL_MEAN
-- Macro: GAL_STATISTICS_CLIP_OUTCOL_MEDIAN
-- Macro: GAL_STATISTICS_CLIP_OUTCOL_NUMBER_USED
-- Macro: GAL_STATISTICS_CLIP_OUTCOL_NUMBER_CLIPS
Macros containing the index of the clipping outputs, see the
descriptions of ‘gal_statistics_clip_sigma’ below.

-- Macro: GAL_STATISTICS_CLIP_OUTCOL_OPTIONAL_STD
-- Macro: GAL_STATISTICS_CLIP_OUTCOL_OPTIONAL_MAD
-- Macro: GAL_STATISTICS_CLIP_OUTCOL_OPTIONAL_MEAN
Macros containing bit flags for optional clipping outputs, see the
descriptions of ‘gal_statistics_clip_sigma’ below.

-- Function:
gal_data_t *
gal_statistics_number (gal_data_t *input)
Return a single-element dataset with type ‘size_t’ which contains
the number of non-blank elements in ‘input’.

-- Function:
gal_data_t *
gal_statistics_minimum (gal_data_t *input)
Return a single-element dataset containing the minimum non-blank
value in ‘input’. The numerical datatype of the output is the same
as ‘input’.

-- Function:
gal_data_t *
gal_statistics_maximum (gal_data_t *input)
Return a single-element dataset containing the maximum non-blank
value in ‘input’. The numerical datatype of the output is the same
as ‘input’.

-- Function:
gal_data_t *
gal_statistics_sum (gal_data_t *input)
Return a single-element (‘double’ or ‘float64’) dataset containing
the sum of the non-blank values in ‘input’.

-- Function:
gal_data_t *
gal_statistics_mean (gal_data_t *input)
Return a single-element (‘double’ or ‘float64’) dataset containing
the mean of the non-blank values in ‘input’.

-- Function:
gal_data_t *
gal_statistics_std (gal_data_t *input)
Return a single-element (‘double’ or ‘float64’) dataset containing
the standard deviation of the non-blank values in ‘input’.

-- Function:
gal_data_t *
gal_statistics_mean_std (gal_data_t *input)
Return a two-element (‘double’ or ‘float64’) dataset containing the
mean and standard deviation of the non-blank values in ‘input’.
The first element of the returned dataset is the mean and the
second is the standard deviation.

This function will calculate both values in one pass over the
dataset. Hence when both the mean and standard deviation of a
dataset are necessary, this function is much more efficient than
calling ‘gal_statistics_mean’ and ‘gal_statistics_std’ separately.

-- Function:
double
gal_statistics_std_from_sums (double sum, double sump2, size_t
num)
Return the standard deviation from the values that can be obtained
in a single pass through the distribution: ‘sum’: the sum of the
elements, ‘sump2’: the sum of the power-of-2 of each element, and
‘num’: the number of elements.

This is a low-level function that is only useful after the
distribution of values has been parsed (and the three input
arguments are calculated). It is the lower-level function that is
used in functions like ‘gal_statistics_std’, or other components of
Gnuastro that measure the standard deviation (for example,
MakeCatalog's ‘--std’ column).

-- Function:
gal_data_t *
gal_statistics_median (gal_data_t *input, int inplace)
Return a single-element dataset containing the median of the
non-blank values in ‘input’. The numerical datatype of the output
is the same as ‘input’.

Calculating the median involves sorting the dataset and removing
blank values. For better performance (and less memory usage), you
can give a non-zero value to the ‘inplace’ argument. In this case,
the sorting and removal of blank elements will be done directly on
the input dataset. However, after this function the original
dataset may have changed (if it was not sorted or had blank
values).
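
The effect of ‘inplace’ can be sketched in plain C on a ‘double’
array. This is a minimal illustration under stated assumptions, not
Gnuastro's code: ‘sk_median’ is a hypothetical name, NaN stands in for
blank, and averaging the two central elements for an even count is an
assumption.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

static int
sk_compare_double(const void *a, const void *b)
{
  double x=*(const double *)a, y=*(const double *)b;
  return (x>y) - (x<y);
}

/* Median of the non-NaN elements. When 'inplace' is zero, the work
   is done on a temporary copy and 'arr' is left untouched. */
double
sk_median(double *arr, size_t size, int inplace)
{
  size_t i, num=0;
  double out, *work=arr;

  if(size==0) return NAN;
  if(!inplace)
    {
      work=malloc(size * sizeof *work);
      if(work==NULL) return NAN;
      memcpy(work, arr, size * sizeof *work);
    }

  /* Move the non-blank elements to the start, then sort them. */
  for(i=0;i<size;++i)
    if( !isnan(work[i]) ) work[num++]=work[i];
  if(num>1) qsort(work, num, sizeof *work, sk_compare_double);

  if(num==0)      out=NAN;
  else if(num%2)  out=work[num/2];
  else            out=0.5*(work[num/2-1]+work[num/2]);

  if(!inplace) free(work);
  return out;
}
```

With a non-zero ‘inplace’ no copy is allocated, but the caller's array
comes back reordered and with its blanks overwritten.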

-- Function:
gal_data_t *
gal_statistics_mad (gal_data_t *input, int inplace)
Return a single-element dataset with same type as input, containing
the median absolute deviation (MAD) of the non-blank values in
‘input’.

If ‘inplace==0’, the input dataset will remain untouched.
Otherwise, the MAD calculation will be done on the input dataset
without allocating a new one (its values will be changed after this
function). This is useful when you do not need the input after this
function and want to avoid the extra RAM and CPU cost of a copy.

-- Function:
gal_data_t *
gal_statistics_median_mad (gal_data_t *input, int inplace)
Return a two-element dataset with same type as input, containing
the median and median absolute deviation (MAD) of the non-blank
values in ‘input’.

-- Function:
size_t
gal_statistics_quantile_index (size_t size, double quantile)
Return the index of the element that has a quantile of ‘quantile’
assuming the dataset has ‘size’ elements.
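
One plausible way to map a quantile to an index is sketched below.
The rounding convention (nearest index on a zero-based scale of
‘size-1’, with exact ties going to the lower index) and the returned
value for invalid inputs are assumptions of this sketch, and
‘sk_quantile_index’ is a hypothetical name, not a Gnuastro function.

```c
#include <stddef.h>

/* Index (counting from zero) of the element at 'quantile' in a
   sorted array of 'size' elements. Returns (size_t)(-1) when the
   inputs are not usable (an assumption of this sketch). */
size_t
sk_quantile_index(size_t size, double quantile)
{
  double findex;
  size_t low;
  if(size==0 || quantile<0.0 || quantile>1.0) return (size_t)(-1);
  findex = (double)(size-1) * quantile;
  low = (size_t)findex;
  return (findex-(double)low > 0.5) ? low+1 : low;
}
```

For example, with 101 elements the median (quantile 0.5) falls on
index 50 under this convention.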

-- Function:
gal_data_t *
gal_statistics_quantile (gal_data_t *input, double quantile,
int inplace)
Return a single-element dataset containing the value at quantile
‘quantile’ of the non-blank values in ‘input’. The
numerical datatype of the output is the same as ‘input’. See
‘gal_statistics_median’ for a description of ‘inplace’.

-- Function:
size_t
gal_statistics_quantile_function_index (gal_data_t *input,
gal_data_t *value, int inplace)
Return the index of the quantile function (inverse quantile) of
‘input’ at ‘value’. In other words, this function will return the
index of the nearest element (of a sorted and non-blank) ‘input’ to
‘value’. If the value is outside the range of the input, then this
function will return ‘GAL_BLANK_SIZE_T’.

-- Function:
gal_data_t *
gal_statistics_quantile_function (gal_data_t *input,
gal_data_t *value, int inplace)

Return a single-element dataset containing the quantile function of
the non-blank values in ‘input’ at ‘value’ (a single-element
dataset). The numerical data type of the returned dataset is
‘float64’ (or ‘double’). In other words, this function will return
the quantile of ‘value’ in ‘input’. ‘value’ has to have the same
type as ‘input’. See ‘gal_statistics_median’ for a description of
‘inplace’.

When all elements are blank, the returned value will be NaN. If the
value is smaller than the input's smallest element, the returned
value will be negative infinity. If the value is larger than the
input's largest element, then the returned value will be positive
infinity.
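
The special cases above (NaN, negative infinity and positive
infinity) can be sketched on a sorted, blank-free ‘double’ array as
below. The name ‘sk_quantile_function’ is hypothetical, and dividing
the nearest element's index by ‘size-1’ (as well as returning zero for
a single-element input) are assumptions of this sketch rather than
documented behavior.

```c
#include <math.h>
#include <stddef.h>

/* Quantile of 'value' within a sorted array without blanks. */
double
sk_quantile_function(const double *sorted, size_t size, double value)
{
  size_t i, nearest=0;
  if(size==0)               return NAN;        /* Nothing to use.   */
  if(value<sorted[0])       return -INFINITY;  /* Below the range.  */
  if(value>sorted[size-1])  return INFINITY;   /* Above the range.  */
  if(size==1)               return 0.0;        /* Assumption.       */

  /* Find the element that is nearest to 'value'. */
  for(i=1;i<size;++i)
    if( fabs(sorted[i]-value) < fabs(sorted[nearest]-value) )
      nearest=i;
  return (double)nearest/(double)(size-1);
}
```

Checking the returned value with ‘isnan’ and ‘isinf’ before using it
avoids propagating these special values into later calculations.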

-- Function:
gal_data_t *
gal_statistics_unique (gal_data_t *input, int inplace)
Return a 1D dataset with the same numeric data type as the input,
but only containing its unique elements and without any (possible)
blank/NaN elements. Note that the input's number of dimensions is
irrelevant for this function. If ‘inplace’ is not zero, then the
unique values will over-write the allocated space of the input,
otherwise a new space will be allocated and the input will not be
touched.

-- Function:
int
gal_statistics_has_negative (gal_data_t *input)
Return ‘1’ if the input dataset contains a negative number and ‘0’
otherwise. If the dataset doesn't have a numeric type (as in a
string), this function will abort with an error, saying that it does
not recognize the file type.

-- Function:
gal_data_t *
gal_statistics_mode (gal_data_t *input, float mirrordist, int
inplace)
Return a four-element (‘double’ or ‘float64’) dataset that contains
the mode of the ‘input’ distribution. This function implements the
non-parametric algorithm to find the mode that is described in
Appendix C of Akhlaghi and Ichikawa 2015
(https://arxiv.org/abs/1505.01664).

In short it compares the actual distribution and its "mirror
distribution" to find the mode. In order to be efficient, you can
determine how far the comparison goes away from the mirror through
the ‘mirrordist’ parameter (think of it as a multiple of
sigma/error). See ‘gal_statistics_median’ for a description of
‘inplace’.

The output array has the following elements (in the given order,
note that counting in C starts from 0).
array[0]: mode
array[1]: mode quantile.
array[2]: symmetricity.
array[3]: value at the end of symmetricity.

-- Function:
gal_data_t *
gal_statistics_mode_mirror_plots (gal_data_t *input,
gal_data_t *value, size_t numbins, int inplace, double
*mirror_val)
Make a mirrored histogram and cumulative frequency plot (with
‘numbins’) with the mirror distribution of the ‘input’ having a
value in ‘value’. If all the input elements are blank, or the
mirror value is outside the range of the input, this function will
return a ‘NULL’ pointer.

The output is a list of data structures (see *note List of
gal_data_t::): the first is the bins with one bin at the mirror
point, the second is the histogram with a maximum of one and the
third is the cumulative frequency plot (with a maximum of one).

-- Function:
int
gal_statistics_is_sorted (gal_data_t *input, int updateflags)
Return ‘0’ if the input is not sorted. If it is sorted, this
function will return ‘1’ or ‘2’ if it is increasing or decreasing,
respectively. This function will abort with an error if ‘input’
has zero elements and will return ‘1’ (sorted, increasing) when
there is only one element. This function will only look into the
dataset if the ‘GAL_DATA_FLAG_SORT_CH’ bit of ‘input->flag’ is ‘0’,
see *note Generic data container::.

When the flags do not indicate a previous check _and_ ‘updateflags’
is non-zero, this function will set the flags appropriately to
avoid having to re-check the dataset in future calls (this can be
very useful when repeated checks are necessary). When
‘updateflags==0’, this function has no side-effects on the dataset:
it will not toggle the flags.

If you want to re-check a dataset with the sort-check flag
already set (for example, if you have made changes to it), then
explicitly set the ‘GAL_DATA_FLAG_SORT_CH’ bit to zero before
calling this function. When there are no other flags, you can
simply set the flags to zero (with ‘input->flag=0’), otherwise you
can use this expression:

input->flag &= ~GAL_DATA_FLAG_SORT_CH;
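
The caching behavior described above can be illustrated with a small
stand-alone sketch. The structure, the ‘SK_FLAG_*’ bits and their
values are all hypothetical (they are not Gnuastro's definitions), and
treating equal neighbors as compatible with both orders is an
assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits (NOT Gnuastro's actual values). */
#define SK_FLAG_SORT_CH   0x1   /* Sortedness has been checked. */
#define SK_FLAG_SORTED_I  0x2   /* Sorted, increasing.          */
#define SK_FLAG_SORTED_D  0x4   /* Sorted, decreasing.          */

typedef struct { double *array; size_t size; uint8_t flag; } sk_data;

/* Return 0 (not sorted), 1 (increasing) or 2 (decreasing). The
   array is only parsed when the "checked" bit is not set. */
int
sk_is_sorted(sk_data *d, int updateflags)
{
  size_t i;
  int inc=1, dec=1, out;

  if(d->flag & SK_FLAG_SORT_CH)            /* Use the cached answer. */
    return (d->flag & SK_FLAG_SORTED_I) ? 1
           : ( (d->flag & SK_FLAG_SORTED_D) ? 2 : 0 );

  for(i=1;i<d->size;++i)
    {
      if(d->array[i] < d->array[i-1]) inc=0;
      if(d->array[i] > d->array[i-1]) dec=0;
    }
  out = inc ? 1 : (dec ? 2 : 0);

  if(updateflags)                          /* Remember the answer.   */
    {
      d->flag = (uint8_t)(d->flag & ~(SK_FLAG_SORTED_I|SK_FLAG_SORTED_D));
      d->flag |= SK_FLAG_SORT_CH;
      if(out==1) d->flag |= SK_FLAG_SORTED_I;
      if(out==2) d->flag |= SK_FLAG_SORTED_D;
    }
  return out;
}
```

In this sketch, a cached answer becomes stale as soon as the array is
modified, which is why the "checked" bit has to be cleared by the
caller before asking again.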

-- Function:
void
gal_statistics_sort_increasing (gal_data_t *input)
Sort the input dataset (in place) in an increasing order and toggle
the sort-related bit flags accordingly.

-- Function:
void
gal_statistics_sort_decreasing (gal_data_t *input)
Sort the input dataset (in place) in a decreasing order and toggle
the sort-related bit flags accordingly.

-- Function:
gal_data_t *
gal_statistics_no_blank_sorted (gal_data_t *input, int
inplace)
Remove all the blanks and sort the input dataset. If ‘inplace’ is
non-zero this will happen on the input dataset (in the allocated
space of the input dataset). However, if ‘inplace’ is zero, this
function will allocate a new copy of the dataset and work on that.
Therefore if ‘inplace==0’, the input dataset will not be modified.

This function uses the bit flags of the input, so if you have
modified the dataset, set ‘input->flag=0’ before calling this
function. Also note that ‘inplace’ is only for the dataset
elements. Therefore even when ‘inplace==0’, if the input is
already sorted _and_ has no blank values, then the flags will be
updated to show this.

If all the elements were blank, then the returned dataset's ‘size’
will be zero. This is thus a good parameter to check after calling
this function to see if there actually were any non-blank elements
in the input or not and take the appropriate measure. This can
help avoid strange bugs in later steps. The flags of a zero-sized
returned dataset will indicate that it has no blanks and is sorted
in an increasing order. Even if having blank values or being
sorted is not defined on a zero-element dataset, it is up to the
caller to choose what they will do with a zero-element dataset.
The flags have to be set after this function any way.

-- Function:
gal_data_t *
gal_statistics_regular_bins (gal_data_t *input, gal_data_t
*inrange, size_t numbins, double onebinstart)
Generate an array of regularly spaced elements as a 1D array
(column) of type ‘double’ (i.e., ‘float64’, it has to be double to
account for small differences on the bin edges). The input
arguments are described below.

‘input’
The dataset you want to apply the bins to. This is only
necessary if the range argument is not complete, see below.
If ‘inrange’ has all the necessary information, you can pass a
‘NULL’ pointer for this.

‘inrange’
This dataset keeps the desired range along each dimension of
the input data structure, it has to be in ‘float’ (i.e.,
‘float32’) type.

• If you want the full range of the dataset (in any
dimension), then just set ‘inrange’ to ‘NULL’ and the
range will be specified from the minimum and maximum
value of the dataset (‘input’ cannot be ‘NULL’ in this
case).

• If there is one element for each dimension in range, then
it is viewed as a quantile (Q), and the range will be: 'Q
to 1-Q'.

• If there are two elements for each dimension in range,
then they are assumed to be your desired minimum and
maximum values. When either of the two are NaN, the
minimum and maximum will be calculated for it.

‘numbins’
The number of bins: must be larger than 0.

‘onebinstart’
A desired value to start one bin. Note that with this option,
the bins will not start and end exactly on the given range
values, it will be slightly shifted to accommodate this
request (enough for the bin containing the value to start at
it). If you do not have any preference on where to start a
bin, set this to NAN.
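
The spacing and the ‘onebinstart’ shift can be sketched in plain C as
below, for a range that is already known. This is not Gnuastro's
implementation: ‘sk_regular_bins’ is a hypothetical name, the returned
values are taken to be bin centers, and the shift is only applied when
‘onebinstart’ falls inside the range (all assumptions of this sketch).

```c
#include <math.h>
#include <stddef.h>

/* Fill 'centers' with 'numbins' (>0) equally spaced bin centers over
   [min,max] and return the bin width. When 'onebinstart' is not NaN,
   all bins are shifted so the bin containing it starts on it. */
double
sk_regular_bins(double min, double max, size_t numbins,
                double onebinstart, double *centers)
{
  size_t i, ind;
  double shift=0.0, width=(max-min)/(double)numbins;

  if( !isnan(onebinstart) && onebinstart>=min && onebinstart<max )
    {
      ind   = (size_t)( (onebinstart-min)/width );
      shift = onebinstart - (min + (double)ind*width);
    }

  for(i=0;i<numbins;++i)
    centers[i] = min + ((double)i+0.5)*width + shift;
  return width;
}
```

For a range of 0 to 3 with three bins, the centers are 0.5, 1.5 and
2.5. Asking for a bin to start at 1.25 moves them to 0.75, 1.75 and
2.75, so the covered range no longer matches the requested one
exactly.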

-- Function:
gal_data_t *
gal_statistics_histogram (gal_data_t *input, gal_data_t *bins,
int normalize, int maxone)
Make a histogram of all the elements in the given dataset with bin
values that are defined in the ‘bins’ structure (see
‘gal_statistics_regular_bins’, they currently have to be equally
spaced). The returned histogram is a 1-D ‘gal_data_t’ of type
‘GAL_TYPE_FLOAT32’, with the same number of elements as ‘bins’.
For each bin, it will contain the number of input elements that
fell inside of that bin.

Let's write the center of the $i$th element of the bin array as
$b_i$, and the fixed half-bin width as $h$. Then element $j$ of
the input array ($in_j$) will be counted in $b_i$ if $(b_i-h) \le
in_j < (b_i+h)$. However, if $in_j$ is somewhere in the last bin,
the condition changes to $(b_i-h) \le in_j \le (b_i+h)$.

If ‘normalize!=0’, the histogram will be "normalized" such that the
sum of the counts column will be one. In other words, all the
counts in every bin will be divided by the total number of counts.
If ‘maxone!=0’, the histogram's maximum count will be 1. In other
words, the counts in every bin will be divided by the value of the
maximum. In both of these cases, the output dataset will have a
‘GAL_TYPE_FLOAT32’ datatype.
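
The counting rule and the two scalings can be sketched in plain C as
follows. The ‘sk_’ functions are hypothetical and work on raw arrays
instead of ‘gal_data_t’. The simple double loop is only for clarity
and is not meant to reflect how Gnuastro does the counting.

```c
#include <stddef.h>

/* Count the inputs into equally spaced bins that are given by their
   centers and a fixed half-width 'h'. The upper edge is exclusive,
   except for the last bin. Returns the number of counted inputs. */
size_t
sk_histogram(const double *in, size_t size, const double *centers,
             size_t numbins, double h, float *hist)
{
  size_t i, j, counted=0;
  for(i=0;i<numbins;++i) hist[i]=0.0f;
  for(j=0;j<size;++j)
    for(i=0;i<numbins;++i)
      {
        double low=centers[i]-h, high=centers[i]+h;
        int inlast = (i==numbins-1 && in[j]==high);
        if( in[j]>=low && (in[j]<high || inlast) )
          { hist[i]+=1.0f; ++counted; break; }
      }
  return counted;
}

/* Divide by the maximum ('maxone!=0') or by the sum (otherwise). */
void
sk_histogram_scale(float *hist, size_t numbins, int maxone)
{
  size_t i;
  float div=0.0f;
  for(i=0;i<numbins;++i)
    {
      if(maxone) { if(hist[i]>div) div=hist[i]; }
      else       div+=hist[i];
    }
  if(div>0.0f)
    for(i=0;i<numbins;++i) hist[i]/=div;
}
```

With centers 0.5, 1.5 and 2.5 and a half-width of 0.5, the value 1.0
is counted in the second bin, while 3.0 is still counted in the last
bin because its upper edge is inclusive.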

-- Function:
gal_data_t *
gal_statistics_histogram2d (gal_data_t *input, gal_data_t
*bins)
This function is very similar to ‘gal_statistics_histogram’, but
will build a 2D histogram (count how many of the elements of
‘input’ are within a 2D box). The bins comprising the first
dimension of the 2D box are defined by ‘bins’. The bins of the
second dimension are defined by ‘bins->next’ (‘bins’ is a *note
List of gal_data_t::). Both ‘bins’ and ‘bins->next’ can be
created with ‘gal_statistics_regular_bins’.

This function returns a list of ‘gal_data_t’ with three
nodes/columns, so you can directly write them into a table (see
*note Table input output::). Assuming ‘bins’ has $N1$ bins and
‘bins->next’ has $N2$ bins, each node/column of the returned output
is a 1D array with $N1\times N2$ elements. The first and second
columns are the center of the 2D bin along the first and second
dimensions and have a ‘double’ data type. The third column is the
2D histogram (the number of input elements that have a value within
that 2D bin) and has a ‘uint32’ data type (see *note Numeric data
types::).

-- Function:
gal_data_t *
gal_statistics_cfp (gal_data_t *input, gal_data_t *bins, int
normalize)
Make a cumulative frequency plot (CFP) of all the elements in
‘input’ with bin values that are defined in the ‘bins’ structure
(see ‘gal_statistics_regular_bins’).

The CFP is built from the histogram: in each bin, the value is the
sum of all previous bins in the histogram. Thus, if you have
already calculated the histogram before calling this function, you
can pass it onto this function as the data structure in
‘bins->next’ (see *note List of gal_data_t::). If ‘bins->next!=NULL’,
then it is assumed to be the histogram. If it is ‘NULL’, then the
histogram will be calculated internally and freed after the job is
finished.

When a histogram is given and it is normalized, the CFP will also
be normalized (even if the normalized flag is not set here): note
that a normalized CFP's maximum value is 1.
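
Building the CFP from an existing histogram is a running sum, as in
the plain C sketch below. ‘sk_cfp’ is a hypothetical name, and
including the current bin in the sum is an assumption of this sketch.

```c
#include <stddef.h>

/* Cumulative frequency plot from a histogram: each output element is
   the running sum of the histogram up to (and including) that bin.
   When 'normalize' is non-zero, the last element becomes 1. */
void
sk_cfp(const float *hist, size_t numbins, int normalize, float *cfp)
{
  size_t i;
  float sum=0.0f;
  for(i=0;i<numbins;++i) { sum+=hist[i]; cfp[i]=sum; }
  if(normalize && sum>0.0f)
    for(i=0;i<numbins;++i) cfp[i]/=sum;
}
```

If the histogram given to this sketch was already normalized to a sum
of one, the running sum ends at one without any further division,
which matches the behavior described above.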

-- Function:
gal_data_t *
gal_statistics_clip_sigma (gal_data_t *input, float multip,
float param, float extrastats, int inplace, int quiet)
Apply $\sigma$-clipping on a given dataset and return a dataset
that contains the results. For a description of $\sigma$-clipping
see *note Sigma clipping::. ‘multip’ is the multiple of the
standard deviation (or $\sigma$) that is used to define outliers in
each round of clipping.

The role of ‘param’ is determined based on its value. If ‘param’
is larger than ‘1’ (one), it must be an integer and will be
interpreted as the number of clips to do. If it is less than ‘1’
(one), it is interpreted as the tolerance level to stop the
iteration.

The returned dataset (let's call it ‘out’) contains a 6-element
array with type ‘GAL_TYPE_FLOAT32’. Through the
‘GAL_STATISTICS_CLIP_OUTCOL_*’ macros below, you can access any
particular measurement.

out=gal_statistics_clip_sigma(input, ....);
float *array=out->array;

array[ GAL_STATISTICS_CLIP_OUTCOL_NUMBER_USED ]
array[ GAL_STATISTICS_CLIP_OUTCOL_MEAN ]
array[ GAL_STATISTICS_CLIP_OUTCOL_STD ]
array[ GAL_STATISTICS_CLIP_OUTCOL_MEDIAN ]
array[ GAL_STATISTICS_CLIP_OUTCOL_MAD ]
array[ GAL_STATISTICS_CLIP_OUTCOL_NUMBER_CLIPS ]

However, note that not all of these are measured by default! Since
the mean and MAD are not necessary during sigma-clipping, if you
want them, you have to set the following two bit flags in the
‘extrastats’ argument as below.

int extrastats=0; /* To initialize all bits */

/* If you want the sigma-clipped MAD. */
extrastats |= GAL_STATISTICS_CLIP_OUTCOL_OPTIONAL_MAD;

/* If you want the sigma-clipped mean. */
extrastats |= GAL_STATISTICS_CLIP_OUTCOL_OPTIONAL_MEAN;

If the $\sigma$-clipping does not converge or all input elements
are blank, then this function will return NaN values for all the
elements above.

-- Function:
gal_data_t *
gal_statistics_clip_mad (gal_data_t *input, float multip,
float param, uint8_t extrastats, int inplace, int quiet)
Similar to ‘gal_statistics_clip_sigma’, but will do median absolute
deviation (MAD) based clipping, see *note MAD clipping::.

The only difference is that for this function the MAD is
automatically calculated during clipping. It is the mean and
standard deviation that will not be calculated unless requested
with the ‘GAL_STATISTICS_CLIP_OUTCOL_OPTIONAL_MEAN’ and
‘GAL_STATISTICS_CLIP_OUTCOL_OPTIONAL_STD’ bit flags respectively.

-- Function:
gal_data_t *
gal_statistics_outlier_bydistance (int pos1_neg0, gal_data_t
*input, size_t window_size, float sigma, float sigclip_multip,
float sigclip_param, int inplace, int quiet)

Find the first positive outlier (if ‘pos1_neg0!=0’) in the ‘input’
distribution. When ‘pos1_neg0==0’, the same algorithm goes to the
start of the dataset. The returned dataset contains a single
element: the first positive outlier. It is one of the dataset's
elements, in the same type as the input. If the process fails for
any reason (for example, no outlier was found), a ‘NULL’ pointer
will be returned.

All (possibly existing) blank elements are first removed from the
input dataset, then it is sorted. A sliding window of
‘window_size’ elements is parsed over the dataset, starting from
the ‘window_size’-th element of the dataset and moving in the
direction of increasing values. This window is used as a reference. The first
element where the distance to the previous (sorted) element is
‘sigma’ units away from the distribution of distances in its window
is considered an outlier and returned by this function.

Formally, assume there are $N$ non-blank elements, which are first
sorted. Searching for the outlier starts on element $W$ (the value
of ‘window_size’).
Let's take $v_i$ to be the $i$-th element of the sorted input (with
no blank values) and $m$ and $\sigma$ as the $\sigma$-clipped
median and standard deviation from the distances of the previous
$W$ elements (not including $v_i$). If the value given to ‘sigma’
is denoted by $s$, the $i$-th element is considered an
outlier when the condition below is true.

$${(v_i-v_{i-1})-m\over \sigma}>s$$

The ‘sigclip_multip’ and ‘sigclip_param’ arguments specify the
properties of the $\sigma$-clipping (see *note Sigma clipping:: for
more). You see that by this definition, the outlier cannot be any
of the lower half elements. The advantage of this algorithm
compared to $\sigma$-clipping is that it only looks backwards (in
the sorted array) and parses it in one direction.

If ‘inplace!=0’, the removing of blank elements and sorting will be
done within the input dataset's allocated space. Otherwise, this
function will internally allocate (and later free) the necessary
space to keep the intermediate space that this process requires.

If ‘quiet==0’, this function will report the parameters every time
it moves the window as a separate line with several columns. The
first column is the value, the second (in square brackets) is the
sorted index, the third is the distance of this element from the
previous one. The fourth and fifth (in parentheses) are the median
and standard deviation of the $\sigma$-clipped distribution within
the window and the last column is the difference between the third
and fourth, divided by the fifth.

-- Function:
gal_data_t *
gal_statistics_outlier_flat_cfp (gal_data_t *input, size_t
numprev, float sigclip_multip, float sigclip_param, float
thresh, size_t numcontig, int inplace, int quiet, size_t
*index)

Return the first element in the given dataset where the cumulative
frequency plot first becomes significantly flat for a sufficient
number of elements. The returned dataset only has one element
(with the same type as the input). If ‘index!=NULL’, the index
(counting from zero, after sorting the dataset and removing any
blanks) is written in the space that ‘index’ points to. If no
sufficiently flat portion is found, the returned pointer will be
‘NULL’.

The flatness on the cumulative frequency plot is defined like this
(see *note Histogram and Cumulative Frequency Plot::): on the
sorted dataset, for every point ($a_i$), we calculate
$d_i=a_{i+2}-a_{i-2}$. This is done on the first $N$ elements (value
of ‘numprev’). After element $a_{N+2}$, we start estimating the
flatness as follows: for every element we use the $N$ $d_i$
measurements before it as the reference. Let's call this set $D_i$
for element $i$. The $\sigma$-clipped median ($m$) and standard
deviation ($s$) of $D_i$ are then calculated. The
$\sigma$-clipping can be configured with the two ‘sigclip_param’
and ‘sigclip_multip’ arguments.

Taking $t$ as the significance threshold (value of ‘thresh’), a
point is considered flat when $a_i>m+ts$. But a single point
satisfying this condition will probably just be due to noise. To
make a more robust estimate, this significance/condition has to
hold for ‘numcontig’ contiguous elements after $a_i$. When this is
satisfied, $a_i$ is returned as the point where the distribution's
cumulative frequency plot becomes flat.

To get a good estimate of $m$ and $s$, it is thus recommended to
set ‘numprev’ as large as possible. However, be careful not to set
it too high: the checks in the paragraph above are not done on the
first ‘numprev’ elements and this function assumes the flatness
occurs after them. Also, be sure that the value to ‘numcontig’ is
much less than ‘numprev’, otherwise $\sigma$-clipping may not be
able to remove the immediate outliers in $D_i$ near the boundary of
the flat region.

When ‘quiet==0’, the basic measurements done on each element are
printed on the command-line (good for finding the best parameters).
When ‘inplace!=0’, the sorting and removal of blank elements is
done on the input dataset, so the input may be altered after this
function.

File: gnuastro.info, Node: Fitting functions, Next: Binary datasets, Prev: Statistical operations, Up: Gnuastro library

12.3.23 Fitting functions (‘fit.h’)
-----------------------------------

After doing a measurement, it is usually necessary to parameterize the
relation that has been found. The functions in this section are
wrappers over the GNU Scientific Library (GSL) Linear Least-Squares
Fitting (https://www.gnu.org/software/gsl/doc/html/lls.html), to make
them easily accessible using Gnuastro's *note Generic data container::.
The respective GSL function is mentioned under each function.

-- Global integer: GAL_FIT_INVALID
-- Global integer: GAL_FIT_LINEAR
-- Global integer: GAL_FIT_LINEAR_WEIGHTED
-- Global integer: GAL_FIT_LINEAR_NO_CONSTANT
-- Global integer: GAL_FIT_LINEAR_NO_CONSTANT_WEIGHTED
-- Global integer: GAL_FIT_POLYNOMIAL
-- Global integer: GAL_FIT_POLYNOMIAL_WEIGHTED
-- Global integer: GAL_FIT_POLYNOMIAL_NUMBER
Identifiers for the various types of fitting functions. These can
be used by the callers of these functions to select between various
fitting types. They can easily be converted to, and from, fixed
human-readable strings using the ‘gal_fit_name_*’ functions below.
The last one ‘GAL_FIT_POLYNOMIAL_NUMBER’ is the total number of
available fitting methods (can be used to add more macros in the
calling program and to avoid overlaps with existing codes).

-- Global integer: GAL_FIT_ROBUST_INVALID
-- Global integer: GAL_FIT_ROBUST_DEFAULT
-- Global integer: GAL_FIT_ROBUST_BISQUARE
-- Global integer: GAL_FIT_ROBUST_CAUCHY
-- Global integer: GAL_FIT_ROBUST_FAIR
-- Global integer: GAL_FIT_ROBUST_HUBER
-- Global integer: GAL_FIT_ROBUST_OLS
-- Global integer: GAL_FIT_ROBUST_WELSCH
-- Global integer: GAL_FIT_ROBUST_NUMBER
Identifiers for the various types of robust polynomial fitting
functions. For a description of each, see
<https://www.gnu.org/s/gsl/doc/html/lls.html#c.gsl_multifit_robust_alloc>.
The last one ‘GAL_FIT_ROBUST_NUMBER’ is the total number of
available functions (can be used to add more macros in the calling
program and to avoid overlaps with existing codes).

-- Function:
uint8_t
gal_fit_name_to_id (char *name)
Return the internal code of a standard human-readable name for the
various fitting functions. If the name is not recognized, the
returned value will be ‘GAL_FIT_INVALID’.

-- Function:
char *
gal_fit_name_from_id (uint8_t fitid)
Return a standard human-readable name for the fitting function
identified with the ‘fitid’ (read as "fitting ID"). If the fitting
ID couldn't be recognized, a NULL pointer is returned.

-- Function:
uint8_t
gal_fit_name_robust_to_id (char *name)
Return the internal code of a standard human-readable name for the
various robust fitting types. If the name is not recognized, the
returned value will be ‘GAL_FIT_INVALID’.

-- Function:
char *
gal_fit_name_robust_from_id (uint8_t robustid)
Return a standard human-readable name for the input robust fitting
type. If the fitting ID couldn't be recognized, a NULL pointer is
returned.

-- Function:
gal_data_t *
gal_fit_1d_linear (gal_data_t *xin, gal_data_t *yin,
gal_data_t *ywht)
Perform a 1D linear regression fit with a constant term(1) in the
form of $Y=c_0+c_1X$. The input ‘xin’ contains the independent
variable values and ‘yin’ contains the measured variable values for
each independent variable. When ‘ywht!=NULL’, it is assumed to
contain the "weight" of each Y measurement (if you don't have
weights on your measured values, simply set this to ‘NULL’). The
weight of each measurement is the inverse of its variance. For a
Gaussian error distribution with standard deviation $\sigma$, the
weight is therefore $1/\sigma^2$.

If any of the values in any of the inputs is blank (NaN in floating
point), the final fitted parameters will all be NaN. To remove rows
with a NaN/blank, you can use ‘gal_blank_remove_rows’ (which will
remove all rows with a blank value in any of the columns with a
single call).

The output is a single dataset with a ‘GAL_TYPE_FLOAT64’ type with
6 elements:
1. $c_0$: the constant in $Y=c_0+c_1X$.
2. $c_1$: the multiple in $Y=c_0+c_1X$.
3. First element of variance-covariance matrix.
4. Second and third (which are equal) elements of the
variance-covariance matrix.
5. Fourth element of the variance-covariance matrix.
6. The reduced $\chi^2$ of the fit.
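
For reference, the first two output elements follow from the standard
closed-form solution of (weighted) linear least squares, sketched
below in plain C. This is only an illustration on raw arrays:
‘sk_fit_linear’ is a hypothetical name and the covariance and
$\chi^2$ elements are not computed here.

```c
#include <stddef.h>

/* Weighted least-squares fit of Y=c0+c1*X. When 'w' is NULL, all
   weights are taken to be 1. Returns 1 on success and 0 when the fit
   is not defined (no weight, or all X values equal). */
int
sk_fit_linear(const double *x, const double *y, const double *w,
              size_t size, double *c0, double *c1)
{
  size_t i;
  double wi, dx, W=0.0, mx=0.0, my=0.0, sxx=0.0, sxy=0.0;

  /* Weighted means of X and Y. */
  for(i=0;i<size;++i)
    { wi = w ? w[i] : 1.0;  W+=wi;  mx+=wi*x[i];  my+=wi*y[i]; }
  if(W<=0.0) return 0;
  mx/=W; my/=W;

  /* Weighted (co)variance sums around the means. */
  for(i=0;i<size;++i)
    {
      wi = w ? w[i] : 1.0;
      dx = x[i]-mx;
      sxx += wi*dx*dx;
      sxy += wi*dx*(y[i]-my);
    }
  if(sxx==0.0) return 0;

  *c1 = sxy/sxx;
  *c0 = my - (*c1)*mx;
  return 1;
}
```

Points that lie exactly on $Y=1+2X$ give $c_0=1$ and $c_1=2$ with or
without weights. Weights only change the result when the points
scatter around the line.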

-- Function:
gal_data_t *
gal_fit_1d_linear_no_constant (gal_data_t *xin, gal_data_t
*yin, gal_data_t *ywht)
Perform a 1D linear regression fit _without_ a constant term(2),
formally: $Y=c_1X$. The input ‘xin’ contains the independent
variable values and ‘yin’ contains the measured variable values for
each independent variable. When ‘ywht!=NULL’, it is assumed to
contain the "weight" of each Y measurement (if you don't have
weights on your measured values, simply set this to ‘NULL’). The
weight of each measurement is the inverse of its variance. For a
Gaussian error distribution with standard deviation $\sigma$, the
weight is therefore $1/\sigma^2$.

The output is a single dataset with a ‘GAL_TYPE_FLOAT64’ type with
3 elements:
1. $c_1$: the multiple in $Y=c_1X$.
2. Variance of $c_1$.
3. The reduced $\chi^2$ of the fit.

-- Function:
gal_data_t *
gal_fit_1d_linear_estimate (gal_data_t *fit, gal_data_t *xin)
Given a linear least squares fit output (‘fit’), estimate the fit
on an arbitrary number of independent variable values (horizontal
axis, or X, in an X-Y plot) within ‘xin’. ‘fit’ is assumed to be the
output of either ‘gal_fit_1d_linear’ or
‘gal_fit_1d_linear_no_constant’.
In case you haven't used those functions to obtain the constants
and covariance matrix elements, see the description of those
functions for the expected format of ‘fit’.

This function returns two columns (as a *note List of
gal_data_t::): The top node of the list is the estimated values at
the input X-axis positions, and the next node is the errors in the
estimation. Naturally, both have the same number of elements as
‘xin’. Returning a list makes it easy to print the output columns
to a table (see *note Table input output::).

-- Function:
gal_data_t *
gal_fit_1d_polynomial (gal_data_t *xin, gal_data_t *yin,
gal_data_t *ywht, size_t maxpower, double *redchisq)
Perform a 1D polynomial fit, formally:
$Y=c_0+c_1X+c_2X^2+\cdots+c_nX^n$ (using GSL's multi-parameter
regression(3)). The largest power of $X$ is determined with the
‘maxpower’ argument (which is $n$ in the equation above). The
reduced $\chi^2$ of the fit is written in the space that
‘*redchisq’ points to.

The input ‘xin’ contains the independent variable values and the
input ‘yin’ contains the measured variable values for each
independent variable. When ‘ywht!=NULL’, it is assumed to contain
the "weight" of each Y measurement (if you don't have weights on
your measured values, simply set this to ‘NULL’). The weight of
each measurement is the inverse of its variance. For a Gaussian
error distribution with standard deviation $\sigma$, the weight is
therefore $1/\sigma^2$.

The output of this function is two datasets, linked as a list (see
*note List of gal_data_t::). Both have a ‘GAL_TYPE_FLOAT64’ type,
and are described below (in order).
1. A one dimensional dataset containing $n+1$ elements (the $n+1$
constants that have been found: $(c_0, c_1, c_2, \cdots, c_n)$).
2. A two dimensional variance-covariance matrix with
$(n+1)\times(n+1)$ elements.

-- Function:
gal_data_t *
gal_fit_1d_polynomial_robust (gal_data_t *xin, gal_data_t
*yin, size_t maxpower, uint8_t robustid, double *redchisq)
Perform a 1D robust polynomial fit, formally:
$Y=c_0+c_1X+c_2X^2+\cdots+c_nX^n$ (using GSL's robust linear
regression(4)). See the description there for the details.

The inputs and outputs of this function are almost identical to
‘gal_fit_1d_polynomial’, with the difference that you need to
specify the function to reject outliers through the ‘robustid’
input argument. You can pass any of the ‘GAL_FIT_ROBUST_*’ codes
defined at the top of this section to this (the names are identical
to the names in GSL).

-- Function:
gal_data_t *
gal_fit_1d_polynomial_estimate (gal_data_t *fit, gal_data_t
*xin)
Given a 1D polynomial fit output (‘fit’), estimate the fit on an
arbitrary number of independent variable values (horizontal axis, or
X, in an X-Y plot) within ‘xin’. ‘fit’ is assumed to be the output of
‘gal_fit_1d_polynomial’. In case you haven't used this function to
obtain the constants and covariance matrix, see the description of
that function for the expected format of ‘fit’.
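
Given the $n+1$ constants in the order $(c_0, c_1, \cdots, c_n)$,
the fitted polynomial can be evaluated with Horner's method, as in the
sketch below. ‘sk_polynomial_value’ is a hypothetical name, it works
on a raw array of constants, and it does not estimate the error.

```c
#include <stddef.h>

/* Evaluate c[0] + c[1]*x + ... + c[numc-1]*x^(numc-1) with Horner's
   method, starting from the highest power. */
double
sk_polynomial_value(const double *c, size_t numc, double x)
{
  double y=0.0;
  size_t i=numc;
  while(i>0) y = y*x + c[--i];
  return y;
}
```

Horner's method needs one multiplication and one addition per
constant and avoids calculating each power of $X$ separately.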

---------- Footnotes ----------

(1)
<https://www.gnu.org/s/gsl/doc/html/lls.html#linear-regression-with-a-constant-term>

(2)
<https://www.gnu.org/s/gsl/doc/html/lls.html#linear-regression-without-a-constant-term>

(3)
<https://www.gnu.org/s/gsl/doc/html/lls.html#multi-parameter-regression>

(4)
<https://www.gnu.org/software/gsl/doc/html/lls.html#robust-linear-regression>

File: gnuastro.info, Node: Binary datasets, Next: Labeled datasets, Prev: Fitting functions, Up: Gnuastro library

12.3.24 Binary datasets (‘binary.h’)
------------------------------------

Binary datasets only have two (usable) values: 0 (also known as
background) or 1 (also known as foreground). They are created after
some binary classification is applied to the dataset. The most common
is thresholding: for example, in an image, pixels with a value above the
threshold are given a value of 1 and those with a value less than the
threshold are assigned a value of 0.

Since there are only two values, in the processing of binary images,
you are usually concerned with the positioning of an element and its
vicinity (neighbors). When a dataset has more than one dimension,
multiple classes of immediate neighbors (that are touching the element)
can be defined for each data-element. To separate these different
classes of immediate neighbors, we define _connectivity_.

The classification is done by the distance from element center to the
neighbor's center. The nearest immediate neighbors have a connectivity
of 1, the second nearest class of neighbors have a connectivity of 2 and
so on. In total, the largest possible connectivity for data with ‘ndim’
dimensions is ‘ndim’. For example, in a 2D dataset, 4-connected
neighbors (that share an edge and have a distance of 1 pixel) have a
connectivity of 1. The other 4 neighbors that only share a vertex
(with a distance of $\sqrt{2}$ pixels) have a connectivity of 2.
Conventionally, the class of connectivity-2 neighbors also includes the
connectivity 1 neighbors, so for example, we call them 8-connected
neighbors in 2D datasets.
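
The counting behind this convention can be sketched in a few lines of
C. A neighbor that differs from the element by one step along $k$
axes has a connectivity of $k$ (and a distance of $\sqrt{k}$), so the
number of neighbors up to a given connectivity is a sum over $k$. The
function name ‘neighbor_count’ is hypothetical and only serves this
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of immediate neighbors of an interior
   element in an 'ndim'-dimensional dataset, including all classes
   with a connectivity less than or equal to 'connectivity'. */
size_t
neighbor_count(size_t ndim, size_t connectivity)
{
  size_t i, k, comb, total=0;

  assert(ndim>0 && connectivity>0);
  for(k=1; k<=connectivity && k<=ndim; ++k)
    {
      /* Ways to choose which 'k' axes differ: C(ndim,k). */
      comb=1;
      for(i=0; i<k; ++i)
        comb = comb * (ndim-i) / (i+1);

      /* Each chosen axis can differ by +1 or -1: 2^k choices. */
      total += comb << k;
    }
  return total;
}
```

In 2D this gives 4 and 8 neighbors for connectivities of 1 and 2; in
3D it gives 6, 18 and 26.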

Ideally, one bit is sufficient for each element of a binary dataset.
However, CPUs are not designed to work on individual bits: the smallest
unit of memory addresses is a byte (containing 8 bits on modern CPUs).
Therefore, in Gnuastro, the type used for binary dataset is ‘uint8_t’
(see *note Numeric data types::). Although it does take 8 times more
memory, this choice offers much better performance and some extra
(useful) features.

The advantage of using a full byte for each element of a binary
dataset is that you can also have other values (that will be ignored in
the processing). One such common "other" value in real datasets is a
blank value (to mark regions that should not be processed because there
is no data). The constant ‘GAL_BLANK_UINT8’ value must be used in these
cases (see *note Library blank values::). Another is some temporary
value(s) that can be given to a processed pixel to avoid having another
copy of the dataset as in ‘GAL_BINARY_TMP_VALUE’ that is described
below.

-- Macro: GAL_BINARY_TMP_VALUE
The functions described below work on a ‘uint8_t’ type dataset with
values of 1 or 0 (no other pixel will be touched). However, in
some cases, it is necessary to put temporary values in each element
during the processing of the functions. This temporary value has a
special meaning for the operation and will be operated on. So if
your input datasets have values other than 0 and 1 that you do not
want these functions to work on, be sure they are not equal to this
macro's value. Note that this value is also different from
‘GAL_BLANK_UINT8’, so your input datasets may also contain blank
elements.

-- Function:
gal_data_t *
gal_binary_erode (gal_data_t *input, size_t num, int
connectivity, int inplace)
Do ‘num’ erosions on the ‘connectivity’-connected neighbors of
‘input’ (see above for the definition of connectivity).

If ‘inplace’ is non-zero _and_ the input's type is
‘GAL_TYPE_UINT8’, then the erosion will be done within the input
dataset and the returned pointer will be ‘input’. Otherwise,
‘input’ is copied (and converted if necessary) to ‘GAL_TYPE_UINT8’
and erosion will be done on this new dataset which will also be
returned. This function will only work on the elements with a
value of 1 or 0. It will leave all the rest unchanged.

Erosion (inverse of dilation) is an operation in mathematical
morphology where each foreground pixel that is touching a
background pixel is flipped (changed to background). The
‘connectivity’ value determines the definition of "touching".
Erosion will thus decrease the area of the foreground regions by
one layer of pixels.
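
A minimal sketch of a single erosion on a 2D array is shown below. It
is written in plain C without Gnuastro's types and is not the
library's implementation: the name ‘erode_once’, the row-major layout
and the choice to ignore neighbors that fall outside the image are
all assumptions of this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: one erosion of the 'h' x 'w' row-major image
   'img', in place.  'connectivity' is 1 (4 neighbors) or 2 (8
   neighbors).  Only pixels with a value of 1 can change; any other
   value (for example a blank) is left as it is.  Returns non-zero
   if memory could not be allocated. */
int
erode_once(uint8_t *img, size_t h, size_t w, int connectivity)
{
  int di, dj;
  uint8_t *in;
  size_t i, j, ni, nj;

  assert(img!=NULL);
  if(h==0 || w==0) return 0;

  /* Work from a copy so already-flipped pixels do not propagate. */
  in=malloc(h*w);
  if(in==NULL) return 1;
  memcpy(in, img, h*w);

  for(i=0; i<h; ++i)
    for(j=0; j<w; ++j)
      {
        if(in[i*w+j]!=1) continue;
        for(di=-1; di<=1; ++di)
          for(dj=-1; dj<=1; ++dj)
            {
              /* Skip the pixel itself and, for connectivity 1,
                 the diagonal neighbors. */
              if(di==0 && dj==0) continue;
              if(connectivity==1 && di!=0 && dj!=0) continue;

              /* Skip neighbors outside the image. */
              if( (di<0 && i==0) || (di>0 && i==h-1) ) continue;
              if( (dj<0 && j==0) || (dj>0 && j==w-1) ) continue;

              ni = di<0 ? i-1 : i+(size_t)di;
              nj = dj<0 ? j-1 : j+(size_t)dj;
              if(in[ni*w+nj]==0) img[i*w+j]=0;
            }
      }

  free(in);
  return 0;
}
```

Calling such a function ‘num’ times corresponds to ‘num’ erosions.
Swapping the roles of 0 and 1 in the two comparisons would give the
matching dilation sketch.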

-- Function:
gal_data_t *
gal_binary_dilate (gal_data_t *input, size_t num, int
connectivity, int inplace)
Do ‘num’ dilations on the ‘connectivity’-connected neighbors of
‘input’ (see above for the definition of connectivity). For more
on ‘inplace’ and the output, see ‘gal_binary_erode’.

Dilation (inverse of erosion) is an operation in mathematical
morphology where each background pixel that is touching a
foreground pixel is flipped (changed to foreground). The
‘connectivity’ value determines the definition of "touching".
Dilation will thus increase the area of the foreground regions by
one layer of pixels.

-- Function:
gal_data_t *
gal_binary_open (gal_data_t *input, size_t num, int
connectivity, int inplace)
Do ‘num’ openings on the ‘connectivity’-connected neighbors of
‘input’ (see above for the definition of connectivity). For more
on ‘inplace’ and the output, see ‘gal_binary_erode’.

Opening is an operation in mathematical morphology which is defined
as erosion followed by dilation (see above for the definitions of
erosion and dilation). Opening will thus remove the outer
structure of the foreground. In this implementation, ‘num’
erosions are going to be applied on the dataset, then ‘num’
dilations.

-- Function:
gal_data_t *
gal_binary_number_neighbors (gal_data_t *input, int
connectivity, int inplace)
Return an image of the same size as the input, but where each
non-zero and non-blank input pixel is replaced with the number of
its non-zero and non-blank neighbors. The input dataset is assumed
to be binary (having an unsigned, 8-bit type). The neighbors
are defined through the ‘connectivity’ argument (see above) and if
‘inplace!=0’, then the output will be written into the input.

-- Function:
size_t
gal_binary_connected_components (gal_data_t *binary,
gal_data_t **out, int connectivity)
Return the number of connected components in ‘binary’ through the
breadth first search algorithm (finding all pixels belonging to one
component before going on to the next). Connection between two
pixels is defined based on the value of ‘connectivity’. ‘out’ is a
dataset with the same size as ‘binary’ with ‘GAL_TYPE_INT32’ type.
Every pixel in ‘out’ will have the label of the connected component
it belongs to. The labeling of connected components starts from 1,
so a label of zero is given to the input's background pixels.

When ‘*out!=NULL’ (its space is already allocated), it will be
cleared (to zero) at the start of this function. Otherwise, when
‘*out==NULL’, the necessary dataset to keep the output will be
allocated by this function.

‘binary’ must have a type of ‘GAL_TYPE_UINT8’, otherwise this
function will abort with an error. Other than blank pixels (with a
value of ‘GAL_BLANK_UINT8’ defined in *note Library blank
values::), all other non-zero pixels in ‘binary’ will be considered
as foreground (and will be labeled). Blank pixels in the input
will also be blank in the output.
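
The breadth first search described above can be sketched as follows
for a 2D array. This is plain C with no Gnuastro types, and it is
only an illustration under stated assumptions: the name
‘label_components’ is hypothetical, the image is row-major, and blank
pixels are not treated specially (every non-zero pixel is
foreground).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: label the connected non-zero pixels of the
   'h' x 'w' image 'bin'.  Labels (starting from 1) are written in
   'lab', background pixels get 0.  Returns the number of
   components, or (size_t)(-1) if memory could not be allocated. */
size_t
label_components(const uint8_t *bin, int32_t *lab, size_t h, size_t w,
                 int connectivity)
{
  int di, dj;
  size_t *queue;
  size_t i, p, q, r, c, head, tail, n=h*w, num=0;

  assert(bin!=NULL && lab!=NULL);
  if(n==0) return 0;
  queue=malloc(n * sizeof *queue);
  if(queue==NULL) return (size_t)(-1);

  for(i=0; i<n; ++i) lab[i]=0;

  for(i=0; i<n; ++i)
    {
      if(bin[i]==0 || lab[i]!=0) continue;

      /* New component: visit all its pixels before moving on. */
      lab[i]=(int32_t)(++num);
      head=tail=0;
      queue[tail++]=i;
      while(head<tail)
        {
          p=queue[head++]; r=p/w; c=p%w;
          for(di=-1; di<=1; ++di)
            for(dj=-1; dj<=1; ++dj)
              {
                if(di==0 && dj==0) continue;
                if(connectivity==1 && di!=0 && dj!=0) continue;
                if( (di<0 && r==0) || (di>0 && r==h-1) ) continue;
                if( (dj<0 && c==0) || (dj>0 && c==w-1) ) continue;
                q = (di<0 ? r-1 : r+(size_t)di) * w
                    + (dj<0 ? c-1 : c+(size_t)dj);
                if(bin[q]!=0 && lab[q]==0)
                  { lab[q]=(int32_t)num; queue[tail++]=q; }
              }
        }
    }

  free(queue);
  return num;
}
```

On a 3x3 image with foreground only on the diagonal, this sketch
finds three components with a connectivity of 1 and a single
component with a connectivity of 2.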

-- Function:
gal_data_t *
gal_binary_connected_indexs(gal_data_t *binary, int
connectivity)
Build a ‘gal_data_t’ linked list, where each node of the list
contains an array with indices of the connected regions. Therefore
the arrays of each node can have a different size. Note that the
indices will only be calculated on the pixels with a value of 1 and
internally, it will temporarily change the values to 2 (and return
them back to 1 in the end).

-- Function:
gal_data_t *
gal_binary_connected_adjacency_matrix (gal_data_t *adjacency,
size_t *numconnected)
Find the number of connected labels and new labels based on an
adjacency matrix, which must be a square binary array (type
‘GAL_TYPE_UINT8’). The returned dataset is a list of new labels
for each old label. In other words, this function will find the
objects that are connected (possibly through a third object) and in
the output array, the respective elements for all input labels are
going to have the same value. The total number of connected labels
is put into the space that ‘numconnected’ points to.

An adjacency matrix defines connection between two labels. For
example, let's assume we have 5 labels and we know that labels 1
and 5 are connected to label 3, but are not connected with each
other. Also, labels 2 and 4 are not touching any other label. So
in total we have 3 final labels: one combined object (merged from
labels 1, 3, and 5) and the initial labels 2 and 4. The input
adjacency matrix would look like this (note the extra row and
column for a label 0 which is ignored):

                 INPUT                      OUTPUT
                 =====                      ======
          in_lab  1  2  3  4  5  |
                                 |     numconnected = 3
               0  0  0  0  0  0  |
in_lab 1 -->   0  0  0  1  0  0  |
in_lab 2 -->   0  0  0  0  0  0  |  Returned: new labels for the
in_lab 3 -->   0  1  0  0  0  1  |        5 initial objects
in_lab 4 -->   0  0  0  0  0  0  |   | 0 | 1 | 2 | 1 | 3 | 1 |
in_lab 5 -->   0  0  0  1  0  0  |

Although the adjacency matrix as used here is symmetric, currently
this function assumes that it is filled on both sides of the
diagonal.
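
The merging in the example above can be reproduced with a simple
traversal of the matrix. The sketch below is plain C and does not use
Gnuastro's types; the name ‘merge_labels’ is hypothetical and the
traversal order is only one possible choice, not necessarily the one
used by the library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: 'adj' is an 'n' x 'n' row-major adjacency
   matrix (row and column 0 are ignored).  The new label of each old
   label is written in 'newlab' (with 'n' elements).  Returns the
   number of connected labels, or (size_t)(-1) if memory could not
   be allocated. */
size_t
merge_labels(const uint8_t *adj, size_t n, int32_t *newlab)
{
  size_t *stack;
  size_t i, j, p, top, num=0;

  assert(adj!=NULL && newlab!=NULL);
  if(n==0) return 0;
  stack=malloc(n * sizeof *stack);
  if(stack==NULL) return (size_t)(-1);

  for(i=0; i<n; ++i) newlab[i]=0;

  for(i=1; i<n; ++i)
    {
      if(newlab[i]) continue;

      /* Start a new merged label and follow all its connections
         (also those reached through a third label). */
      newlab[i]=(int32_t)(++num);
      top=0;
      stack[top++]=i;
      while(top)
        {
          p=stack[--top];
          for(j=1; j<n; ++j)
            if(adj[p*n+j] && newlab[j]==0)
              { newlab[j]=(int32_t)num; stack[top++]=j; }
        }
    }

  free(stack);
  return num;
}
```

Given the 6x6 matrix of the example, this sketch returns 3 and fills
‘newlab’ with 0, 1, 2, 1, 3, 1.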

-- Function:
gal_data_t *
gal_binary_connected_adjacency_list (gal_list_sizet_t
**listarr, size_t number, size_t minmapsize, int quietmmap,
size_t *numconnected)
Find the number of connected labels and new labels based on an
adjacency list. The output of this function is identical to that
of ‘gal_binary_connected_adjacency_matrix’. But the major
difference is that it uses a list of connected labels to each label
instead of a square adjacency matrix. This is done because when
the number of labels becomes very large (for example, on the scale
of 100,000), the adjacency matrix can consume more than 10GB of
RAM!

The input list has the following format: it is an array of
‘gal_list_sizet_t *’ pointers (so the array itself has a type of
‘gal_list_sizet_t **’). The array has ‘number’ elements and each
‘listarr[i]’ is the head of a linked list of ‘gal_list_sizet_t’
nodes. As a demonstration, the input of the same
example in ‘gal_binary_connected_adjacency_matrix’ would look like
below and the output of this function will be identical to there.

listarr[0] = NULL
listarr[1] = 3
listarr[2] = NULL
listarr[3] = 1 -> 5
listarr[4] = NULL
listarr[5] = 3

From this example, it is already clear that this method will
consume far less memory. But because it needs to parse lists (and
not easily jump between array elements), it can be slower. But in
scenarios where there are too many objects (that may exceed the
whole system's RAM+SWAP), this option is a good alternative and the
drop in processing speed is worth getting the job done.

Similar to ‘gal_binary_connected_adjacency_matrix’, this function
will write the final number of connected labels in ‘numconnected’.
But since it takes no ‘gal_data_t *’ argument (where it can inherit
the ‘minmapsize’ and ‘quietmmap’ parameters), it also needs these
as input. For more on ‘minmapsize’ and ‘quietmmap’, see *note
Memory management::.

-- Function:
gal_data_t *
gal_binary_holes_label (gal_data_t *input, int connectivity,
size_t *numholes)
Label all the holes in the foreground (non-zero elements in input)
as independent regions. Holes are background regions (zero-valued
in input) that are fully surrounded by the foreground, as defined
by ‘connectivity’. The returned dataset has a 32-bit signed
integer type with the size of the input. All holes in the input
will have labels/counters greater than or equal to ‘1’. The rest of the
background regions will still have a value of ‘0’ and the initial
foreground pixels will have a value of ‘-1’. The total number of
holes will be written where ‘numholes’ points to.

-- Function:
void
gal_binary_holes_fill (gal_data_t *input, int connectivity,
size_t maxsize)
Fill all the holes (0 valued pixels surrounded by 1 valued pixels)
of the binary ‘input’ dataset. The connectivity of the holes can
be set with ‘connectivity’. Holes larger than ‘maxsize’ are not
filled. This function currently only works on a 2D dataset.
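
One common way to find such holes is to flood the background from the
image border: any background pixel that cannot be reached is a hole.
The sketch below does this in plain C for a 2D array. It is not
Gnuastro's implementation, the name ‘fill_holes’ is hypothetical, and
it makes two simplifying assumptions: background pixels are followed
through their 4-connected neighbors only, and there is no ‘maxsize’
limit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: fill the holes of the 'h' x 'w' row-major
   image 'img' in place.  Returns non-zero if memory could not be
   allocated. */
int
fill_holes(uint8_t *img, size_t h, size_t w)
{
  uint8_t *outside;
  size_t *stack, nb[4];
  size_t i, k, p, q, r, c, nnb, top=0, n=h*w;

  assert(img!=NULL);
  if(n==0) return 0;
  stack=malloc(n * sizeof *stack);
  outside=calloc(n, 1);
  if(stack==NULL || outside==NULL)
    { free(stack); free(outside); return 1; }

  /* Seeds: background pixels on the border of the image. */
  for(i=0; i<n; ++i)
    {
      r=i/w; c=i%w;
      if(img[i]==0 && (r==0 || r==h-1 || c==0 || c==w-1))
        { outside[i]=1; stack[top++]=i; }
    }

  /* Flood through 4-connected background neighbors. */
  while(top)
    {
      p=stack[--top]; r=p/w; c=p%w; nnb=0;
      if(r>0)   nb[nnb++]=p-w;
      if(r<h-1) nb[nnb++]=p+w;
      if(c>0)   nb[nnb++]=p-1;
      if(c<w-1) nb[nnb++]=p+1;
      for(k=0; k<nnb; ++k)
        {
          q=nb[k];
          if(img[q]==0 && !outside[q])
            { outside[q]=1; stack[top++]=q; }
        }
    }

  /* Background that was never reached is a hole. */
  for(i=0; i<n; ++i)
    if(img[i]==0 && !outside[i]) img[i]=1;

  free(stack);
  free(outside);
  return 0;
}
```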

File: gnuastro.info, Node: Labeled datasets, Next: Convolution functions, Prev: Binary datasets, Up: Gnuastro library

12.3.25 Labeled datasets (‘label.h’)
------------------------------------

A labeled dataset is one where each element/pixel has an integer label
(or counter). The label identifies the group/class that the element
belongs to. This form of labeling allows the higher-level study of all
pixels within a certain class.

For example, to detect objects/targets in an image/dataset, you can
apply a threshold to separate the noise from the signal (to detect
diffuse signal, a threshold is useless and more advanced methods are
necessary, for example *note NoiseChisel::). But the output of
detection is a binary dataset (which is just a very low-level labeling
of ‘0’ for noise and ‘1’ for signal).

The raw detection map is therefore hardly useful for any kind of
analysis on objects/targets in the image. One solution is to use a
connected-components algorithm (see ‘gal_binary_connected_components’ in
*note Binary datasets::). It is a simple and useful way to
separate/label connected patches in the foreground. This higher-level
(but still elementary) labeling therefore allows you to count how many
connected patches of signal there are in the dataset and is a major
improvement compared to the raw detection.

However, when your objects/targets are touching, the simple connected
components algorithm is not enough and a still higher-level labeling
mechanism is necessary. This brings us to the necessity of the
functions in this part of Gnuastro's library. The main inputs to the
functions in this section are already labeled datasets (for example,
with the connected components algorithm above).

Each of the labeled regions is independent of the others (the labels
specify different classes of targets). Therefore, especially in large
datasets, it is often useful to process each label on independent CPU
threads in parallel rather than in series. For this reason, the functions of
this section actually use an array of pixel/element indices (belonging
to each label/class) as the main identifier of a region. Using indices
will also allow processing of overlapping labels (for example, in
deblending problems). Just note that overlapping labels are not yet
implemented, but planned. You can use ‘gal_label_indexs’ to generate
lists of indices belonging to separate classes from the labeled input.

-- Macro: GAL_LABEL_INIT
-- Macro: GAL_LABEL_RIVER
-- Macro: GAL_LABEL_TMPCHECK
Special negative integer values used internally by some of the
functions in this section. Recall that meaningful labels are
considered to be positive integers ($\geq1$). Zero is
conventionally kept for regions with no labels, therefore negative
integers can be used for any extra classification in the labeled
datasets.

-- Function:
gal_data_t *
gal_label_indexs (gal_data_t *labels, size_t numlabs, size_t
minmapsize, int quietmmap)

Return an array of ‘gal_data_t’ containers, each containing the
pixel indices of the respective label (see *note Generic data
container::). ‘labels’ contains the label of each element and has
to have a ‘GAL_TYPE_INT32’ type (see *note Library data types::).
Only positive (greater than zero) values in ‘labels’ will be
used/indexed, other elements will be ignored.

Meaningful labels start from ‘1’ and not ‘0’, therefore the output
array of ‘gal_data_t’ will contain ‘numlabs+1’ elements. The first
(zero-th) element of the output (‘indexs[0]’ in the example below)
will be initialized to a dataset with zero elements. This will
allow easy (non-confusing) access to the indices of each
(meaningful) label.

‘numlabs’ is the number of labels in the dataset. If it is given a
value of zero, then the maximum value in the input (largest label)
will be found and used. Therefore if it is given, but smaller than
the actual number of labels, this function may/will crash (it will
write in un-allocated space). ‘numlabs’ is therefore useful in a
highly optimized/checked environment.

For example, if the returned array is called ‘indexs’, then
‘indexs[10].size’ contains the number of elements that have a label
of ‘10’ in ‘labels’ and ‘indexs[10].array’ is an array (after
casting to ‘size_t *’) containing the indices of each one of those
elements/pixels.

By _index_ we mean the 1D position: the input number of dimensions
is irrelevant (any dimensionality is supported). In other words,
each element's index is the number of elements/pixels between it
and the dataset's first element/pixel. Therefore it is always
greater than or equal to zero and stored in ‘size_t’ type.
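
The layout of the output can be illustrated with a plain C sketch
that does not use ‘gal_data_t’. Here, the ‘index_list’ structure is a
hypothetical stand-in with only the ‘size’ and ‘array’ fields
mentioned above, and ‘label_indexs_sketch’ is not the library
function. Unlike the behavior described above, labels larger than
‘numlabs’ are simply ignored in this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two 'gal_data_t' fields used here. */
typedef struct
{
  size_t size;      /* Number of pixels with this label. */
  size_t *array;    /* Their 1D indices. */
} index_list;

void
index_list_free(index_list *lists, size_t numlabs)
{
  size_t l;
  if(lists==NULL) return;
  for(l=0; l<=numlabs; ++l) free(lists[l].array);
  free(lists);
}

/* Return an array of 'numlabs+1' lists; element 0 is left empty. */
index_list *
label_indexs_sketch(const int32_t *labels, size_t n, size_t numlabs)
{
  size_t i, l;
  index_list *out;

  assert(labels!=NULL);
  out=malloc((numlabs+1) * sizeof *out);
  if(out==NULL) return NULL;
  for(l=0; l<=numlabs; ++l) { out[l].size=0; out[l].array=NULL; }

  /* First pass: count the pixels of each positive label. */
  for(i=0; i<n; ++i)
    if(labels[i]>0 && (size_t)labels[i]<=numlabs)
      ++out[labels[i]].size;

  /* Allocate each list; 'size' is reset to act as a fill counter. */
  for(l=1; l<=numlabs; ++l)
    if(out[l].size)
      {
        out[l].array=malloc(out[l].size * sizeof *out[l].array);
        if(out[l].array==NULL)
          { index_list_free(out, numlabs); return NULL; }
        out[l].size=0;
      }

  /* Second pass: write the 1D index of each labeled pixel. */
  for(i=0; i<n; ++i)
    if(labels[i]>0 && (size_t)labels[i]<=numlabs)
      {
        l=(size_t)labels[i];
        out[l].array[out[l].size++]=i;
      }

  return out;
}
```

For the labels ‘0, 2, 2, -1, 1, 2’ and two labels, the list of label
1 contains index 4 and the list of label 2 contains indices 1, 2 and
5.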

-- Function:
size_t
gal_label_watershed (gal_data_t *values, gal_data_t *indexs,
gal_data_t *label, size_t *topinds, int min0_max1)
Use the watershed algorithm(1) to "over-segment" the pixels in the
‘indexs’ dataset based on values in the ‘values’ dataset.
Internally, each local extremum (maximum or minimum, based on
‘min0_max1’) and its surrounding pixels will be given a unique
label. For demonstration, see Figures 8 and 9 of Akhlaghi and
Ichikawa 2015 (http://arxiv.org/abs/1505.01664). If
‘topinds!=NULL’, it is assumed to point to an already allocated
space to write the index of each clump's local extremum, otherwise,
it is ignored.

The ‘values’ dataset must have a 32-bit floating point type
(‘GAL_TYPE_FLOAT32’, see *note Library data types::) and will only
be read by this function. ‘indexs’ must contain the indices of the
elements/pixels that will be over-segmented by this function and
have a ‘GAL_TYPE_SIZE_T’ type, see the description of
‘gal_label_indexs’, above. The final labels will be written in the
respective positions of ‘label’, which must have a
‘GAL_TYPE_INT32’ type and be the same size as ‘values’.

When ‘indexs’ is already sorted, this function will ignore
‘min0_max1’. To judge if the dataset is sorted or not (by the
values the indices correspond to in ‘values’, not the actual
indices), this function will look into the bits of ‘indexs->flag’,
for the respective bit flags, see *note Generic data container::.
If ‘indexs’ is not already sorted, this function will sort it
according to the values of the respective pixel in ‘values’. The
increasing/decreasing order will be determined by ‘min0_max1’.
Note that if this function is called on multiple threads _and_
‘values’ points to a different array on each thread, this function
will not return a reasonable result. In this case, please sort
‘indexs’ prior to calling this function (see
‘gal_qsort_index_multi_d’ in *note Qsort functions::).

When ‘indexs’ is decreasing (increasing), or ‘min0_max1’ is ‘1’
(‘0’), local minima (maxima) are considered rivers (watersheds)
and given a label of ‘GAL_LABEL_RIVER’ (see above).

Note that rivers/watersheds will also be formed on the edges of the
labeled regions or when the labeled pixels touch a blank pixel.
Therefore this function will need to check for the presence of
blank values. To be most efficient, it is thus recommended to use
‘gal_blank_present’ (with ‘updateflag=1’) prior to calling this
function (see *note Library blank values::). Once the flag has been
set, no other function (including this one) that needs special
behavior for blank pixels will have to parse the dataset to see if
it has blank values any more.

If you are sure your dataset does not have blank values (by the
design of your software), to avoid an extra parsing of the dataset
and improve performance, you can set the two bits manually (see the
description of ‘flag’ in *note Generic data container::):
input->flag |= GAL_DATA_FLAG_BLANK_CH; /* Set bit to 1. */
input->flag &= ~GAL_DATA_FLAG_HASBLANK; /* Set bit to 0. */

-- Function:
void
gal_label_clump_significance (gal_data_t *values, gal_data_t
*std, gal_data_t *label, gal_data_t *indexs, struct
gal_tile_two_layer_params *tl, size_t numclumps, size_t
minarea, int variance, int keepsmall, gal_data_t *sig,
gal_data_t *sigind)
This function is usually called after ‘gal_label_watershed’, and is
used as a measure to identify which over-segmented "clumps" are
real and which are noise.

A measurement is done on each clump (using the ‘values’ and ‘std’
datasets, see below). To help in multi-threaded environments, the
operation is only done on pixels which are indexed in ‘indexs’. It
is expected for ‘indexs’ to be sorted by their values in ‘values’.
If not sorted, the measurement may not be reliable. If sorted in a
decreasing order, then clump building will start from their highest
value and vice-versa. See the description of ‘gal_label_watershed’
for more on ‘indexs’.

Each "clump" (identified by a positive integer) is assumed to be
surrounded by at least one river/watershed pixel (with a
non-positive label). This function will parse the pixels
identified in ‘indexs’ and make a measurement on each clump and
over all the river/watershed pixels. The number of clumps
(‘numclumps’) must be given as an input argument and any clump that
is smaller than ‘minarea’ is ignored (because of scatter). If
‘variance’ is non-zero, then the ‘std’ dataset is interpreted as
variance, not standard deviation.

The ‘values’ and ‘std’ datasets must have a ‘float’ (32-bit
floating point) type. Also, ‘label’ and ‘indexs’ must respectively
have ‘int32’ and ‘size_t’ types. ‘values’ and ‘label’ must have
the same size, but ‘std’ can have three possible sizes: 1) a single
element (which will be used for the whole dataset), 2) the same size
as ‘values’ (so a different error can be assigned to every pixel),
3) a single value for each tile, based on the ‘tl’ tessellation
(see *note Tile grid::). In the last case, a tile/value will be
associated to each clump based on its flux-weighted (only positive
values) center.

The main output is an internally allocated, 1-dimensional array
with one value per label. The array information (length, type,
etc.) will be written into the ‘sig’ generic data container.
Therefore ‘sig->array’ must be ‘NULL’ when this function is called.
After this function, the details of the array (number of elements,
type and size, etc) will be written into the various components of
‘sig’, see the definition of ‘gal_data_t’ in *note Generic data
container::. Therefore ‘sig’ must already be allocated before
calling this function.

Optionally (when ‘sigind!=NULL’, similar to ‘sig’) the clump labels
of each measurement in ‘sig’ will be written in ‘sigind->array’.
If ‘keepsmall’ is zero, small clumps (where no measurement is made)
will not be included in the output table.

This function is initially intended for a multi-threaded
environment. In such cases, you will be writing arrays of clump
measures from different regions in parallel into an array of
‘gal_data_t’s. You can simply allocate (and initialize), such an
array with the ‘gal_data_array_calloc’ function in *note Arrays of
datasets::. For example, if the ‘gal_data_t’ array is called
‘array’, you can pass ‘&array[i]’ as ‘sig’.

Along with some other functions in ‘label.h’, this function was
initially written for *note Segment::. The description of the
parameter used to measure a clump's significance is fully given in
Akhlaghi 2019 (https://arxiv.org/abs/1909.11230).

-- Function:
void
gal_label_grow_indexs (gal_data_t *labels, gal_data_t *indexs,
int withrivers, int connectivity)
Grow the (positive) labels of ‘labels’ over the pixels in ‘indexs’
(see description of ‘gal_label_indexs’). The pixels (position in
‘indexs’, values in ‘labels’) that must be "grown" must have a
value of ‘GAL_LABEL_INIT’ in ‘labels’ before calling this function.
For a demonstration see Columns 2 and 3 of Figure 10 in Akhlaghi
and Ichikawa 2015 (http://arxiv.org/abs/1505.01664).

In many aspects, this function is very similar to over-segmentation
(watershed algorithm, ‘gal_label_watershed’). The big difference
is that in over-segmentation local maxima (that are not touching
any already labeled pixel) get a separate label. However, here the
final number of labels will not change. All pixels that are not
directly touching a labeled pixel just get pushed back to the start
of the loop, and the loop iterates until its size does not change
any more. This is because in a generic scenario some of the
indexed pixels might not be reachable through other indexed pixels.

The next major difference with over-segmentation is that when there
is only one label in growth region(s), it is not mandatory for
‘indexs’ to be sorted by values. If there are multiple labeled
regions in growth region(s), then values are important and you can
use ‘qsort’ with ‘gal_qsort_index_single_d’ to sort the indices by
values in a separate array (see *note Qsort functions::).

This function looks for positive-valued neighbors of each pixel in
‘indexs’ and will label a pixel if it touches one. Therefore, it
is very important that only pixels/labels that are intended for
growth have positive values in ‘labels’ before calling this
function. Any non-positive (zero or negative) value will be
ignored as a label by this function. Thus, it is recommended that
while filling in the ‘indexs’ array values, you initialize all the
pixels that are in ‘indexs’ with ‘GAL_LABEL_INIT’, and set
non-labeled pixels that you do not want to grow to ‘0’.

This function will write into both the input datasets. After this
function, some of the non-positive ‘labels’ pixels will have a new
positive label and the number of useful elements in ‘indexs’ will
have decreased. The index of those pixels that could not be
labeled will remain inside ‘indexs’. If ‘withrivers’ is non-zero,
then pixels that are immediately touching more than one positive
value will be given a ‘GAL_LABEL_RIVER’ label.

Note that the ‘indexs->array’ is not re-allocated to its new size
at the end(2). But since ‘indexs->dsize[0]’ and ‘indexs->size’
have new values after this function is returned, the extra elements
just will not be used until they are ultimately freed by
‘gal_data_free’.

Connectivity is a value between ‘1’ (fewest number of neighbors)
and the number of dimensions in the input (most number of
neighbors). For example, in a 2D dataset, a connectivity of ‘1’
and ‘2’ corresponds to 4-connected and 8-connected neighbors.

---------- Footnotes ----------

(1) The watershed algorithm was initially introduced by Vincent and
Soille (https://doi.org/10.1109/34.87344). It starts from the minima
and puts the pixels in, one by one, to grow them until they touch (create
a watershed). For more, also see the Wikipedia article:
<https://en.wikipedia.org/wiki/Watershed_%28image_processing%29>.

(2) Note that according to the GNU C Library, even a ‘realloc’ to a
smaller size can also cause a re-write of the whole array, which is not
a cheap operation.

File: gnuastro.info, Node: Convolution functions, Next: Pooling functions, Prev: Labeled datasets, Up: Gnuastro library

12.3.26 Convolution functions (‘convolve.h’)
--------------------------------------------

Convolution is a very common operation during data analysis and is
thoroughly described as part of Gnuastro's *note Convolve:: program
which is fully devoted to this job. Because of the complete
introduction that was presented there, we will directly skip onto the
currently available convolution functions in Gnuastro's library.

As of this version, only spatial domain convolution is available in
Gnuastro's libraries. We have not had the time to liberate the
frequency domain convolution and deconvolution functions that
are available in the Convolve program(1).

-- Function:
gal_data_t *
gal_convolve_spatial (gal_data_t *tiles, gal_data_t *kernel,
size_t numthreads, int edgecorrection, int convoverch, int
conv_on_blank)
Convolve the given ‘tiles’ dataset (possibly a list of tiles, see
*note List of gal_data_t:: and *note Tessellation library::) with
‘kernel’ on ‘numthreads’ threads. When ‘edgecorrection’ is
non-zero, it will correct for the edge dimming effects as discussed
in *note Edges in the spatial domain::. When ‘conv_on_blank’ is
non-zero, this function will also attempt convolution over the
blank pixels (and therefore give values to the blank pixels that
are near non-blank pixels).

‘tiles’ can be a single/complete dataset, but in that case the
speed will be very slow. Therefore, for larger images, it is
recommended to give a list of tiles covering a dataset. To create
a tessellation that fully covers an input image, you may use
‘gal_tile_full’, or ‘gal_tile_full_two_layers’ to also define
channels over your input dataset. These functions are discussed in
*note Tile grid::. You may then pass the list of tiles to this
function. This is the recommended way to call this function
because spatial domain convolution is slow and breaking the job
into many small tiles and working on them simultaneously on several
threads can greatly speed up the processing.

If the tiles are defined within a channel (a larger tile), by
default convolution will be done within the channel, so pixels on
the edge of a channel will not be affected by their neighbors that
are in another channel. See *note Tessellation:: for the necessity
of channels in astronomical data analysis. This behavior may be
disabled when ‘convoverch’ is non-zero. In this case, it will
ignore channel borders (if they exist) and mix all pixels that
cover the kernel within the dataset.
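
To show what the edge correction amounts to for a single pixel, the
sketch below convolves one position of a 2D array in plain C. It is
only an illustration and not Gnuastro's implementation: the name
‘convolve_pixel’ is hypothetical, the kernel is assumed to have odd
sides and to be normalized (sum to 1), and the correction used here
(dividing by the sum of the kernel weights that fell inside the
image) is an assumption of this sketch. Blank pixels, tiles, channels
and threads are not considered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: convolved value of pixel ('r','c') in the
   'h' x 'w' row-major image 'img', using the 'kh' x 'kw' row-major
   kernel 'ker' (odd sides).  When 'edgecorrection' is non-zero, the
   sum is divided by the sum of the kernel weights that were used. */
double
convolve_pixel(const double *img, size_t h, size_t w,
               const double *ker, size_t kh, size_t kw,
               size_t r, size_t c, int edgecorrection)
{
  size_t i, j, ir, jc;
  double sum=0.0, ksum=0.0;

  assert(img!=NULL && ker!=NULL && r<h && c<w);
  for(i=0; i<kh; ++i)
    for(j=0; j<kw; ++j)
      {
        /* Convolution flips the kernel: kernel element (i,j) is
           multiplied by image pixel (r+kh/2-i, c+kw/2-j). */
        ir = r + kh/2;
        jc = c + kw/2;
        if(ir<i || jc<j) continue;       /* Before first row/column. */
        ir -= i;
        jc -= j;
        if(ir>=h || jc>=w) continue;     /* After last row/column.   */

        sum  += img[ir*w+jc] * ker[i*kw+j];
        ksum += ker[i*kw+j];
      }

  return (edgecorrection && ksum!=0.0) ? sum/ksum : sum;
}
```

With a flat image of ones, a corner pixel only sees part of the
kernel: without the correction its value is lower than 1 (the edge
dimming), and with the correction it returns to 1.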

-- Function:
void
gal_convolve_spatial_correct_ch_edge (gal_data_t *tiles,
gal_data_t *kernel, size_t numthreads, int edgecorrection, int
conv_on_blank, gal_data_t *tocorrect)
Correct the edges of channels in an already convolved image when it
was initially convolved with ‘gal_convolve_spatial’ and
‘convoverch==0’. In that case, strong boundaries might exist on
the channel edges. So if you need to remove those boundaries at
later steps of your processing, you can call this function. It
will only do convolution on the tiles that are near the edge and
were affected by the channel borders. Other pixels in the image
will not be touched. Hence, it is much faster. When
‘conv_on_blank’ is non-zero, this function will also attempt
convolution over the blank pixels (and therefore give values to the
blank pixels that are near non-blank pixels).

---------- Footnotes ----------

(1) Hence any help would be greatly appreciated.

File: gnuastro.info, Node: Pooling functions, Next: Interpolation, Prev: Convolution functions, Up: Gnuastro library

12.3.27 Pooling functions (‘pool.h’)
------------------------------------

Pooling is the process of reducing the complexity of the input image
(its size and variation of pixel values). Its underlying concepts, and
an analysis of its usefulness, are fully described in *note Pooling
operators::. The following functions are available for pooling in
Gnuastro. Just note that unlike the Arithmetic operators, the output of
these functions should contain a correct WCS.

-- Function:
gal_data_t *
gal_pool_max (gal_data_t *input, size_t psize, size_t
numthreads)
Return the max-pool of ‘input’, assuming a pool size of ‘psize’
pixels. The number of threads to use can be set with ‘numthreads’.

-- Function:
gal_data_t *
gal_pool_min (gal_data_t *input, size_t psize, size_t
numthreads)
Return the min-pool of ‘input’, assuming a pool size of ‘psize’
pixels. The number of threads to use can be set with ‘numthreads’.

-- Function:
gal_data_t *
gal_pool_sum (gal_data_t *input, size_t psize, size_t
numthreads)
Return the sum-pool of ‘input’, assuming a pool size of ‘psize’
pixels. The number of threads to use can be set with ‘numthreads’.

-- Function:
gal_data_t *
gal_pool_mean (gal_data_t *input, size_t psize, size_t
numthreads)
Return the mean-pool of ‘input’, assuming a pool size of ‘psize’
pixels. The number of threads to use can be set with ‘numthreads’.

-- Function:
gal_data_t *
gal_pool_median (gal_data_t *input, size_t psize, size_t
numthreads)
Return the median-pool of ‘input’, assuming a pool size of ‘psize’
pixels. The number of threads to use can be set with ‘numthreads’.
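
As a rough illustration of what a max-pool does, the sketch below works on
a plain row-major ‘float’ array. It is not Gnuastro's implementation:
the name ‘sketch_pool_max’ is hypothetical, and the choice of
non-overlapping ‘psize’ x ‘psize’ blocks with smaller blocks at the
image edges is an assumption made only for this example.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Gnuastro): max-pool a row-major
   image of `h' rows and `w' columns into `out', using non-overlapping
   blocks of `psize' x `psize' pixels. Blocks touching the last rows or
   columns are smaller when the image size is not a multiple of
   `psize'. The caller must give `out' room for out_h*out_w values,
   where out_h=(h+psize-1)/psize and out_w=(w+psize-1)/psize. */
void
sketch_pool_max(const float *in, size_t h, size_t w, size_t psize,
                float *out)
{
  size_t oi, oj, i, j;
  size_t out_h=(h+psize-1)/psize, out_w=(w+psize-1)/psize;

  for(oi=0; oi<out_h; ++oi)
    for(oj=0; oj<out_w; ++oj)
      {
        /* Start from the first pixel of this block. */
        float max=in[ oi*psize*w + oj*psize ];

        /* Scan the block, stopping at the image edge. */
        for(i=oi*psize; i<(oi+1)*psize && i<h; ++i)
          for(j=oj*psize; j<(oj+1)*psize && j<w; ++j)
            if( in[i*w+j] > max ) max=in[i*w+j];

        out[ oi*out_w + oj ]=max;
      }
}
```

For a 4x4 image holding the values 1 to 16 and ‘psize=2’, the output of
this sketch is a 2x2 image. The other pooling statistics only differ in
what is computed inside each block.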

File: gnuastro.info, Node: Interpolation, Next: Warp library, Prev: Pooling functions, Up: Gnuastro library

12.3.28 Interpolation (‘interpolate.h’)
---------------------------------------

During data analysis, it happens that parts of the data cannot be given
a value, but one is necessary for the higher-level analysis. For
example, a very bright star saturated part of your image and you need to
fill in the saturated pixels with some values. Another common use
case is masked sky-lines in 1D spectra that similarly need to be
assigned a value for higher-level analysis. In other situations, you
might want a value in an arbitrary point: between the elements/pixels
where you have data. The functions described in this section are for
such operations.

The parametric interpolations discussed below are wrappers around the
interpolation functions of the GNU Scientific Library (or GSL, see *note
GNU Scientific Library::). To identify the different GSL interpolation
types, Gnuastro's ‘gnuastro/interpolate.h’ header file contains macros
that are discussed below. The GSL wrappers provided here are not yet
complete because we are too busy. If you need the missing ones, please
consider helping us add them to Gnuastro's library. Your contributions
would be very welcome and appreciated.

-- Macro: GAL_INTERPOLATE_NEIGHBORS_METRIC_RADIAL
-- Macro: GAL_INTERPOLATE_NEIGHBORS_METRIC_MANHATTAN
-- Macro: GAL_INTERPOLATE_NEIGHBORS_METRIC_INVALID
The metric used to find the distance for nearest neighbor
interpolation. A radial metric uses the simple Euclidean function
to find the distance between two pixels. A Manhattan metric counts
the steps along each axis, so it is always an integer (it is also
much faster to calculate than the radial metric because it does not
need a square root calculation).
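
The two metrics can be sketched for 2D pixel coordinates as below. The
function names are hypothetical and this is not Gnuastro's code. The
radial helper returns the _squared_ distance: it orders neighbors in
the same way as the distance itself, and taking its square root gives
the Euclidean distance described above.

```c
#include <stddef.h>

/* Hypothetical helper: Manhattan distance between two pixels, the
   number of single-pixel steps along the axes (always an integer). */
size_t
sketch_dist_manhattan(size_t x1, size_t y1, size_t x2, size_t y2)
{
  size_t dx = x1>x2 ? x1-x2 : x2-x1;
  size_t dy = y1>y2 ? y1-y2 : y2-y1;
  return dx+dy;
}

/* Hypothetical helper: squared radial (Euclidean) distance between
   two pixels. Its square root is the radial distance. */
size_t
sketch_dist_radial_sq(size_t x1, size_t y1, size_t x2, size_t y2)
{
  size_t dx = x1>x2 ? x1-x2 : x2-x1;
  size_t dy = y1>y2 ? y1-y2 : y2-y1;
  return dx*dx + dy*dy;
}
```

For the pixels (0,0) and (3,4), the Manhattan distance is 7 steps while
the radial distance is 5 (squared: 25).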

-- Macro: GAL_INTERPOLATE_NEIGHBORS_FUNC_MIN
-- Macro: GAL_INTERPOLATE_NEIGHBORS_FUNC_MAX
-- Macro: GAL_INTERPOLATE_NEIGHBORS_FUNC_MEAN
-- Macro: GAL_INTERPOLATE_NEIGHBORS_FUNC_MEDIAN
-- Macro: GAL_INTERPOLATE_NEIGHBORS_FUNC_INVALID
The various types of nearest-neighbor interpolation functions for
‘gal_interpolate_neighbors’. The names are descriptive for the
operation they do, so we will not go into much more detail here.
The median operator will be one of the most used, but operators
like the maximum are good to fill the center of saturated stars.

-- Function:
gal_data_t *
gal_interpolate_neighbors (gal_data_t *input, struct
gal_tile_two_layer_params *tl, uint8_t metric, size_t
numneighbors, size_t numthreads, int onlyblank, int
aslinkedlist, int function)

Interpolate the values in the input dataset using a statistic
calculated from the distribution of their ‘numneighbors’ closest
neighbors. The desired statistic is determined by the ‘function’
argument, which takes any of the ‘GAL_INTERPOLATE_NEIGHBORS_FUNC_’
macros (see above). This function is non-parametric and thus
agnostic to the input's number of dimensions or the shape of its
distribution.

Distance can be defined on different metrics that are identified
through ‘metric’ (taking values determined by the
‘GAL_INTERPOLATE_NEIGHBORS_METRIC_’ macros described above). If
‘onlyblank’ is non-zero, then only blank elements will be
interpolated and pixels that already have a value will be left
untouched. This function is multi-threaded and will run on
‘numthreads’ threads (see ‘gal_threads_number’ in *note
Multithreaded programming::).

‘tl’ is Gnuastro's tessellation structure used to define tiles over
an image and is fully described in *note Tile grid::. When
‘tl!=NULL’, then it is assumed that the ‘input->array’ contains one
value per tile and interpolation will respect certain tessellation
properties, for example, to not interpolate over channel borders.

If several datasets have the same set of blank values, you do not
need to call this function multiple times. When ‘aslinkedlist’ is
non-zero, then ‘input’ will be seen as a *note List of
gal_data_t::. In this case, the same neighbors will be used for
all the datasets in the list. Of course, the values for each
dataset will be different, so a different value will be written in
each dataset, but the neighbor checking that is the most CPU
intensive part will only be done once.

This is a non-parametric and robust function for interpolation.
The interpolated values are also always within the range of the
non-blank values and strong outliers do not get created. However,
this type of interpolation must be used with care when there are
gradients. This is because it is non-parametric and if there are
not enough neighbors, step-like features can be created.
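
The idea can be sketched in one dimension with plain C, as below. This
is only an illustration under simplifying assumptions, not Gnuastro's
implementation: blank elements are NaN, the distance is the index
difference, the statistic is fixed to the median, and the names
‘sketch_fill_median’ and ‘sketch_cmp_double’ are hypothetical.

```c
#include <math.h>
#include <stdlib.h>
#include <stddef.h>

/* Comparison function for `qsort' on doubles. */
static int
sketch_cmp_double(const void *a, const void *b)
{
  double x=*(const double *)a, y=*(const double *)b;
  return (x>y) - (x<y);
}

/* Hypothetical helper: copy `in' into `out' (both of length `n'),
   replacing each blank (NaN) element with the median of its `k'
   nearest non-blank elements. Returns 0 on success and 1 when `k' is
   zero or memory could not be allocated. */
int
sketch_fill_median(const double *in, double *out, size_t n, size_t k)
{
  double *nb;
  size_t i, d, num;

  if(k==0) return 1;
  nb=malloc(k * sizeof *nb);
  if(nb==NULL) return 1;

  for(i=0; i<n; ++i)
    {
      out[i]=in[i];
      if( !isnan(in[i]) ) continue;

      /* Walk outwards from `i', collecting non-blank input values
         until `k' are found or both ends of the array are passed. */
      num=0;
      for(d=1; num<k && (d<=i || i+d<n); ++d)
        {
          if( d<=i && !isnan(in[i-d]) ) nb[num++]=in[i-d];
          if( num<k && i+d<n && !isnan(in[i+d]) ) nb[num++]=in[i+d];
        }

      /* Median of the collected neighbors (if any were found). */
      if(num)
        {
          qsort(nb, num, sizeof *nb, sketch_cmp_double);
          out[i] = num%2 ? nb[num/2] : (nb[num/2-1]+nb[num/2])/2;
        }
    }

  free(nb);
  return 0;
}
```

Because the neighbors are always read from ‘in’, a filled value never
influences another filled value, and every filled value stays within
the range of the non-blank elements.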

-- Macro: GAL_INTERPOLATE_1D_INVALID
This is just a place-holder to manage errors.
-- Macro: GAL_INTERPOLATE_1D_LINEAR
[From GSL:] Linear interpolation. This interpolation method does
not require any additional memory.
-- Macro: GAL_INTERPOLATE_1D_POLYNOMIAL
[From GSL:] Polynomial interpolation. This method should only be
used for interpolating small numbers of points because polynomial
interpolation introduces large oscillations, even for well-behaved
datasets. The number of terms in the interpolating polynomial is
equal to the number of points.
-- Macro: GAL_INTERPOLATE_1D_CSPLINE
[From GSL:] Cubic spline with natural boundary conditions. The
resulting curve is piece-wise cubic on each interval, with matching
first and second derivatives at the supplied data-points. The
second derivative is chosen to be zero at the first point and last
point.
-- Macro: GAL_INTERPOLATE_1D_CSPLINE_PERIODIC
[From GSL:] Cubic spline with periodic boundary conditions. The
resulting curve is piece-wise cubic on each interval, with matching
first and second derivatives at the supplied data-points. The
derivatives at the first and last points are also matched. Note
that the last point in the data must have the same y-value as the
first point, otherwise the resulting periodic interpolation will
have a discontinuity at the boundary.
-- Macro: GAL_INTERPOLATE_1D_AKIMA
[From GSL:] Non-rounded Akima spline with natural boundary
conditions. This method uses the non-rounded corner algorithm of
Wodicka.
-- Macro: GAL_INTERPOLATE_1D_AKIMA_PERIODIC
[From GSL:] Non-rounded Akima spline with periodic boundary
conditions. This method uses the non-rounded corner algorithm of
Wodicka.
-- Macro: GAL_INTERPOLATE_1D_STEFFEN
[From GSL:] Steffen's method(1) guarantees the monotonicity of the
interpolating function between the given data points. Therefore,
minima and maxima can only occur exactly at the data points, and
there can never be spurious oscillations between data points. The
interpolated function is piece-wise cubic in each interval. The
resulting curve and its first derivative are guaranteed to be
continuous, but the second derivative may be discontinuous.

-- Function:
gsl_spline *
gal_interpolate_1d_make_gsl_spline (gal_data_t *X, gal_data_t
*Y, int type_1d)
Allocate and initialize a GNU Scientific Library (GSL) 1D
‘gsl_spline’ structure using the non-blank elements of ‘Y’.
‘type_1d’ identifies the interpolation scheme and must be one of
the ‘GAL_INTERPOLATE_1D_*’ macros defined above.

If ‘X==NULL’, the X-axis is assumed to be integers starting from
zero (the index of each element in ‘Y’). Otherwise, the values in
‘X’ will be used to initialize the interpolation structure. Note
that when given, ‘X’ must _not_ contain any blank elements and it
must be sorted (in increasing order).

Each interpolation scheme needs a minimum number of elements to
successfully operate. If the number of non-blank values in ‘Y’ is
less than this number, this function will return a ‘NULL’ pointer.

To be as generic and modular as possible, GSL's tools are
low-level. Therefore before doing the interpolation, many steps
are necessary (like preparing your dataset, then allocating and
initializing ‘gsl_spline’). The metadata available in Gnuastro's
*note Generic data container:: make it easy to hide all those
preparations within this function.

Once ‘gsl_spline’ has been initialized by this function, the
interpolation can be evaluated for any X value within the non-blank
range of the input using ‘gsl_spline_eval’ or ‘gsl_spline_eval_e’.

For example, in the small program below (‘sample-interp.c’), we
read the first two columns of the table in ‘table.txt’ and feed
them to this function to later estimate the values in the second
column for three selected points. You can use *note BuildProgram::
to compile and run this program, see *note Library demo programs::
for more.

Contents of the ‘table.txt’ file:
$ cat table.txt
0 0
1 2
3 6
4 8
6 12
8 16
9 18

Contents of the ‘sample-interp.c’ file:
#include <stdio.h>
#include <stdlib.h>
#include <gnuastro/table.h>
#include <gnuastro/interpolate.h>

int
main(void)
{
size_t i;
gal_data_t *X, *Y;
gsl_spline *spline;
gsl_interp_accel *acc;
gal_list_str_t *cols=NULL;

/* Change the values based on your input table. */
double points[]={1.8, 2.5, 7};

/* Read the first two columns from `table.txt'.
IMPORTANT: the list is last-in-first-out, so the output
column order is the inverse of the input order. */
gal_list_str_add(&cols, "1", 0);
gal_list_str_add(&cols, "2", 0);
Y=gal_table_read("table.txt", NULL, NULL, cols,
GAL_TABLE_SEARCH_NAME, 0, 1, -1, 1, NULL);
X=Y->next;

/* Allocate the GSL interpolation accelerator and make the
`gsl_spline' structure. */
acc=gsl_interp_accel_alloc();
spline=gal_interpolate_1d_make_gsl_spline(X, Y,
GAL_INTERPOLATE_1D_STEFFEN);

/* Calculate the respective value for all the given points,
if `spline' could be allocated. */
if(spline)
for(i=0; i<(sizeof points)/(sizeof *points); ++i)
printf("%f: %f\n", points[i],
gsl_spline_eval(spline, points[i], acc));

/* Clean up and return. */
gal_data_free(X);
gal_data_free(Y);
gsl_spline_free(spline);
gsl_interp_accel_free(acc);
gal_list_str_free(cols, 0);
return EXIT_SUCCESS;
}

Compile and run this program with *note BuildProgram:: to see the
interpolation results for the three points within the program.
$ astbuildprog sample-interp.c --quiet
1.800000: 3.600000
2.500000: 5.000000
7.000000: 14.000000

-- Function:
void
gal_interpolate_1d_blank (gal_data_t *in, int type_1d)
Fill the blank elements of ‘in’ using the rest of the elements and
the given interpolation. The interpolation scheme can be set
through ‘type_1d’, which accepts any of the ‘GAL_INTERPOLATE_1D_*’
macros above. The interpolation is internally done in 64-bit
floating point type (‘double’). However the evaluated/interpolated
values (originally blank) will be written (in ‘in’) with its
original numeric datatype, using C's standard type conversion.

By definition, interpolation is only defined "between" valid
points. Therefore, if any number of elements on the start or end
of the 1D array are blank, those elements will not be interpolated
and will remain blank. To see if any blank (non-interpolated)
elements remain, you can use ‘gal_blank_present’ on ‘in’ after this
function is finished.
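
The behavior at the two ends can be sketched with a linear scheme on a
plain ‘double’ array, as below. This is not Gnuastro's implementation:
blanks are assumed to be NaN, the element index is used as the X axis,
and the name ‘sketch_fill_linear’ is hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: fill the interior blank (NaN) elements of `a'
   in place, by linear interpolation between the nearest non-blank
   element on each side. Blank elements before the first or after the
   last non-blank element are left blank. Returns the number of
   elements that are still blank at the end. */
size_t
sketch_fill_linear(double *a, size_t n)
{
  size_t i, j, remaining=0;
  size_t prev=n;          /* Index of last non-blank (`n': none yet). */

  for(i=0; i<n; ++i)
    {
      if( isnan(a[i]) ) continue;

      /* Fill the blank run between `prev' and `i' (if any). */
      if(prev!=n)
        for(j=prev+1; j<i; ++j)
          a[j] = a[prev]
                 + (a[i]-a[prev]) * (double)(j-prev) / (double)(i-prev);
      prev=i;
    }

  /* Count what could not be interpolated. */
  for(i=0; i<n; ++i)
    if( isnan(a[i]) ) ++remaining;
  return remaining;
}
```

With the input ‘{NaN, 2, NaN, NaN, 8, NaN}’, the two interior blanks
become 4 and 6, while the first and last elements stay blank and the
function returns 2.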

---------- Footnotes ----------

(1) <http://adsabs.harvard.edu/abs/1990A%26A...239..443S>

File: gnuastro.info, Node: Warp library, Next: Color functions, Prev: Interpolation, Up: Gnuastro library

12.3.29 Warp library (‘warp.h’)
-------------------------------

Warping an image to a new pixel grid is commonly necessary as part of
astronomical data reduction; for an introduction, see *note Warp::. For
details of how we resample the old pixel grid to the new pixel grid, see
*note Resampling::. Gnuastro's Warp program uses the following
functions for its default mode (when no linear warps are requested).
Through the following functions, you can directly access those features
in your own custom programs. The linear warping operations of the Warp
program aren't yet brought into the library. If you need them please
get in touch with us at ‘bug-gnuastro@gnu.org’. For usage examples of
this library, please see *note Library demo - Warp to another image:: or
*note Library demo - Warp to new grid::.

You are free to provide any valid WCS keywords to the functions
defined in this library using the ‘gal_warp_wcsalign_t’ data type. This
might be used to align the input image to the standard WCS grid,
potentially changing the pixel scale, removing any valid WCS non-linear
distortion available, and projecting to any valid WCS projection type.
Further details of the warp library functions and parameters are shown
below:

-- Macro: GAL_WARP_OUTPUT_NAME_WARPED
-- Macro: GAL_WARP_OUTPUT_NAME_MAXFRAC
Names of the output datasets (in the ‘name’ component of the output
‘gal_data_t’s). By default the output is only a single dataset,
but when the ‘checkmaxfrac’ component of the input is non-zero, it
will contain two datasets.

-- Type (C struct): gal_warp_wcsalign_t
The main data container for inputs, output and internal variables
to simplify the WCS-aligning functions. Due to the large number of
input variables, this structure makes it easy to call the main
functions. Similar to ‘gal_data_t’, the ‘gal_warp_wcsalign_t’ is a
structure ‘typedef’'d as a new type, see *note Library data
container::. Please note that this structure has elements that are
_allocated_ dynamically and must be freed after usage.
‘gal_warp_wcsalign_free’ only frees the internal variables, so you
are responsible for freeing your own inputs (‘cdelt’, ‘input’,
etc.) and the output. The internal variables are cached here to
avoid repeating CPU-intensive computations. To avoid using
uninitialized variables, we recommend using the helper function
‘gal_warp_wcsalign_template’ to get a clean structure before
setting your own variables. The structure and each of its elements
are defined below:

typedef struct
{
/* Arguments given (and later freed) by the caller. If 'twcs' is
given, then the "WCS To build" elements will be ignored. */
gal_data_t *input;
size_t numthreads;
double coveredfrac;
size_t edgesampling;
gal_data_t *widthinpix;
uint8_t checkmaxfrac;
struct wcsprm *twcs; /* WCS Predefined. */
gal_data_t *ctype; /* WCS To build. */
gal_data_t *cdelt; /* WCS To build. */
gal_data_t *center; /* WCS To build. */

/* Output (must be freed by caller) */
gal_data_t *output;

/* Internal variables (allocated and freed internally) */
size_t v0;
size_t nhor;
size_t ncrn;
size_t gcrn;
int isccw;
gal_data_t *vertices;
} gal_warp_wcsalign_t;

‘gal_data_t *input’
The input dataset. Its image array must have a type of
‘GAL_TYPE_FLOAT64’ and ‘input->wcs’ should not be ‘NULL’ for
the WCS-aligning operations to work, see *note Library demo -
Warp to new grid::.

‘size_t numthreads’
Number of threads to use during the WCS aligning operations.
If the given value is ‘0’, the library will calculate the
number of available threads at run-time. The ‘warp’ library
functions are _thread-safe_ so you can freely enjoy the merits
of parallel processing.

‘double coveredfrac’
Acceptable fraction of output pixel that is covered by input
pixels. The value should be between 0 and 1 (inclusive). If
the area of an output pixel is covered by less than this
fraction, its value will be ‘NaN’. For more, see the
description of ‘--coveredfrac’ in *note Invoking astwarp::.

‘size_t edgesampling’
Set the number of extra vertices along each edge of the output
pixel's polygon to account for potential curvature due to
projection or distortion. A value of ‘0’ is usually enough
for this (so the pixel is only defined by a four-vertex
polygon). Greater values increase memory usage and program
execution time. For more, please see the description of
‘--edgesampling’ in *note Align pixels with WCS considering
distortions::.

‘gal_data_t *widthinpix’
Output image size (width and height) in number of pixels. If
a ‘NULL’ pointer is passed, the WCS-aligning operations will
estimate the output image size internally such that it
contains the full input. This dataset should have a type of
‘GAL_TYPE_SIZE_T’ and contain exactly two _odd_ values. This
ensures that the center of the central pixel lies at the
requested central coordinate (note that an image with an even
number of pixels doesn't have a "central" pixel!).

‘struct wcsprm *twcs’
The target grid WCS which must follow the standard WCSLIB
structure. You can read it from a file using ‘gal_wcs_read’
or create an entirely new one with ‘gal_wcs_create’ and later
free it with ‘gal_wcs_free’, see *note World Coordinate
System::. If this element is given, the ‘ctype’, ‘cdelt’ and
‘center’ elements (which are used to construct a WCS
internally) are ignored.

Please note that the ‘wcsprm’ structure doesn't contain the
image size. To set the final image size, you should use
‘widthinpix’.

‘gal_data_t *ctype’
The output's projection type. The dataset has to have the
type ‘GAL_TYPE_STRING’, containing exactly two strings. Both
strings will be directly passed to WCSLIB and should conform
to the FITS standard's ‘CTYPEi’ keywords, see the description
of ‘--ctype’ in *note Align pixels with WCS considering
distortions::. For example, ‘"RA---TAN"’ and ‘"DEC--TAN"’, or
‘"RA---HPX"’ and ‘"DEC--HPX"’.

‘gal_data_t *cdelt’
Output pixel scale (size of pixel in the WCS units: value to
‘CUNITi’ keywords in FITS, usually degrees). The dataset
should have a type of ‘GAL_TYPE_FLOAT64’ and contain exactly
two values. Hint: to convert arcsec to degrees, just divide
by 3600.

‘gal_data_t *center’
WCS coordinate of the center of the central pixel of the
output. The units depend on the WCS, for example, if the
‘CUNITi’ keywords are ‘deg’, it is in degrees. This dataset
should have a type of ‘GAL_TYPE_FLOAT64’ and contain exactly
two values.

‘uint8_t checkmaxfrac’
When this is non-zero, the output will be a two-element *note
List of gal_data_t::. The second element shows the Moiré
pattern (https://en.wikipedia.org/wiki/Moir%C3%A9_pattern) of
the warp. For more, see *note Moire pattern in stacking and
its correction::.

-- Function:
gal_warp_wcsalign_t
gal_warp_wcsalign_template (void)
A high-level helper function that returns a clean
‘gal_warp_wcsalign_t’ struct with all values initialized. This
function returns a copy of a statically allocated structure, so
you don't need to free the returned structure.

The Warp library decides on the program flow based on this struct.
Uninitialized pointers can point to random space in RAM which can
create segmentation faults, or even worse, produce unnoticed
side-effects. It is therefore good practice to manually set unused
pointers to ‘NULL’ and give blank values to numbers. Since there are
many variables and pointers in ‘gal_warp_wcsalign_t’, it is easy to
forget _initializing_ them. We therefore recommend using
this function to minimize human error.

-- Function:
void
gal_warp_wcsalign (gal_warp_wcsalign_t *wa)
A high-level function to align the input dataset's pixels to its
WCS coordinates and write the result in ‘wa->output’. This
function assumes that the input variables have already been set in
the ‘wa’ structure. The input variables are clearly shown in the
definition of ‘gal_warp_wcsalign_t’. It will call the lower level
functions below to do the job and will free the internal variables
afterwards.

The following low-level functions are called from the high-level
‘gal_warp_wcsalign’ function. They are provided here for scenarios where
fine-grained control over the thread workflow is necessary, see *note
Multithreaded programming::.

-- Function:
void
gal_warp_wcsalign_init (gal_warp_wcsalign_t *wa)
Low-level function to initialize all the elements inside the ‘wa’
structure assuming that the input variables have been set. The
input variables are clearly shown in the definition of
‘gal_warp_wcsalign_t’. This includes sanity checking the input
arguments, as well as allocating the output image's empty pixels
(that can be filled with ‘gal_warp_wcsalign_onpix’, possibly on
threads).

-- Function:
void
gal_warp_wcsalign_onpix (gal_warp_wcsalign_t *nl, size_t ind)
Low-level function that fills pixel ‘ind’ (counting from 0) in the
already initialized output image.

-- Function:
void *
gal_warp_wcsalign_onthread (void *inparam)
Low-level worker function that can be passed to the high-level
‘gal_threads_spin_off’ or the lower-level ‘pthread_create’ with
some modifications, see *note Multithreaded programming::.

-- Function:
void
gal_warp_wcsalign_free (gal_warp_wcsalign_t *wa)
Low-level function to free the internal variables inside ‘wa’ only.
The caller must free the input pointers themselves, this function
will not free them (they may be necessary in other parts of the
caller's higher-level architecture).

-- Function:
void
gal_warp_pixelarea (gal_warp_wcsalign_t *wa)
Calculate each input pixel's area based on its WCS and save it to a
copy of the input image with only one difference: the pixel values
now show pixel area. For examples on its usage, see *note Pixel
information images::.

File: gnuastro.info, Node: Color functions, Next: Git wrappers, Prev: Warp library, Up: Gnuastro library

12.3.30 Color functions (‘color.h’)
-----------------------------------

The available pre-defined colors in Gnuastro are shown and discussed in
*note Vector graphics colors::. This part of Gnuastro is currently in
charge of mapping the color names to the color IDs and returning the
red-green-blue fractions of each color. On a terminal that supports
24-bit (true color), you can see the full list of color names and a demo
of each color with this command:

$ astconvertt --listcolors

For each color we have a separate macro that starts with ‘GAL_COLOR_’,
and ends with the color name in all-caps.

-- Macro: GAL_COLOR_INVALID
-- Macro: GAL_COLOR_MEDIUMVIOLETRED
-- Macro: GAL_COLOR_DEEPPINK
-- Macro: GAL_COLOR_*
The integer identifiers for each of the named colors in Gnuastro.
Except for the first one (‘GAL_COLOR_INVALID’), we currently have
140 colors from the extended web colors
(https://en.wikipedia.org/wiki/Web_colors#Extended_colors). The
full list of colors and a demo can be visually inspected on the
command-line with the ‘astconvertt --listcolors’ command and is
also shown in *note Vector graphics colors::. The macros have the
same names, just in full-caps.

The functions below can be used to interact with the pre-defined colors:

-- Function:
uint8_t
gal_color_name_to_id (char *name)
Given the name of a color, return the identifier. The name
matching is not case-sensitive.

-- Function:
char *
gal_color_id_to_name (uint8_t color)
Given the ID of a color, return its name.

-- Function:
void
gal_color_in_rgb (uint8_t color, float *f)
Given the identifier of a color, write the color's red-green-blue
fractions in the space that ‘f’ points to. It is up to the caller
to have the space for three 32-bit floating point numbers to be
already allocated before calling this function.
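
The sketch below mimics the three operations above on a two-color
table, to show the case-insensitive lookup and the caller-allocated
space for the fractions. Everything named ‘sketch_*’ or ‘SKETCH_*’ is
hypothetical: the identifiers are not Gnuastro's ‘GAL_COLOR_*’ values,
and dividing the 8-bit web color components by 255 is an assumption
about how the fractions are defined.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifiers, only valid within this sketch. */
enum
{
  SKETCH_COLOR_INVALID=0,
  SKETCH_COLOR_MEDIUMVIOLETRED,
  SKETCH_COLOR_DEEPPINK,
  SKETCH_COLOR_NUMBER
};

struct sketch_color { const char *name; uint8_t r, g, b; };

/* 8-bit components of the two web colors used here. */
static const struct sketch_color sketch_colors[SKETCH_COLOR_NUMBER]={
  { "invalid",           0,  0,   0 },
  { "mediumvioletred", 199, 21, 133 },
  { "deeppink",        255, 20, 147 },
};

/* Return 1 when the two strings match, ignoring case. */
static int
sketch_same_name(const char *a, const char *b)
{
  while(*a && *b)
    {
      if( tolower((unsigned char)*a) != tolower((unsigned char)*b) )
        return 0;
      ++a; ++b;
    }
  return *a==*b;              /* Both must end at the same place. */
}

/* Name to identifier (case-insensitive). */
uint8_t
sketch_color_name_to_id(const char *name)
{
  uint8_t i;
  for(i=1; i<SKETCH_COLOR_NUMBER; ++i)
    if( sketch_same_name(name, sketch_colors[i].name) ) return i;
  return SKETCH_COLOR_INVALID;
}

/* Write the three fractions into caller-allocated space. */
void
sketch_color_in_rgb(uint8_t color, float *f)
{
  if(color>=SKETCH_COLOR_NUMBER) color=SKETCH_COLOR_INVALID;
  f[0]=sketch_colors[color].r/255.0f;
  f[1]=sketch_colors[color].g/255.0f;
  f[2]=sketch_colors[color].b/255.0f;
}
```

A caller would declare ‘float f[3];’ and pass ‘f’ as the second
argument, so the space for the three values exists before the call.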

File: gnuastro.info, Node: Git wrappers, Next: Python interface, Prev: Color functions, Up: Gnuastro library

12.3.31 Git wrappers (‘git.h’)
------------------------------

Git is one of the most common tools for version control and it can often
be useful during development, for example, see the ‘COMMIT’ keyword in
*note Output FITS files::. At installation time, Gnuastro will also
check for the existence of libgit2, and store the result in the
‘GAL_CONFIG_HAVE_LIBGIT2’ macro, see *note Configuration information::
and *note Optional dependencies::. ‘gnuastro/git.h’ includes
‘gnuastro/config.h’ internally, so you will not have to include both for
this macro.

-- Function:
char *
gal_git_describe ( )
When libgit2 is present and the program is called within a
directory that is version controlled, this function will return a
string containing the commit description (similar to Gnuastro's
unofficial version number, see *note Version numbering::). If
there are uncommitted changes in the running directory, it will add
a '‘-dirty’' suffix to the description. When there is no tagged
point in the previous commit, this function will return a uniquely
abbreviated commit object as fallback. This function is used for
generating the value of the ‘COMMIT’ keyword in *note Output FITS
files::. The output string is similar to the output of the
following command:

$ git describe --dirty --always

Space for the output string is allocated within this function, so
after using the value you have to ‘free’ the output string. If
libgit2 is not installed or the program calling this function is
not within a version controlled directory, then the output will be
the ‘NULL’ pointer.

File: gnuastro.info, Node: Python interface, Next: Unit conversion library, Prev: Git wrappers, Up: Gnuastro library

12.3.32 Python interface (‘python.h’)
-------------------------------------

Python (https://en.wikipedia.org/wiki/Python_(programming_language)) is
a high-level interpreted programming language that is used by some for
data analysis. Python itself is written in C, which is the same
language that Gnuastro is written in. Hence Gnuastro's library can be
directly used in Python wrappers. The functions in this section provide
some low-level features to simplify the creation of Python modules that
may want to use Gnuastro's advanced and powerful features directly. To
see why Gnuastro was written in C, please see *note Why C::.

*Python interface is not built by default:* to have the features
described in this section, Gnuastro's library needs to be built with the
‘--with-python’ configuration option. For more on this configuration
option, see *note Gnuastro configure options::. To see if the Gnuastro
library that you are linking with has these features, you can check the
value of ‘GAL_CONFIG_HAVE_PYTHON’ macro, see *note Configuration
information::.

The Gnuastro Python Package is built using CPython. This entails
using Python wrappers around currently existing Gnuastro library
functions to build Python Extension Modules
(https://docs.python.org/3/extending/extending.html#). It also makes
use of the NumPy C-API
(https://numpy.org/doc/stable/reference/c-api/index.html) for dealing
with data arrays. Writing an interface between these and Gnuastro can
be simplified using the functions below. Since many of these functions
depend on the Gnuastro Library itself, it is more convenient to package
them with the Library to facilitate the work of the Python package. These
functions will be expanding as Gnuastro's own Python module (pyGnuastro)
grows.

The Python interface of Gnuastro's library is built and installed by
default if a Python 3.0.0 or greater with NumPy is found in ‘$PATH’.
Users may disable this interface with the ‘--without-python’ option to
‘./configure’ when they installed Gnuastro, see *note Gnuastro configure
options::. If you have problems in a Python virtual env, see *note
Optional dependencies::.

Because Python is an optional dependency of Gnuastro, the following
functions may not be available on some systems. To check if the
installed Gnuastro library was compiled with the following functions,
you can use the ‘GAL_CONFIG_HAVE_PYTHON’ macro which is defined in
‘gnuastro/config.h’, see *note Configuration information::.

-- Function:
int
gal_python_type_to_numpy (uint8_t type)
Returns the NumPy datatype corresponding to a certain Gnuastro
‘type’, see *note Library data types::.

-- Function:
uint8_t
gal_python_type_from_numpy (int type)
Returns Gnuastro's numerical datatype that corresponds to the input
NumPy ‘type’. For Gnuastro's recognized data types, see *note
Library data types::.

File: gnuastro.info, Node: Unit conversion library, Next: Spectral lines library, Prev: Python interface, Up: Gnuastro library

12.3.33 Unit conversion library (‘units.h’)
-------------------------------------------

Datasets can contain values in various formats or units. The functions
in this section are defined to facilitate the easy conversion between
them and are declared in ‘units.h’. If there are certain conversions
that are useful for your work, please get in touch.

-- Function:
int
gal_units_extract_decimal (char *convert, const char
*delimiter, double *args, size_t n)
Parse the input ‘convert’ string with a certain delimiter (for
example, ‘01:23:45’, where the delimiter is ‘":"’) as multiple
numbers (for example, 1,23,45) and write them as an array in the
space that ‘args’ is pointing to. The expected number of values in
the string is specified by the ‘n’ argument (3 in the example
above).

If the function succeeds, it will return 1, otherwise it will
return 0 and the values may not be fully written into ‘args’. If
the number of values parsed in the string is different from ‘n’,
this function will fail.
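
The parsing logic described above can be sketched with ‘strtod’ as
below. This is a hypothetical re-implementation for illustration only
(the name ‘sketch_extract_decimal’ is not part of Gnuastro). As an
assumption of this sketch, any single character of ‘delimiter’ is
accepted between two numbers, and one trailing delimiter is tolerated.

```c
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical helper: parse exactly `n' numbers from `convert',
   separated by any one character of `delimiter', into `args'.
   Returns 1 on success and 0 on failure (in which case `args' may be
   partially written). */
int
sketch_extract_decimal(const char *convert, const char *delimiter,
                       double *args, size_t n)
{
  char *end;
  size_t i=0;
  const char *p=convert;

  while(*p!='\0')
    {
      /* More numbers in the string than expected. */
      if(i==n) return 0;

      /* Read one number; fail if nothing could be parsed here. */
      args[i]=strtod(p, &end);
      if(end==p) return 0;
      ++i;
      p=end;

      /* After a number: end of string, or exactly one delimiter. */
      if(*p=='\0') break;
      if( strchr(delimiter, *p)==NULL ) return 0;
      ++p;
    }

  /* Succeed only when the expected number of values was found. */
  return i==n;
}
```

With ‘"01:23:45"’, a delimiter of ‘":"’ and ‘n=3’, the three values 1,
23 and 45 are written into ‘args’. The same string fails with ‘n=2’ or
‘n=4’.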

-- Function:
double
gal_units_ra_to_degree (char *convert)
Convert the input Right Ascension (RA) string (in the format of
hours, minutes and seconds either as ‘_h_m_s’ or ‘_:_:_’) to
degrees (a single floating point number).

-- Function:
double
gal_units_dec_to_degree (char *convert)
Convert the input Declination (Dec) string (in the format of
degrees, arc-minutes and arc-seconds either as ‘_d_m_s’ or ‘_:_:_’)
to degrees (a single floating point number).
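
The arithmetic behind the two conversions above is sketched below with
‘sscanf’. The function names and the error-code return style are
hypothetical and not Gnuastro's interface. One hour of RA is 15
degrees, and for the declination the sign is read from the text so
that a string like ‘-00:30:00’ is still negative.

```c
#include <stdio.h>

/* Hypothetical helper: RA string (`_h_m_s' or `_:_:_') to degrees.
   Returns 0 on success and 1 when the string cannot be parsed. */
int
sketch_ra_to_degree(const char *ra, double *deg)
{
  double h, m, s;

  if(    sscanf(ra, "%lf:%lf:%lf",   &h, &m, &s)!=3
      && sscanf(ra, "%lfh%lfm%lfs",  &h, &m, &s)!=3 )
    return 1;

  /* 24 hours cover 360 degrees: 15 degrees per hour. */
  *deg = 15.0 * ( h + m/60.0 + s/3600.0 );
  return 0;
}

/* Hypothetical helper: Dec string (`_d_m_s' or `_:_:_') to degrees.
   Returns 0 on success and 1 when the string cannot be parsed. */
int
sketch_dec_to_degree(const char *dec, double *deg)
{
  int negative;
  double d, m, s;
  const char *p=dec;

  /* Take the sign from the text, then parse the absolute value. */
  while(*p==' ') ++p;
  negative = (*p=='-');
  if(*p=='-' || *p=='+') ++p;

  if(    sscanf(p, "%lf:%lf:%lf",  &d, &m, &s)!=3
      && sscanf(p, "%lfd%lfm%lfs", &d, &m, &s)!=3 )
    return 1;

  *deg = d + m/60.0 + s/3600.0;
  if(negative) *deg = -(*deg);
  return 0;
}
```

For example, an RA of ‘01:23:45’ corresponds to 15x(1+23/60+45/3600),
or 20.9375 degrees.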

-- Function:
char *
gal_units_degree_to_ra (double decimal, int usecolon)
Convert the input Right Ascension (RA) degree (a single floating
point number) to old/standard notation (in the format of hours,
minutes and seconds of ‘_h_m_s’). If ‘usecolon!=0’, then the
delimiters between the components will be colons: ‘_:_:_’.

-- Function:
char *
gal_units_degree_to_dec (double decimal, int usecolon)
Convert the input Declination (Dec) degree (a single floating point
number) to old/standard notation (in the format of degrees,
arc-minutes and arc-seconds of ‘_d_m_s’). If ‘usecolon!=0’, then
the delimiters between the components will be colons: ‘_:_:_’.
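
The reverse direction can be sketched as below for the RA. Unlike the
function above, this hypothetical helper writes into a caller-provided
buffer instead of returning an allocated string, and its two decimals
of seconds are an arbitrary choice for this example (not necessarily
the precision Gnuastro prints).

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write `deg' (RA in degrees) into `out' as
   `_h_m_s' (or `_:_:_' when `usecolon' is non-zero). Returns 0 on
   success and 1 for a negative input. */
int
sketch_degree_to_ra(double deg, int usecolon, char *out,
                    size_t outlen)
{
  double s;
  long cs, h, m;

  if(deg<0.0) return 1;

  /* Total hundredths of a second of time, rounded only once. The
     components are then found by integer division, so the seconds
     can never be printed as 60.00. */
  cs = (long)( deg/15.0*360000.0 + 0.5 );
  h  = cs/360000;
  m  = (cs%360000)/6000;
  s  = (cs%6000)/100.0;

  if(usecolon)
    snprintf(out, outlen, "%ld:%02ld:%05.2f", h, m, s);
  else
    snprintf(out, outlen, "%ldh%02ldm%05.2fs", h, m, s);
  return 0;
}
```

With an input of 20.9375 degrees, this sketch writes ‘1h23m45.00s’, or
‘1:23:45.00’ when ‘usecolon’ is non-zero. The declination follows the
same steps without the division by 15 and with a sign.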

-- Function:
double
gal_units_counts_to_mag (double counts, double zeropoint)
Convert counts to magnitudes through the given zero point. For
more on the equation, see *note Brightness flux magnitude::.

-- Function:
double
gal_units_mag_to_counts (double mag, double zeropoint)
Convert magnitudes to counts through the given zero point. For
more on the equation, see *note Brightness flux magnitude::.

-- Function:
double
gal_units_mag_to_sb (double mag, double area_arcsec2)
Calculate the surface brightness of a given magnitude, over a
certain area in units of arcsec^2. For more on the equation, see
*note Brightness flux magnitude::.

-- Function:
double
gal_units_sb_to_mag (double sb, double area_arcsec2)
Calculate the magnitude of a given surface brightness, over a
certain area in units of arcsec^2. For more on the equation, see
*note Brightness flux magnitude::.

-- Function:
double
gal_units_counts_to_sb (double counts, double zeropoint_ab,
double area_arcsec2)
Calculate the surface brightness of a given count level, over a
certain area in units of arcsec^2, assuming a certain AB zero
point. For more on the equation, see *note Brightness flux
magnitude::.

-- Function:
double
gal_units_sb_to_counts (double sb, double zeropoint_ab, double
area_arcsec2)
Calculate the counts corresponding to a given surface brightness,
over a certain area in units of arcsec^2. For more on the
equation, see *note Brightness flux magnitude::.

-- Function:
double
gal_units_counts_to_jy (double counts, double zeropoint_ab)
Convert counts to Janskys through an AB magnitude-based zero point.
For more on the equation, see *note Brightness flux magnitude::.
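
For quick reference, the standard relations behind the magnitude,
surface brightness and Jansky conversions above are summarized below.
The symbols are chosen only for this summary: C is the counts, z the
zero point, m the magnitude, S the surface brightness and A the area in
arcsec^2. The definitions that Gnuastro actually uses are those in the
node referenced above, so treat this as a sketch.

```latex
\begin{align}
m &= -2.5\log_{10}(C) + z, & C &= 10^{(z-m)/2.5} \\
S &= m + 2.5\log_{10}(A),  & m &= S - 2.5\log_{10}(A) \\
S &= -2.5\log_{10}(C) + z + 2.5\log_{10}(A) \\
F_{\mathrm{Jy}} &= 3631 \times C \times 10^{-z_{\mathrm{AB}}/2.5}
\end{align}
```

For example, C=100 counts with z=22.5 gives m=-5+22.5=17.5. Spread over
A=4 arcsec^2, this corresponds to S=17.5+2.5log10(4), or approximately
19.01 magnitudes per arcsec^2.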

-- Function:
double
gal_units_au_to_pc (double au)
Convert the input value (assumed to be in Astronomical Units) to
Parsecs. For the conversion equation, see the description of
‘au-to-pc’ operator in *note Arithmetic operators::.

-- Function:
double
gal_units_counts_to_nanomaggy (double counts, double
zeropoint_ab)
Convert counts to Nanomaggy (with fixed zero point of 22.5) through
an AB magnitude-based zero point.

-- Function:
double
gal_units_nanomaggy_to_counts (double counts, double
zeropoint_ab)
Convert Nanomaggy (with fixed zero point of 22.5) to counts through
an AB magnitude-based zero point.
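
The two nanomaggy conversions above follow from equating the magnitude
written with the image's own zero point and with the fixed zero point
of 22.5. In the sketch below, C is the counts, z_AB the image's AB zero
point and f_nmgy the value in nanomaggies (symbols chosen only for this
summary):

```latex
\begin{align}
z_{\mathrm{AB}} - 2.5\log_{10}(C) &= 22.5 - 2.5\log_{10}(f_{\mathrm{nmgy}}) \\
f_{\mathrm{nmgy}} &= C \times 10^{(22.5 - z_{\mathrm{AB}})/2.5} \\
C &= f_{\mathrm{nmgy}} \times 10^{(z_{\mathrm{AB}} - 22.5)/2.5}
\end{align}
```

When the image's zero point is already 22.5, the counts and the
nanomaggy values are therefore identical.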

-- Function:
double
gal_units_pc_to_au (double pc)
Convert the input value (assumed to be in Parsecs) to Astronomical
Units (AUs). For the conversion equation, see the description of
‘au-to-pc’ operator in *note Arithmetic operators::.

-- Function:
double
gal_units_ly_to_pc (double ly)
Convert the input value (assumed to be in Light-years) to Parsecs.
For the conversion equation, see the description of ‘ly-to-pc’
operator in *note Arithmetic operators::.

-- Function:
double
gal_units_pc_to_ly (double pc)
Convert the input value (assumed to be in Parsecs) to Light-years.
For the conversion equation, see the description of ‘ly-to-pc’
operator in *note Arithmetic operators::.

-- Function:
double
gal_units_ly_to_au (double ly)
Convert the input value (assumed to be in Light-years) to
Astronomical Units. For the conversion equation, see the
description of ‘ly-to-pc’ operator in *note Arithmetic operators::.

-- Function:
double
gal_units_au_to_ly (double au)
Convert the input value (assumed to be in Astronomical Units) to
Light-years. For the conversion equation, see the description of
‘ly-to-pc’ operator in *note Arithmetic operators::.
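
The six distance conversions above reduce to two ratios. The sketch below uses the IAU definitions (1 au = 149597870700 m, 1 pc = 648000/pi au) and a light-year of one Julian year times the speed of light. The names and constants here are assumptions for illustration; Gnuastro's own constants may differ in the last digits.

```c
/* Hypothetical constants and helpers, not Gnuastro's own. */
#define SKETCH_PI       3.14159265358979323846
#define SKETCH_AU_IN_M  149597870700.0       /* Meters in one au. */
#define SKETCH_LY_IN_M  9460730472580800.0   /* Meters in one ly. */

double
sketch_pc_to_au(double pc) { return pc * 648000.0 / SKETCH_PI; }

double
sketch_au_to_pc(double au) { return au * SKETCH_PI / 648000.0; }

double
sketch_ly_to_au(double ly)
{ return ly * SKETCH_LY_IN_M / SKETCH_AU_IN_M; }

double
sketch_au_to_ly(double au)
{ return au * SKETCH_AU_IN_M / SKETCH_LY_IN_M; }

/* The remaining two go through Astronomical Units. */
double
sketch_ly_to_pc(double ly)
{ return sketch_au_to_pc(sketch_ly_to_au(ly)); }

double
sketch_pc_to_ly(double pc)
{ return sketch_au_to_ly(sketch_pc_to_au(pc)); }
```

With these constants one parsec is roughly 206265 au or 3.2616 light-years.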

File: gnuastro.info, Node: Spectral lines library, Next: Cosmology library, Prev: Unit conversion library, Up: Gnuastro library

12.3.34 Spectral lines library (‘speclines.h’)
----------------------------------------------

Gnuastro's library has the following macros and functions for dealing
with spectral lines. All these functions are declared in
‘gnuastro/speclines.h’.

-- Macro: GAL_SPECLINES_INVALID
-- Macro: GAL_SPECLINES_Ne_VIII_770
-- Macro: GAL_SPECLINES_Ne_VIII_780
-- Macro: GAL_SPECLINES_Ly_epsilon
-- Macro: GAL_SPECLINES_Ly_delta
-- Macro: GAL_SPECLINES_Ly_gamma
-- Macro: GAL_SPECLINES_C_III_977
-- Macro: GAL_SPECLINES_N_III_989
-- Macro: GAL_SPECLINES_N_III_991_51
-- Macro: GAL_SPECLINES_N_III_991_57
-- Macro: GAL_SPECLINES_Ly_beta
-- Macro: GAL_SPECLINES_O_VI_1031
-- Macro: GAL_SPECLINES_O_VI_1037
-- Macro: GAL_SPECLINES_Ar_I_1066
-- Macro: GAL_SPECLINES_Ly_alpha
-- Macro: GAL_SPECLINES_N_V_1238
-- Macro: GAL_SPECLINES_N_V_1242
-- Macro: GAL_SPECLINES_Si_II_1260
-- Macro: GAL_SPECLINES_Si_II_1264
-- Macro: GAL_SPECLINES_O_I_1302
-- Macro: GAL_SPECLINES_C_II_1334
-- Macro: GAL_SPECLINES_C_II_1335
-- Macro: GAL_SPECLINES_Si_IV_1393
-- Macro: GAL_SPECLINES_O_IV_1397
-- Macro: GAL_SPECLINES_O_IV_1399
-- Macro: GAL_SPECLINES_Si_IV_1402
-- Macro: GAL_SPECLINES_N_IV_1486
-- Macro: GAL_SPECLINES_C_IV_1548
-- Macro: GAL_SPECLINES_C_IV_1550
-- Macro: GAL_SPECLINES_He_II_1640
-- Macro: GAL_SPECLINES_O_III_1660
-- Macro: GAL_SPECLINES_O_III_1666
-- Macro: GAL_SPECLINES_N_III_1746
-- Macro: GAL_SPECLINES_N_III_1748
-- Macro: GAL_SPECLINES_Al_III_1854
-- Macro: GAL_SPECLINES_Al_III_1862
-- Macro: GAL_SPECLINES_Si_III
-- Macro: GAL_SPECLINES_C_III_1908
-- Macro: GAL_SPECLINES_N_II_2142
-- Macro: GAL_SPECLINES_O_III_2320
-- Macro: GAL_SPECLINES_C_II_2323
-- Macro: GAL_SPECLINES_C_II_2324
-- Macro: GAL_SPECLINES_Fe_XI_2648
-- Macro: GAL_SPECLINES_He_II_2733
-- Macro: GAL_SPECLINES_Mg_V_2782
-- Macro: GAL_SPECLINES_Mg_II_2795
-- Macro: GAL_SPECLINES_Mg_II_2802
-- Macro: GAL_SPECLINES_Fe_IV_2829
-- Macro: GAL_SPECLINES_Fe_IV_2835
-- Macro: GAL_SPECLINES_Ar_IV_2853
-- Macro: GAL_SPECLINES_Ar_IV_2868
-- Macro: GAL_SPECLINES_Mg_V_2928
-- Macro: GAL_SPECLINES_He_I_2945
-- Macro: GAL_SPECLINES_O_III_3132
-- Macro: GAL_SPECLINES_He_I_3187
-- Macro: GAL_SPECLINES_He_II_3203
-- Macro: GAL_SPECLINES_O_III_3312
-- Macro: GAL_SPECLINES_Ne_V_3345
-- Macro: GAL_SPECLINES_Ne_V_3425
-- Macro: GAL_SPECLINES_O_III_3444
-- Macro: GAL_SPECLINES_N_I_3466_4
-- Macro: GAL_SPECLINES_N_I_3466_5
-- Macro: GAL_SPECLINES_He_I_3487
-- Macro: GAL_SPECLINES_Fe_VII_3586
-- Macro: GAL_SPECLINES_Fe_VI_3662
-- Macro: GAL_SPECLINES_H_19
-- Macro: GAL_SPECLINES_H_18
-- Macro: GAL_SPECLINES_H_17
-- Macro: GAL_SPECLINES_H_16
-- Macro: GAL_SPECLINES_H_15
-- Macro: GAL_SPECLINES_H_14
-- Macro: GAL_SPECLINES_O_II_3726
-- Macro: GAL_SPECLINES_O_II_3728
-- Macro: GAL_SPECLINES_H_13
-- Macro: GAL_SPECLINES_H_12
-- Macro: GAL_SPECLINES_Fe_VII_3758
-- Macro: GAL_SPECLINES_H_11
-- Macro: GAL_SPECLINES_H_10
-- Macro: GAL_SPECLINES_H_9
-- Macro: GAL_SPECLINES_Fe_V_3839
-- Macro: GAL_SPECLINES_Ne_III_3868
-- Macro: GAL_SPECLINES_He_I_3888
-- Macro: GAL_SPECLINES_H_8
-- Macro: GAL_SPECLINES_Fe_V_3891
-- Macro: GAL_SPECLINES_Fe_V_3911
-- Macro: GAL_SPECLINES_Ne_III_3967
-- Macro: GAL_SPECLINES_H_epsilon
-- Macro: GAL_SPECLINES_He_I_4026
-- Macro: GAL_SPECLINES_S_II_4068
-- Macro: GAL_SPECLINES_Fe_V_4071
-- Macro: GAL_SPECLINES_S_II_4076
-- Macro: GAL_SPECLINES_H_delta
-- Macro: GAL_SPECLINES_He_I_4143
-- Macro: GAL_SPECLINES_Fe_II_4178
-- Macro: GAL_SPECLINES_Fe_V_4180
-- Macro: GAL_SPECLINES_Fe_II_4233
-- Macro: GAL_SPECLINES_Fe_V_4227
-- Macro: GAL_SPECLINES_Fe_II_4287
-- Macro: GAL_SPECLINES_Fe_II_4304
-- Macro: GAL_SPECLINES_O_II_4317
-- Macro: GAL_SPECLINES_H_gamma
-- Macro: GAL_SPECLINES_O_III_4363
-- Macro: GAL_SPECLINES_Ar_XIV
-- Macro: GAL_SPECLINES_O_II_4414
-- Macro: GAL_SPECLINES_Fe_II_4416
-- Macro: GAL_SPECLINES_Fe_II_4452
-- Macro: GAL_SPECLINES_He_I_4471
-- Macro: GAL_SPECLINES_Fe_II_4489
-- Macro: GAL_SPECLINES_Fe_II_4491
-- Macro: GAL_SPECLINES_N_III_4510
-- Macro: GAL_SPECLINES_Fe_II_4522
-- Macro: GAL_SPECLINES_Fe_II_4555
-- Macro: GAL_SPECLINES_Fe_II_4582
-- Macro: GAL_SPECLINES_Fe_II_4583
-- Macro: GAL_SPECLINES_Fe_II_4629
-- Macro: GAL_SPECLINES_N_III_4634
-- Macro: GAL_SPECLINES_N_III_4640
-- Macro: GAL_SPECLINES_N_III_4641
-- Macro: GAL_SPECLINES_C_III_4647
-- Macro: GAL_SPECLINES_C_III_4650
-- Macro: GAL_SPECLINES_C_III_5651
-- Macro: GAL_SPECLINES_Fe_III_4658
-- Macro: GAL_SPECLINES_He_II_4685
-- Macro: GAL_SPECLINES_Ar_IV_4711
-- Macro: GAL_SPECLINES_Ar_IV_4740
-- Macro: GAL_SPECLINES_H_beta
-- Macro: GAL_SPECLINES_Fe_VII_4893
-- Macro: GAL_SPECLINES_Fe_IV_4903
-- Macro: GAL_SPECLINES_Fe_II_4923
-- Macro: GAL_SPECLINES_O_III_4958
-- Macro: GAL_SPECLINES_O_III_5006
-- Macro: GAL_SPECLINES_Fe_II_5018
-- Macro: GAL_SPECLINES_Fe_III_5084
-- Macro: GAL_SPECLINES_Fe_VI_5145
-- Macro: GAL_SPECLINES_Fe_VII_5158
-- Macro: GAL_SPECLINES_Fe_II_5169
-- Macro: GAL_SPECLINES_Fe_VI_5176
-- Macro: GAL_SPECLINES_Fe_II_5197
-- Macro: GAL_SPECLINES_N_I_5200
-- Macro: GAL_SPECLINES_Fe_II_5234
-- Macro: GAL_SPECLINES_Fe_IV_5236
-- Macro: GAL_SPECLINES_Fe_III_5270
-- Macro: GAL_SPECLINES_Fe_II_5276
-- Macro: GAL_SPECLINES_Fe_VII_5276
-- Macro: GAL_SPECLINES_Fe_XIV
-- Macro: GAL_SPECLINES_Ca_V
-- Macro: GAL_SPECLINES_Fe_II_5316_6
-- Macro: GAL_SPECLINES_Fe_II_5316_7
-- Macro: GAL_SPECLINES_Fe_VI_5335
-- Macro: GAL_SPECLINES_Fe_VI_5424
-- Macro: GAL_SPECLINES_Cl_III_5517
-- Macro: GAL_SPECLINES_Cl_III_5537
-- Macro: GAL_SPECLINES_Fe_VI_5637
-- Macro: GAL_SPECLINES_Fe_VI_5677
-- Macro: GAL_SPECLINES_C_III_5697
-- Macro: GAL_SPECLINES_Fe_VII_5720
-- Macro: GAL_SPECLINES_N_II_5754
-- Macro: GAL_SPECLINES_C_IV_5801
-- Macro: GAL_SPECLINES_C_IV_5811
-- Macro: GAL_SPECLINES_He_I_5875
-- Macro: GAL_SPECLINES_O_I_6046
-- Macro: GAL_SPECLINES_Fe_VII_6087
-- Macro: GAL_SPECLINES_O_I_6300
-- Macro: GAL_SPECLINES_S_III_6312
-- Macro: GAL_SPECLINES_Si_II_6347
-- Macro: GAL_SPECLINES_O_I_6363
-- Macro: GAL_SPECLINES_Fe_II_6369
-- Macro: GAL_SPECLINES_Fe_X
-- Macro: GAL_SPECLINES_Fe_II_6516
-- Macro: GAL_SPECLINES_N_II_6548
-- Macro: GAL_SPECLINES_H_alpha
-- Macro: GAL_SPECLINES_N_II_6583
-- Macro: GAL_SPECLINES_S_II_6716
-- Macro: GAL_SPECLINES_S_II_6730
-- Macro: GAL_SPECLINES_O_I_7002
-- Macro: GAL_SPECLINES_Ar_V
-- Macro: GAL_SPECLINES_He_I_7065
-- Macro: GAL_SPECLINES_Ar_III_7135
-- Macro: GAL_SPECLINES_Fe_II_7155
-- Macro: GAL_SPECLINES_Ar_IV_7170
-- Macro: GAL_SPECLINES_Fe_II_7172
-- Macro: GAL_SPECLINES_C_II_7236
-- Macro: GAL_SPECLINES_Ar_IV_7237
-- Macro: GAL_SPECLINES_O_I_7254
-- Macro: GAL_SPECLINES_Ar_IV_7262
-- Macro: GAL_SPECLINES_He_I_7281
-- Macro: GAL_SPECLINES_O_II_7319
-- Macro: GAL_SPECLINES_O_II_7330
-- Macro: GAL_SPECLINES_Ni_II_7377
-- Macro: GAL_SPECLINES_Ni_II_7411
-- Macro: GAL_SPECLINES_Fe_II_7452
-- Macro: GAL_SPECLINES_N_I_7468
-- Macro: GAL_SPECLINES_S_XII
-- Macro: GAL_SPECLINES_Ar_III_7751
-- Macro: GAL_SPECLINES_He_I_7816
-- Macro: GAL_SPECLINES_Ar_I_7868
-- Macro: GAL_SPECLINES_Ni_III
-- Macro: GAL_SPECLINES_Fe_XI_7891
-- Macro: GAL_SPECLINES_He_II_8236
-- Macro: GAL_SPECLINES_Pa_20
-- Macro: GAL_SPECLINES_Pa_19
-- Macro: GAL_SPECLINES_Pa_18
-- Macro: GAL_SPECLINES_O_I_8446
-- Macro: GAL_SPECLINES_Pa_17
-- Macro: GAL_SPECLINES_Ca_II_8498
-- Macro: GAL_SPECLINES_Pa_16
-- Macro: GAL_SPECLINES_Ca_II_8542
-- Macro: GAL_SPECLINES_Pa_15
-- Macro: GAL_SPECLINES_Cl_II
-- Macro: GAL_SPECLINES_Pa_14
-- Macro: GAL_SPECLINES_Fe_II_8616
-- Macro: GAL_SPECLINES_Ca_II_8662
-- Macro: GAL_SPECLINES_Pa_13
-- Macro: GAL_SPECLINES_N_I_8680
-- Macro: GAL_SPECLINES_N_I_8703
-- Macro: GAL_SPECLINES_N_I_8711
-- Macro: GAL_SPECLINES_Pa_12
-- Macro: GAL_SPECLINES_Pa_11
-- Macro: GAL_SPECLINES_Fe_II_8891
-- Macro: GAL_SPECLINES_Pa_10
-- Macro: GAL_SPECLINES_S_III_9068
-- Macro: GAL_SPECLINES_Pa_9
-- Macro: GAL_SPECLINES_S_III_9531
-- Macro: GAL_SPECLINES_Pa_epsilon
-- Macro: GAL_SPECLINES_C_I_9824
-- Macro: GAL_SPECLINES_C_I_9850
-- Macro: GAL_SPECLINES_S_VIII
-- Macro: GAL_SPECLINES_He_I_10027
-- Macro: GAL_SPECLINES_He_I_10031
-- Macro: GAL_SPECLINES_Pa_delta
-- Macro: GAL_SPECLINES_S_II_10286
-- Macro: GAL_SPECLINES_S_II_10320
-- Macro: GAL_SPECLINES_S_II_10336
-- Macro: GAL_SPECLINES_Fe_XIII
-- Macro: GAL_SPECLINES_He_I_10830
-- Macro: GAL_SPECLINES_Pa_gamma
-- Macro: GAL_SPECLINES_NUMBER
Internal values/identifiers for the recognized spectral lines, as
is clear from their names. They are based on the UV and optical
table of galaxy emission lines of Drew Chojnowski(1).

Note the first and last macros: they can be used when parsing the
lines automatically. Neither corresponds to any line, but their
integer values are the two integers just before and after the
first and last line identifiers. ‘GAL_SPECLINES_INVALID’ has a
value of zero, and allows you to have a fixed integer which never
corresponds to a line. ‘GAL_SPECLINES_NUMBER’ is the total number
of pre-defined lines, plus one. So you can parse all the known
lines with a ‘for’ loop like this:
for(i=1;i<GAL_SPECLINES_NUMBER;++i)
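
To illustrate why the two bounding identifiers are useful, here is a self-contained sketch with a shortened, hypothetical enumeration that follows the same pattern (an invalid value of zero first, a counter last). The ‘SKETCH_*’ names and the approximate wavelengths are assumptions for this example only; a real program would use the library's identifiers and functions instead.

```c
#include <math.h>

/* Hypothetical stand-in for the line identifiers. */
enum sketch_lines
{
  SKETCH_LINE_INVALID = 0,      /* Never corresponds to a line. */
  SKETCH_LINE_Ly_alpha,
  SKETCH_LINE_H_beta,
  SKETCH_LINE_H_alpha,
  SKETCH_LINE_NUMBER            /* Number of lines, plus one.   */
};

/* Approximate rest-frame wavelengths in Angstroms. */
static const double sketch_angstrom[SKETCH_LINE_NUMBER] =
{
  [SKETCH_LINE_Ly_alpha] = 1215.67,
  [SKETCH_LINE_H_beta]   = 4861.33,
  [SKETCH_LINE_H_alpha]  = 6562.80
};

/* Return the identifier of the line that is closest to the given
 * wavelength, by parsing all known lines. */
int
sketch_closest_line(double angstrom)
{
  double diff, mindiff = -1.0;
  int i, best = SKETCH_LINE_INVALID;

  for(i=1; i<SKETCH_LINE_NUMBER; ++i)
    {
      diff = fabs(angstrom - sketch_angstrom[i]);
      if(mindiff < 0.0 || diff < mindiff) { mindiff = diff; best = i; }
    }
  return best;
}
```

Starting the loop from 1 skips the invalid identifier, and the last identifier stops it without hard-coding the number of lines.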

-- Macro: GAL_SPECLINES_ANGSTROM_*
Wavelength (in Angstroms) of the named lines. The ‘*’ can take any
of the line names of the ‘GAL_SPECLINES_*’ Macros above.

-- Macro: GAL_SPECLINES_NAME_*
Names (as literal strings without any space) that can be used to
refer to the lines in your program, and that can be converted to
and from line identifiers using the functions below. The ‘*’ can
take any of the line names of the ‘GAL_SPECLINES_*’ Macros above.

-- Function:
char *
gal_speclines_line_name (int linecode)
Return the literal string of the given spectral line identifier
Macro (for example ‘GAL_SPECLINES_H_alpha’ or
‘GAL_SPECLINES_Ly_alpha’).

-- Function:
int
gal_speclines_line_code (char *name)
Return the spectral line identifier of the given standard name (for
example ‘GAL_SPECLINES_NAME_H_alpha’ or
‘GAL_SPECLINES_NAME_Ly_alpha’).

-- Function:
double
gal_speclines_line_angstrom (int linecode)
Return the wavelength (in Angstroms) of the given line.

-- Function:
double
gal_speclines_line_redshift (double obsline, double restline)
Return the redshift where the observed wavelength (‘obsline’) was
emitted from (if its restframe wavelength was ‘restline’).

-- Function:
double
gal_speclines_line_redshift_code (double obsline, int
linecode)
Return the redshift where the observed wavelength (‘obsline’) was
emitted from a pre-defined spectral line in the macros above. For
example, if you want the redshift where the H-alpha line falls at a
wavelength of 8000 Angstroms, you can call this function like this:

gal_speclines_line_redshift_code(8000, GAL_SPECLINES_H_alpha);
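
The arithmetic behind both redshift functions is the ratio of the observed to the rest-frame wavelength, minus one. A minimal sketch follows; the function name is hypothetical, and the H-alpha wavelength of 6562.8 Angstroms used in the note below is an approximate value assumed for this example.

```c
/* Hypothetical helper, not a Gnuastro function: redshift at which a
 * line with rest-frame wavelength `restline' is observed at
 * `obsline' (both in the same units). */
double
sketch_line_redshift(double obsline, double restline)
{
  return obsline / restline - 1.0;
}
```

With these numbers, H-alpha observed at 8000 Angstroms gives a redshift of roughly 0.219.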

---------- Footnotes ----------

(1) <http://astronomy.nmsu.edu/drewski/tableofemissionlines.html>

File: gnuastro.info, Node: Cosmology library, Next: SAO DS9 library, Prev: Spectral lines library, Up: Gnuastro library

12.3.35 Cosmology library (‘cosmology.h’)
-----------------------------------------

This library does the main cosmological calculations that are commonly
necessary in extra-galactic astronomical studies. The main variable in
this context is the redshift ($z$). The cosmological input parameters
in the functions below are ‘H0’, ‘o_lambda_0’, ‘o_matter_0’,
‘o_radiation_0’ which respectively represent the current (at redshift 0)
expansion rate (Hubble constant in units of km/sec/Mpc), cosmological
constant ($\Lambda$), matter and radiation densities.

All these functions are declared in ‘gnuastro/cosmology.h’. For a
more extended introduction/discussion of the cosmological parameters,
please see *note CosmicCalculator::.

-- Function:
double
gal_cosmology_age (double z, double H0, double o_lambda_0,
double o_matter_0, double o_radiation_0)
Returns the age of the universe at redshift ‘z’ in units of Giga
years.

-- Function:
double
gal_cosmology_proper_distance (double z, double H0, double
o_lambda_0, double o_matter_0, double o_radiation_0)
Returns the proper distance to an object at redshift ‘z’ in units
of Mega parsecs.

-- Function:
double
gal_cosmology_comoving_volume (double z, double H0, double
o_lambda_0, double o_matter_0, double o_radiation_0)
Returns the comoving volume over 4pi steradian to ‘z’ in units of
cubic Mega parsecs.

-- Function:
double
gal_cosmology_critical_density (double z, double H0, double
o_lambda_0, double o_matter_0, double o_radiation_0)
Returns the critical density at redshift ‘z’ in units of $g/cm^3$.

-- Function:
double
gal_cosmology_angular_distance (double z, double H0, double
o_lambda_0, double o_matter_0, double o_radiation_0)
Return the angular diameter distance to an object at redshift ‘z’
in units of Mega parsecs.

-- Function:
double
gal_cosmology_luminosity_distance (double z, double H0, double
o_lambda_0, double o_matter_0, double o_radiation_0)
Return the luminosity distance to an object at redshift ‘z’ in
units of Mega parsecs.

-- Function:
double
gal_cosmology_distance_modulus (double z, double H0, double
o_lambda_0, double o_matter_0, double o_radiation_0)
Return the distance modulus at redshift ‘z’ (with no units).

-- Function:
double
gal_cosmology_to_absolute_mag (double z, double H0, double
o_lambda_0, double o_matter_0, double o_radiation_0)
Return the conversion from apparent to absolute magnitude for an
object at redshift ‘z’. This value has to be added to the apparent
magnitude to give the absolute magnitude of an object at redshift
‘z’.

-- Function:
double
gal_cosmology_velocity_from_z (double z)
Return the velocity (in km/s) corresponding to the given redshift
(‘z’).

-- Function:
double
gal_cosmology_z_from_velocity (double v)
Return the redshift corresponding to the given velocity (‘v’ in
km/s).

File: gnuastro.info, Node: SAO DS9 library, Prev: Cosmology library, Up: Gnuastro library

12.3.36 SAO DS9 library (‘ds9.h’)
---------------------------------

This library operates on the output files of SAO DS9(1). SAO DS9 is one
of the most commonly used FITS image and cube viewers today with an easy
to use graphic user interface (GUI), see *note SAO DS9::. But besides
merely opening FITS data, it can also produce certain kinds of files
that can be useful in common analysis. For example, on DS9's GUI, it is
very easy to define a (possibly complex) polygon as a "region". You can
then save that "region" into a file and using the functions below, feed
the polygon into Gnuastro's programs (or your custom programs).

-- Macro: GAL_DS9_COORD_MODE_IMG
-- Macro: GAL_DS9_COORD_MODE_WCS
-- Macro: GAL_DS9_COORD_MODE_INVALID
Macros to identify the coordinate mode of the DS9 file. Their
names are sufficiently descriptive. The last one (‘INVALID’) is
for sanity checks (for example, to know if the mode is already
selected).

-- Function:
gal_data_t *
gal_ds9_reg_read_polygon (char *filename)
Returns an allocated generic data container (‘gal_data_t’, with an
array of ‘GAL_TYPE_FLOAT64’) containing the vertices of a polygon
within the SAO DS9 region file given by ‘filename’. Since SAO DS9
region files are 2 dimensional, if there are $N$ vertices in the
SAO DS9 region file, the returned dataset will have $2\times N$
elements (first two elements belonging to the first vertex, etc.).

The mode to interpret the vertex coordinates is also read from the
SAO DS9 region file and written into the ‘status’ attribute of the
output ‘gal_data_t’. The coordinate mode can be one of the
‘GAL_DS9_COORD_MODE_*’ macros, mentioned above.

It is assumed that the file begins with ‘# Region file format: DS9’
and it has two more lines (at least): a line containing the mode of
the coordinates (the line should only contain either ‘fk5’ or
‘image’), and a line with the polygon vertices following this
format: ‘polygon(V1X,V1Y,V2X,V2Y,...)’ where ‘V1X’ and ‘V1Y’ are
the horizontal and vertical coordinates of the first vertex, and so
on.

For example, here is a minimal acceptable SAO DS9 region file:

# Region file format: DS9
fk5
polygon(53.187414,-27.779152,53.159507,-27.759633,...)
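
To make the expected layout of the polygon line concrete, here is a small stand-alone parser for a single ‘polygon(...)’ line. It is only a sketch with a hypothetical name and a caller-provided buffer; it is not how the library function is implemented and it ignores the other lines of the file.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse "polygon(V1X,V1Y,V2X,V2Y,...)" into
 * `out' (which has room for `max' values).  Returns the number of
 * values written (twice the number of vertices), or 0 if the line
 * is not a well-formed polygon. */
size_t
sketch_parse_polygon(const char *line, double *out, size_t max)
{
  double v;
  char *end;
  size_t n = 0;
  const char *c, *prefix = "polygon(";

  /* The line must start with the `polygon(' keyword. */
  if( strncmp(line, prefix, strlen(prefix)) != 0 ) return 0;
  c = line + strlen(prefix);

  /* Read comma-separated numbers until the closing parenthesis. */
  while( *c != '\0' && *c != ')' )
    {
      v = strtod(c, &end);
      if(end == c || n == max) return 0;  /* Not a number, or full. */
      out[n++] = v;
      c = end;
      if(*c == ',') ++c;
      else if(*c != ')') return 0;        /* Unexpected character.  */
    }

  /* A valid polygon is closed and has an even number of values. */
  return (*c == ')' && n % 2 == 0) ? n : 0;
}
```

The returned values are ordered like the elements of the dataset described above: the two coordinates of the first vertex, then those of the second, and so on.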

---------- Footnotes ----------

(1) <https://sites.google.com/cfa.harvard.edu/saoimageds9>

File: gnuastro.info, Node: Library demo programs, Prev: Gnuastro library, Up: Library

12.4 Library demo programs
==========================

In this final section of *note Library::, we give some example Gnuastro
programs to demonstrate various features in the library. All these
programs have been tested and once Gnuastro is installed you can compile
and run them with Gnuastro's *note BuildProgram:: program that will take
care of linking issues. If you do not have any FITS file to experiment
on, you can use those that are generated by Gnuastro after ‘make check’
in the ‘tests/’ directory, see *note Quick start::.

* Menu:

* Library demo - reading a image:: Read a FITS image into memory.
* Library demo - inspecting neighbors:: Inspect the neighbors of a pixel.
* Library demo - multi-threaded operation:: Doing an operation on threads.
* Library demo - reading and writing table columns:: Simple Column I/O.
* Library demo - Warp to another image:: Output pixel grid and WCS from another image.
* Library demo - Warp to new grid:: Define a new pixel grid and WCS to resample the input.

File: gnuastro.info, Node: Library demo - reading a image, Next: Library demo - inspecting neighbors, Prev: Library demo programs, Up: Library demo programs

12.4.1 Library demo - reading a FITS image
------------------------------------------

The following simple program demonstrates how to read a FITS image into
memory and use the ‘void *array’ pointer of *note Generic data
container::. For easy linking/compilation of this program along with a
first run see *note BuildProgram:: (in short: compile, link and run
‘myprogram.c’ with this command: ‘astbuildprog myprogram.c’). Before
running, also change the ‘filename’ and ‘hdu’ variable values to specify
an existing FITS file and/or extension/HDU.

This is just intended to demonstrate how to use the ‘array’ pointer
of ‘gal_data_t’. Hence it does not do important sanity checks, for
example in real datasets you may also have blank pixels. In such cases,
this program will return a NaN value (see *note Blank pixels::). So for
general statistical information of a dataset, it is much better to use
Gnuastro's *note Statistics:: program which can deal with blank pixels
and many other issues in a generic dataset.

To encourage good coding practices, this program contains a copyright
notice with a placeholder for your name and your email (as you
customize it for your own purpose). Always keep a one-line description
and copyright notice like this in all your source files: such "metadata"
is very important to accompany every source file you write. Of course,
when you write the source file from scratch and just learn how to use a
single function from this manual, only your name/year should appear.
The existing name of the original author of this example program is only
for cases where you copy-paste this whole file.

/* Reading a FITS image into memory.
*
* The following simple program demonstrates how to read a FITS image
* into memory and use the 'void *array' pointer. This is just intended
* to demonstrate how to use the array pointer of 'gal_data_t'.
*
* Copyright (C) 2024 Your Name <your@email.address>
* Copyright (C) 2020-2024 Mohammad Akhlaghi <mohammad@akhlaghi.org>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

#include <stdio.h>
#include <stdlib.h>

#include <gnuastro/fits.h> /* includes gnuastro's data.h and type.h */
#include <gnuastro/statistics.h>

int
main(void)
{
  size_t i;
  float *farray;
  double sum=0.0;
  gal_data_t *image;
  char *filename="img.fits", *hdu="1";

  /* Read `img.fits' (HDU: 1) as a float32 array. */
  image=gal_fits_img_read_to_type(filename, hdu, GAL_TYPE_FLOAT32,
                                  -1, 1, NULL);

  /* Use the allocated space as a single precision floating
   * point array (recall that `image->array' has `void *'
   * type, so it is not directly usable). */
  farray=image->array;

  /* Calculate the sum of all the values. */
  for(i=0; i<image->size; ++i)
    sum += farray[i];

  /* Report the sum. */
  printf("Sum of values in %s (hdu %s) is: %f\n",
         filename, hdu, sum);

  /* Clean up and return. */
  gal_data_free(image);
  return EXIT_SUCCESS;
}
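
As noted before the program, a single blank (NaN) pixel makes the plain sum NaN. If you only need a quick sum in your own code, one simple option is to skip such pixels explicitly. The function below is a hypothetical helper for that (not a Gnuastro function); it works on any ‘float’ array, such as ‘farray’ above.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: sum the non-NaN elements of a float array.
 * If `numused' is not NULL, the number of elements that went into
 * the sum is written in it. */
double
sketch_sum_not_nan(const float *array, size_t size, size_t *numused)
{
  size_t i, n = 0;
  double sum = 0.0;

  for(i=0; i<size; ++i)
    if( !isnan(array[i]) )
      {
        sum += array[i];
        ++n;
      }

  if(numused) *numused = n;
  return sum;
}
```

In the program above it could be called as ‘sketch_sum_not_nan(farray, image->size, NULL)’ in place of the loop.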

File: gnuastro.info, Node: Library demo - inspecting neighbors, Next: Library demo - multi-threaded operation, Prev: Library demo - reading a image, Up: Library demo programs

12.4.2 Library demo - inspecting neighbors
------------------------------------------

The following simple program shows how you can inspect the neighbors of
a pixel using the ‘GAL_DIMENSION_NEIGHBOR_OP’ function-like macro that
was introduced in *note Dimensions::. For easy linking/compilation of
this program along with a first run see *note BuildProgram::. Before
running, also change the file name and HDU (first and second arguments
to ‘gal_fits_img_read_to_type’) to specify an existing FITS file and/or
extension/HDU.

/* Inspecting the neighbors of a pixel.
*
* The following simple program shows how you can inspect the neighbors
* of a pixel using the GAL_DIMENSION_NEIGHBOR_OP function-like macro.
*
* Copyright (C) 2024 Your Name <your@email.address>
* Copyright (C) 2020-2024 Mohammad Akhlaghi <mohammad@akhlaghi.org>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

#include <stdio.h>
#include <stdlib.h>

#include <gnuastro/fits.h>
#include <gnuastro/dimension.h>

int
main(void)
{
  double sum;
  float *array;
  size_t i, num, *dinc;
  gal_data_t *input=gal_fits_img_read_to_type("input.fits", "1",
                                              GAL_TYPE_FLOAT32, -1, 1,
                                              NULL);

  /* To avoid the `void *' pointer and have `dinc'. */
  array=input->array;
  dinc=gal_dimension_increment(input->ndim, input->dsize);

  /* Go over all the pixels. */
  for(i=0;i<input->size;++i)
    {
      num=0;
      sum=0.0;
      GAL_DIMENSION_NEIGHBOR_OP( i, input->ndim, input->dsize,
                                 input->ndim, dinc,
                                 {++num; sum+=array[nind];} );
      printf("%zu: num: %zu, sum: %f\n", i, num, sum);
    }

  /* Clean up and return. */
  gal_data_free(input);
  free(dinc);
  return EXIT_SUCCESS;
}
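
To see what the macro does for each pixel in the program above, the following sketch writes out the same idea by hand for the 2D case only, assuming that all eight touching pixels count as neighbors (the connectivity given to the macro above is the number of dimensions). The function name is hypothetical, and unlike the macro it does not generalize to other dimensions or connectivities.

```c
#include <stddef.h>

/* Hypothetical helper: visit the (up to 8) neighbors of the pixel at
 * `index' in a 2D array with `dsize[0]' rows and `dsize[1]' columns
 * (C order).  Returns the number of neighbors and writes the sum of
 * their values in `sum'. */
size_t
sketch_neighbors_2d(const float *array, const size_t *dsize,
                    size_t index, double *sum)
{
  size_t num = 0;
  long r, c, dr, dc;
  long row = (long)(index / dsize[1]);
  long col = (long)(index % dsize[1]);

  *sum = 0.0;
  for(dr=-1; dr<=1; ++dr)
    for(dc=-1; dc<=1; ++dc)
      {
        /* Skip the pixel itself. */
        if(dr==0 && dc==0) continue;

        /* Skip positions that fall outside of the image. */
        r = row + dr;
        c = col + dc;
        if( r<0 || c<0 || r>=(long)dsize[0] || c>=(long)dsize[1] )
          continue;

        *sum += array[ r * (long)dsize[1] + c ];
        ++num;
      }
  return num;
}
```

Pixels on a corner have three neighbors, pixels on an edge have five, and all other pixels have eight.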

File: gnuastro.info, Node: Library demo - multi-threaded operation, Next: Library demo - reading and writing table columns, Prev: Library demo - inspecting neighbors, Up: Library demo programs

12.4.3 Library demo - multi-threaded operation
----------------------------------------------

The following simple program shows how to use Gnuastro to simplify
spinning off threads and distributing different jobs between the
threads. The relevant thread-related functions are defined in *note
Gnuastro's thread related functions::. For easy linking/compilation of
this program, along with a first run, see Gnuastro's *note
BuildProgram::. Before running, also change the ‘filename’ and ‘hdu’
variable values to specify an existing FITS file and/or extension/HDU.

This is a very simple program to open a FITS image, distribute its
pixels between different threads and print the value of each pixel and
the thread it was assigned to. The actual operation is very simple (and
would not usually be done with threads in a real-life program). It is
intentionally chosen to put more focus on the important steps in
spinning off threads and how the worker function (which is called by
each thread) can identify the job-IDs it should work on.

For example, instead of an array of pixels, you can define an array
of tiles or any other context-specific structures as separate targets.
The important thing is that each action should have its own unique ID
(counting from zero, as is done in an array in C). You can then follow
the process below and use each thread to work on all the targets that
are assigned to it. Recall that spinning off threads is itself an
expensive process and we do not want to spin-off one thread for each
target (see the description of ‘gal_threads_dist_in_threads’ in *note
Gnuastro's thread related functions::).

There are many (more complicated, real-world) examples of using
‘gal_threads_spin_off’ in Gnuastro's actual source code. You can see
them by searching for the ‘gal_threads_spin_off’ function from the top
source directory (after unpacking the tarball), for example, with this
command:

$ grep -r gal_threads_spin_off ./

The code of this demonstration program is shown below. This program
was also built and run when you ran ‘make check’ during the building of
Gnuastro (‘tests/lib/multithread.c’), so it is already tested for your
system and you can safely use it as a guide.

/* Demo of Gnuastro's high-level multi-threaded interface.
*
* This is a very simple program to open a FITS image, distribute its
* pixels between different threads and print the value of each pixel
* and the thread it was assigned to.
*
* Copyright (C) 2024 Your Name <your@email.address>
* Copyright (C) 2020-2024 Mohammad Akhlaghi <mohammad@akhlaghi.org>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

#include <stdio.h>
#include <stdlib.h>

#include <gnuastro/fits.h>
#include <gnuastro/threads.h>

/* This structure can keep all information you want to pass onto the
* worker function on each thread. */
struct params
{
gal_data_t *image; /* Dataset to print values of. */
};

/* This is the main worker function which will be called by the
* different threads. `gal_threads_params' is defined in
* `gnuastro/threads.h' and contains the pointer to the parameter we
* want. Note that the input argument and returned value of this
* function always must have `void *' type. */
void *
worker_on_thread(void *in_prm)
{
  /* Low-level definitions to be done first. */
  struct gal_threads_params *tprm=(struct gal_threads_params *)in_prm;
  struct params *p=(struct params *)tprm->params;

  /* Subsequent definitions. */
  float *array=p->image->array;
  size_t i, index, *dsize=p->image->dsize;

  /* Go over all the actions (pixels in this case) that were assigned
   * to this thread. */
  for(i=0; tprm->indexs[i] != GAL_BLANK_SIZE_T; ++i)
    {
      /* For easy reading. */
      index = tprm->indexs[i];

      /* Print the information. */
      printf("(%zu, %zu) on thread %zu: %g\n", index%dsize[1]+1,
             index/dsize[1]+1, tprm->id, array[index]);
    }

  /* Wait for all the other threads to finish, then return. */
  if(tprm->b) pthread_barrier_wait(tprm->b);
  return NULL;
}

/* High-level function (called by the operating system). */
int
main(void)
{
  struct params p;
  char *filename="input.fits", *hdu="1";
  size_t numthreads=gal_threads_number();

  /* We are using `-1' for `minmapsize' to ensure that the image is
   * read into memory and `1' for `quietmmap' (which can also be
   * zero), see the "Memory management" section in the book. */
  int quietmmap=1;
  size_t minmapsize=-1;

  /* Read the image into memory as a float32 data type. */
  p.image=gal_fits_img_read_to_type(filename, hdu, GAL_TYPE_FLOAT32,
                                    minmapsize, quietmmap, NULL);

  /* Print some basic information before the actual contents: */
  printf("Pixel values of %s (HDU: %s) on %zu threads.\n", filename,
         hdu, numthreads);
  printf("Used to check the compiled library's capability in opening "
         "a FITS file, and also spinning off threads.\n");

  /* A small sanity check: this is only intended for 2D arrays (to
   * print the coordinates of each pixel). */
  if(p.image->ndim!=2)
    {
      fprintf(stderr, "only 2D images are supported.\n");
      exit(EXIT_FAILURE);
    }

  /* Spin-off the threads and do the processing on each thread. */
  gal_threads_spin_off(worker_on_thread, &p, p.image->size, numthreads,
                       minmapsize, quietmmap);

  /* Clean up and return. */
  gal_data_free(p.image);
  return EXIT_SUCCESS;
}
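
The worker function above stops at the first blank index because each thread receives a blank-terminated list of action IDs. The sketch below shows one way that such lists could be built (a simple round-robin assignment). It is a hypothetical stand-alone illustration with its own names and with ‘SIZE_MAX’ as the blank value; it is not the library's implementation, which you normally never need to call directly because ‘gal_threads_spin_off’ does the distribution for you.

```c
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_BLANK SIZE_MAX   /* Marks the end of a thread's list. */

/* Hypothetical helper: distribute `numactions' action IDs between
 * `numthreads' threads.  The returned (allocated) array has
 * `numthreads' rows of `*rowlen' elements: row `t' keeps the IDs of
 * thread `t', terminated by SKETCH_BLANK.  Returns NULL on failure. */
size_t *
sketch_dist_in_threads(size_t numactions, size_t numthreads,
                       size_t *rowlen)
{
  size_t i, *indexs;

  if(numthreads == 0) return NULL;

  /* Room for the largest possible share, plus the terminator. */
  *rowlen = numactions/numthreads + 2;
  indexs = malloc(numthreads * *rowlen * sizeof *indexs);
  if(indexs == NULL) return NULL;

  /* Start with all-blank, then give action `i' to thread
   * `i % numthreads'. */
  for(i=0; i<numthreads * *rowlen; ++i) indexs[i] = SKETCH_BLANK;
  for(i=0; i<numactions; ++i)
    indexs[ (i % numthreads) * *rowlen + i / numthreads ] = i;

  return indexs;
}
```

With 7 actions on 3 threads, the first thread gets actions 0, 3 and 6, the second gets 1 and 4, and the third gets 2 and 5; each list is followed by the blank value. The caller should free the returned array.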

File: gnuastro.info, Node: Library demo - reading and writing table columns, Next: Library demo - Warp to another image, Prev: Library demo - multi-threaded operation, Up: Library demo programs

12.4.4 Library demo - reading and writing table columns
-------------------------------------------------------

Tables are some of the most common inputs to, and outputs of programs.
This section contains a small program for reading and writing tables
using the constructs described in *note Table input output::. For easy
linking/compilation of this program, along with a first run, see
Gnuastro's *note BuildProgram::. Before running, also set the following
file and column names in the first two lines of ‘main’. The input and
output names may be ‘.txt’ or ‘.fits’ tables: ‘gal_table_read’ and
‘gal_table_write’ can read from, and write to, both formats. For plain
text tables see *note Gnuastro text table format::. If you do not have
any table in text file format to use as your input, you can use the
table that is generated in the *note Sufi simulates a detection:: section.

This example program reads three columns from a table. The first two
columns are selected by their name (‘NAME1’ and ‘NAME2’) and the third
is selected by its number: column 10 (counting from 1). Gnuastro's
column selection is discussed in *note Selecting table columns::. The
first and second columns can be any type, but this program will convert
them to ‘int32_t’ and ‘float’ for its internal usage respectively.
However, the third column must be double for this program. So if it is
not, the program will abort with an error. Having the columns in
memory, it will print them out along with their sum (just a simple
application, you can do whatever you want at this stage). Reading the
table finishes here.

The rest of the program is a demonstration of writing a table. While
parsing the rows, this program will change the first column (to be
counters) and multiply the second by 10 (so the output will be
different). Then it will define the order of the output columns by
setting the ‘next’ element (to create a *note List of gal_data_t::).
Before writing, this program will also set names for the columns (units
and comments can be defined in a similar manner). Writing the columns
to a file is then done through a simple call to ‘gal_table_write’.

The operations that are shown in this example program are not
necessary all the time. For example, in many cases, you know the
numerical data type of the column before writing your program (see *note
Numeric data types::), so type checking and copying to a specific type
will not be necessary.

/* Reading and writing table columns.
*
* This example program reads three columns from a table. Having the
* columns in memory, it will print them out along with their sum. The
* rest of the program is a demonstration of writing a table.
*
* Copyright (C) 2024 Your Name <your@email.address>
* Copyright (C) 2020-2024 Mohammad Akhlaghi <mohammad@akhlaghi.org>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

#include <stdio.h>
#include <stdlib.h>

#include <gnuastro/table.h>

int
main(void)
{
/* File names and column names (which may also be numbers). */
char *c1_name="NAME1", *c2_name="NAME2", *c3_name="10";
char *inname="input.fits", *hdu="1", *outname="out.fits";

/* Internal parameters. */
float *array2=NULL;
double *array3=NULL;
int32_t *array1=NULL;
size_t i, counter=0;
gal_data_t *c1=NULL;
gal_data_t *c2=NULL;
gal_data_t *col, *columns;
gal_list_str_t *column_ids=NULL;

/* Define the columns to read. */
gal_list_str_add(&column_ids, c1_name, 0);
gal_list_str_add(&column_ids, c2_name, 0);
gal_list_str_add(&column_ids, c3_name, 0);

/* The columns were added in reverse, so correct it. */
gal_list_str_reverse(&column_ids);

/* Read the desired columns. */
columns = gal_table_read(inname, hdu, NULL, column_ids,
GAL_TABLE_SEARCH_NAME, 0, 1, -1, 1, NULL);

/* Go over the columns, we will assume that you do not know their type
* a-priori, so we will check */
counter=1;
for(col=columns; col!=NULL; col=col->next)
switch(counter++)
{
case 1: /* First column: we want it as int32_t. */
c1=gal_data_copy_to_new_type(col, GAL_TYPE_INT32);
array1 = c1->array;
break;

case 2: /* Second column: we want it as float. */
c2=gal_data_copy_to_new_type(col, GAL_TYPE_FLOAT32);
array2 = c2->array;
break;

case 3: /* Third column: it MUST be double. */
if(col->type!=GAL_TYPE_FLOAT64)
{
fprintf(stderr, "Column %s must be float64 type, it is "
"%s\n", c3_name, gal_type_name(col->type, 1));
exit(EXIT_FAILURE);
}
array3 = col->array;
break;

default:
exit(EXIT_FAILURE);
}

/* As an example application we will just print them out. In the
* meantime (just for a simple demonstration), change the first
* array value to the counter and multiply the second by 10. */
for(i=0;i<c1->size;++i)
{
printf("%zu: %d + %f + %f = %f\n", i+1, array1[i], array2[i],
array3[i], array1[i]+array2[i]+array3[i]);
array1[i] = i+1;
array2[i] *= 10;
}

/* Link the first two columns as a list. */
c1->next = c2;
c2->next = NULL;

/* Set names for the columns and write them out. */
c1->name = "COUNTER";
c2->name = "VALUE";
gal_table_write(c1, NULL, NULL, GAL_TABLE_FORMAT_BFITS, outname,
"MY-COLUMNS", 0, 0);

/* The names were not allocated, so to avoid cleaning-up problems,
* we will set them to NULL. */
c1->name = c2->name = NULL;

/* Clean up and return. */
gal_data_free(c1);
gal_data_free(c2);
gal_list_data_free(columns);
gal_list_str_free(column_ids, 0); /* strings were not allocated. */
return EXIT_SUCCESS;
}
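
The demo above notes that its column identifiers "were added in reverse"
and must be reversed before reading. The sketch below imitates that
pattern in plain C, without Gnuastro: each new node is put on the top of
a singly linked list, so the list has to be reversed to recover the
insertion order. The names ‘str_node’, ‘str_add’, ‘str_reverse’ and
‘str_free’ are hypothetical and only illustrate the idea; they are not
Gnuastro's list functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type (NOT Gnuastro's 'gal_list_str_t'). */
typedef struct str_node
{
  const char *v;
  struct str_node *next;
} str_node;

/* Add a new node to the top (head) of the list. */
static void
str_add(str_node **list, const char *value)
{
  str_node *node=malloc(sizeof *node);
  assert(node!=NULL);
  node->v=value;
  node->next=*list;
  *list=node;
}

/* Reverse the order of the nodes in place. */
static void
str_reverse(str_node **list)
{
  str_node *prev=NULL, *cur=*list, *next;
  while(cur!=NULL)
    {
      next=cur->next;
      cur->next=prev;
      prev=cur;
      cur=next;
    }
  *list=prev;
}

/* Free the nodes only (the strings were not allocated). */
static void
str_free(str_node *list)
{
  str_node *tmp;
  while(list!=NULL)
    {
      tmp=list->next;
      free(list);
      list=tmp;
    }
}
```

After three calls to ‘str_add’, the last added string is at the top of
the list; one call to ‘str_reverse’ puts the first added string there
instead.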

File: gnuastro.info, Node: Library demo - Warp to another image, Next: Library demo - Warp to new grid, Prev: Library demo - reading and writing table columns, Up: Library demo programs

12.4.5 Library demo - Warp to another image
-------------------------------------------

Gnuastro's warp library (that you can access by including
‘gnuastro/warp.h’) allows you to resample an image from one grid to
another entirely using WCSLIB (while accounting for distortions if
necessary; see *note Warp library::). The Warp library uses a
pixel-mixing or area-based resampling approach which is fully described
in *note Resampling::. The most generic use cases for this library are
already available in the *note Invoking astwarp:: program. For a
related demo (where the output grid and WCS are constructed from
scratch), see *note Library demo - Warp to new grid::.

In the example below, we are warping the ‘input.fits’ file to the
same pixel grid and WCS as the ‘reference.fits’ image (assuming it is in
HDU ‘0’). You can download the FITS files in the *note Color channels in
same pixel grid:: section and use them as ‘input.fits’ and
‘reference.fits’ files. Feel free to change these names to your own
test file names. This can be useful when you have a complex grid and
WCS containing various keywords such as non-linear distortion
coefficients, etc. For example datasets, see the description of the
‘--gridfile’ option in *note Align pixels with WCS considering
distortions::.

To compile the demonstration program below, copy and paste the
contents in a plain-text file (let's assume you named it
‘align-to-img.c’) and use *note BuildProgram:: with this command:
'‘astbuildprog align-to-img.c’'. Please note that the demo program
performs few sanity checks, to keep it simple and to highlight this
particular feature of the library. For a robust way to write programs
with all the necessary sanity checks, see Gnuastro's Warp source code
(*note Program source::).

/* Warp to another image.
*
* In the example below, we are warping the input.fits file to the same
* pixel grid and WCS as reference.fits image.
*
* Copyright (C) 2024 Your Name <your@email.address>
* Copyright (C) 2022-2024 Pedram Ashofteh-Ardakani <pedramardakani@pm.me>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

#include <stdio.h>
#include <stdlib.h>

#include <gnuastro/wcs.h> /* contains gnuastro's fits.h */
#include <gnuastro/warp.h> /* contains gnuastro's data.h */
#include <gnuastro/array.h> /* contains gnuastro's type.h */

int
main(void)
{
/* Input file's name and HDU. */
char *filename="input.fits", *hdu="1";

/* Reference file's name and HDU. */
char *gridfile="reference.fits", *gridhdu="0";

/* Output file name. */
char *outname="align-to-img.fits";

/* Low-level variables needed to read the reference file's size. */
int nwcs;
size_t ndim, *dsize;

/* Initialize the 'wa' struct with empty values and NULL pointers. */
gal_warp_wcsalign_t wa=gal_warp_wcsalign_template();

/* Read the input image and its WCS. */
wa.input=gal_array_read_one_ch_to_type(filename, hdu, NULL,
GAL_TYPE_FLOAT64, -1, 0, NULL);
wa.input->wcs=gal_wcs_read(filename, hdu, 0, 0, 0, &wa.input->nwcs,
NULL);

/* Prepare the warp input structure, use all threads available. */
wa.coveredfrac=1; wa.edgesampling=0; wa.numthreads=0;

/* Set the target grid to be the same as the reference file's WCS. */
wa.twcs=gal_wcs_read(gridfile, gridhdu, 0, 0, 0, &nwcs, NULL);
if(wa.twcs==NULL)
{
fprintf(stderr, "%s (hdu %s): no WCS! Can't continue\n",
gridfile, gridhdu);
exit(EXIT_FAILURE);
}

/* Read the output image size (from the reference image). Note that
* 'dsize' will be freed while freeing 'widthinpix'. */
dsize=gal_fits_img_info_dim(gridfile, gridhdu, &ndim, NULL);

/* Convert the 'dsize' to a 'gal_data_t' so the library can use it. */
wa.widthinpix=gal_data_alloc(dsize, GAL_TYPE_SIZE_T, 1, &ndim,
NULL, 1, -1, 0, NULL, NULL, NULL);

/* Do the warp, then convert the output to a 32-bit float (the default
* float64 is too much for observational data and just wastes
* storage!). But if you are warping mock data before adding noise
* (where you do have float64 level precision), remove the type
* conversion line. */
gal_warp_wcsalign(&wa);
wa.output=gal_data_copy_to_new_type_free(wa.output, GAL_TYPE_FLOAT32);

/* WARNING: make sure there is no file with the same name as the value
* of 'outname', or the result will be appended to it as a new HDU. */
gal_fits_img_write(wa.output, outname, NULL, 0);

/* Clean up. */
gal_data_free(wa.input);
gal_data_free(wa.output);
gal_data_free(wa.widthinpix);

/* Give control back to the operating system. */
return EXIT_SUCCESS;
}
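
The demo above converts the warped output from 64-bit to 32-bit floating
point to save storage. As a rough, library-free illustration of what
such a conversion involves, the sketch below copies a ‘double’ array
into a newly allocated ‘float’ array. The helper ‘to_float32’ is
hypothetical; it is not how ‘gal_data_copy_to_new_type_free’ is
implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: copy 'n' double values into a newly allocated
 * float array. The caller must free the returned pointer. Returns
 * NULL when there is nothing to copy. */
static float *
to_float32(const double *in, size_t n)
{
  size_t i;
  float *out;

  if(n==0) return NULL;
  out=malloc(n * sizeof *out);
  assert(out!=NULL);

  /* Each value is rounded to the nearest representable float. */
  for(i=0;i<n;++i) out[i]=(float)in[i];
  return out;
}
```

On most current platforms a ‘float’ takes 4 bytes and a ‘double’ takes
8, so the converted array needs about half the storage, at the cost of
keeping roughly 7 (instead of roughly 15) significant decimal digits.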

File: gnuastro.info, Node: Library demo - Warp to new grid, Prev: Library demo - Warp to another image, Up: Library demo programs

12.4.6 Library demo - Warp to new grid
--------------------------------------

Gnuastro's warp library (that you can access by including
‘gnuastro/warp.h’) allows you to resample an image from one grid to
another entirely using WCSLIB (while accounting for distortions if
necessary; see *note Warp library::). The Warp library uses a
pixel-mixing or area-based resampling approach which is fully described
in *note Resampling::. The most generic use cases for this library are
already available in the *note Invoking astwarp:: program. For a
related demo (where the output grid and WCS are imported from another
file), see *note Library demo - Warp to another image::.

In the example below, we'll assume you have the SDSS image downloaded
in *note Downloading and validating input data::. After downloading the
image as described there, you will have ‘r.fits’ in your current
directory. We will therefore use ‘r.fits’ as the input to the program
here. The image is not aligned to the celestial coordinates, so we will
align the pixel and WCS coordinates, and set the center of the pixel
grid to be at (RA,Dec) of (202.4173735,47.3374525). We also give it a
‘TAN’ projection with a pixel scale of 0.27 arcseconds per pixel.
However, we'll let the Warp library measure the proper
output image size that will contain the aligned image.
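
The pixel scale is quoted above in arcseconds, while the demo program in
this section sets its ‘cdelt’ values in degrees per pixel (as
‘0.27/3600’). The small sketch below only spells out that unit
conversion; the helper name ‘arcsec_to_deg’ is hypothetical and not part
of Gnuastro.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: convert an angle from arcseconds to degrees
 * (one degree is 3600 arcseconds). */
static double
arcsec_to_deg(double arcsec)
{
  assert(isfinite(arcsec));
  return arcsec/3600.0;
}
```

With this helper, ‘arcsec_to_deg(0.27)’ gives 0.000075 degrees (to
within floating point rounding), the value used for both axes in the
demo.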

To compile the demonstration program below, copy and paste the
contents in a plain-text file (let's assume you named it
‘align-to-new.c’) and use *note BuildProgram:: with this command:
'‘astbuildprog align-to-new.c’'. Please note that the demo program
performs few sanity checks, to keep it simple and to highlight this
particular feature of the library. For a robust way to write programs
with all the necessary sanity checks, see Gnuastro's Warp source code
(*note Program source::).

/* Warp an image to a new grid.
*
* In the example below, we will use 'r.fits' as the input. The image is
* not aligned to the celestial coordinates, so we will align the pixel
* and WCS coordinates. We also give it a TAN projection. However, we’ll
* let the Warp library measure the proper output image size that will
* contain the aligned image.
*
* Copyright (C) 2024 Your Name <your@email.address>
* Copyright (C) 2022-2024 Pedram Ashofteh-Ardakani <pedramardakani@pm.me>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/

#include <stdio.h>
#include <stdlib.h>

#include <gnuastro/wcs.h> /* Contains gnuastro's fits.h */
#include <gnuastro/warp.h> /* Contains gnuastro's data.h */
#include <gnuastro/array.h> /* Contains gnuastro's type.h */

int
main(void)
{
/* Input file's name and HDU. */
char *filename="r.fits", *hdu="0";

/* Output file name. */
char *outname="align-to-new.fits";

/* RA/Dec of the center of the central pixel of output. Please
* change the center based on your input. */
double center[]={202.4173735, 47.3374525};

/* Coordinate and Projection algorithms of output. */
char *ctype[2]={"RA---TAN", "DEC--TAN"};

/* Output pixel scale (in units of degrees/pixel). */
double cdelt[]={0.27/3600, 0.27/3600};

/* For intermediate steps. */
size_t two=2;

/* Initialize the 'wa' struct with empty values and NULL pointers. */
gal_warp_wcsalign_t wa=gal_warp_wcsalign_template();

/* Set the width (and height!) of the output in pixels (as a 1D and
* 2 element 'gal_data_t'). When it is NULL, the library will
* calculate the appropriate width to fully fit the input image
* after alignment. */
wa.widthinpix=NULL;

/* Set the number of threads to use. If the value is '0', the
* library will estimate the maximum available threads at
* run-time on the host operating system. */
wa.numthreads=0;

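/* Read the input image and its WCS. */
wa.input=gal_array_read_one_ch_to_type(filename, hdu, NULL,
GAL_TYPE_FLOAT64, -1, 0, NULL);
wa.input->wcs=gal_wcs_read(filename, hdu, 0, 0, 0, &wa.input->nwcs,
NULL);

/* Prepare the warp input structure. */
wa.coveredfrac=1; wa.edgesampling=0;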
wa.ctype=gal_data_alloc(ctype, GAL_TYPE_STRING, 1, &two, NULL, 1,
-1, 0, NULL, NULL, NULL);
wa.cdelt=gal_data_alloc(cdelt, GAL_TYPE_FLOAT64, 1, &two, NULL, 1,
-1, 0, NULL, NULL, NULL);
wa.center=gal_data_alloc(center, GAL_TYPE_FLOAT64, 1, &two, NULL, 1,
-1, 0, NULL, NULL, NULL);

/* Do the warp, then convert it to a 32-bit float. */
gal_warp_wcsalign(&wa);
wa.output=gal_data_copy_to_new_type_free(wa.output, GAL_TYPE_FLOAT32);

/* WARNING: make sure there is no file with the same name as the value
* of 'outname', or the result will be appended to it as a new HDU. */
gal_fits_img_write(wa.output, outname, NULL, 0);

/* Remove the pointers to arrays that we didn't allocate (and thus,
* should not be freed by 'gal_data_free' below). */
wa.cdelt->array=wa.center->array=wa.ctype->array=NULL;

/* Clean up. */
gal_data_free(wa.cdelt); gal_data_free(wa.ctype);
gal_data_free(wa.input); gal_data_free(wa.output);
gal_data_free(wa.center); gal_data_free(wa.widthinpix);

/* Give control back to the operating system. */
return EXIT_SUCCESS;
}
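
Before cleaning up, the demo above sets to ‘NULL’ the array pointers
that it did not allocate itself, so that they are not freed. The sketch
below shows the same ownership idea in plain C with a hypothetical
‘holder_t’ container (it only imitates the idea and is not
‘gal_data_t’): the container's free function releases whatever its
‘array’ points to, so a borrowed array must be detached first.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container whose free function also frees its array. */
typedef struct
{
  double *array;
  size_t size;
} holder_t;

/* Wrap an existing array without copying it. */
static holder_t *
holder_wrap(double *array, size_t size)
{
  holder_t *h=malloc(sizeof *h);
  assert(h!=NULL);
  h->array=array;
  h->size=size;
  return h;
}

/* Free the container and the array it points to. Since 'free(NULL)'
 * does nothing in C, a detached container is safe to free. */
static void
holder_free(holder_t *h)
{
  if(h==NULL) return;
  free(h->array);
  free(h);
}
```

If the wrapped array lives on the stack (like ‘cdelt’ in the demo),
setting ‘h->array=NULL’ before ‘holder_free(h)’ avoids passing a
non-allocated pointer to ‘free’, which would be undefined behavior.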

File: gnuastro.info, Node: Developing, Next: Other useful software, Prev: Library, Up: Top

13 Developing
*************

The basic idea of GNU Astronomy Utilities is for an interested
astronomer to be able to easily understand the code of any of the
programs or libraries, to modify the code if they see room for
improvement and, finally, to add new programs or libraries for their own
benefit, and for the larger community if they are willing to share them.
In short, we hope that at least from the software point of
view, the "obscurantist faith in the expert's special skill and in his
personal knowledge and authority" can be broken, see *note Science and
its tools::. With this aim in mind, Gnuastro was designed to have a
very basic, simple, and easy to understand architecture for any
interested inquirer.

This chapter starts with very general design choices, in particular
*note Why C:: and *note Program design philosophy::. It will then get a
little more technical about the Gnuastro code and file/directory
structure in *note Coding conventions:: and *note Program source::.
*note The TEMPLATE program:: discusses a minimal (and working) template
to help in creating new programs or easier learning of a program's
internal structure. Some other general issues about documentation,
building and debugging are then discussed. This chapter concludes with
how you can learn about the development and get involved in *note
Gnuastro project webpage::, *note Developing mailing lists:: and *note
Contributing to Gnuastro::.

* Menu:

* Why C:: Why Gnuastro is designed in C.
* Program design philosophy:: General ideas behind the package structure.
* Coding conventions:: Gnuastro coding conventions.
* Program source:: Conventions for the code.
* Documentation:: Documentation is an integral part of Gnuastro.
* Building and debugging:: Build and possibly debug during development.
* Test scripts:: Understanding the test scripts.
* Bash programmable completion:: Auto-completions for better user experience.
* Developer's checklist:: Checklist to finalize your changes.
* Gnuastro project webpage:: Central hub for Gnuastro activities.
* Developing mailing lists:: Stay up to date with Gnuastro's development.
* Contributing to Gnuastro:: Share your changes with all users.

File: gnuastro.info, Node: Why C, Next: Program design philosophy, Prev: Developing, Up: Developing

13.1 Why C programming language?
================================

Currently the programming languages that are commonly used in scientific
applications are C++(1), Java(2), Python(3), and Julia(4) (which is a
newcomer but swiftly gaining ground). One of the main reasons behind
choosing these is their high-level abstractions. However, GNU Astronomy
Utilities is fully written in the C programming language(5). The
reasons can be summarized as simplicity, portability and
efficiency/speed. All three are very important in scientific software
and we will discuss them below.

Simplicity can best be demonstrated in a comparison of the main books
of C++ and C. The "C programming language"(6) book, written by the
authors of C, is only 286 pages and covers a very good fraction of the
language; it has also remained unchanged since 1988. C is the main
programming language of nearly all operating systems and there is no
plan for any significant update. On the other hand, the most recent "C++
programming language"(7) book, also written by its author, has 1366
pages and its fourth edition came out in 2013! As discussed in *note
Science and its tools::, it is very important for other scientists to be
able to readily read the code of a program at their will with minimum
requirements.

In C++ or Java, inheritance in the object oriented programming
paradigm and their internal functions make the code very easy to write
for a programmer who is deeply invested in those objects and understands
all their relations well. But it simultaneously makes reading the
program for a first time reader (a curious scientist who wants to know
only how a small step was done) extremely hard. Before understanding
the methods, the scientist has to invest a lot of time and energy in
understanding those objects and their relations. But in C, everything
is done with basic language types (for example, ‘int’s or ‘float’s) and
pointers to them to define arrays. So when an outside reader is only
interested in one part of the program, that part is all they have to
understand.
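
As a small illustration of this point (a sketch written for this
section, not code taken from Gnuastro), the function below needs nothing
beyond a ‘float’ pointer, a counter and a sum. A reader who is only
interested in how the mean is calculated does not have to learn about
any other part of a program to follow it. The name ‘mean_of’ is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of 'n' values, using only basic types and one pointer. */
static double
mean_of(const float *values, size_t n)
{
  size_t i;
  double sum=0.0;

  assert(values!=NULL && n>0);
  for(i=0;i<n;++i) sum+=values[i];
  return sum/n;
}
```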

Recently it is also becoming common to write scientific software in
Python, or a combination of it with C or C++. Python is a high level
scripting language which does not need compilation. It is very useful
when you want to do something on the go and do not want to be halted by
the troubles of compiling, linking, memory checking, etc. When the
datasets are small and the job is temporary, this ability of Python is
great and is highly encouraged. A very good example might be plotting,
in which Python is undoubtedly one of the best.

But as the data sets increase in size and the processing becomes more
complicated, the speed of Python scripts decreases significantly. So
when the program does not change too often and is widely used in a large
community, mostly on large data sets (like astronomical images), using
Python will waste a lot of valuable research-hours. It is possible to
wrap C or C++ functions with Python to fix the speed issue. But this
creates further complexity, because the interested scientist has to
master two programming languages and their connection (which is not
trivial).

Like C++, Python is object oriented, so as explained above, it needs
a high level of experience with that particular program to reasonably
understand its inner workings. To make things worse, since it is mainly
for on-the-go programming(8), it can undergo significant changes. One
recent example is how Python 2.x and Python 3.x are not compatible.
Lots of research teams that invested heavily in Python 2.x cannot
benefit from Python 3.x or future versions any more. Some converters
are available, but since they are automatic, lots of complications might
arise in the conversion(9). If a research project begins using Python
3.x today, there is no telling how compatible their investments will be
when Python 4.x or 5.x comes out.

Java is also fully object-oriented, but uses a different paradigm:
its compilation generates a hardware-independent _bytecode_, and a _Java
Virtual Machine_ (JVM) is required for the actual execution of this
bytecode on a computer. Java also evolved with time, and tried to
remain backward compatible, but inevitably this evolution required
discontinuities and replacements of a few Java components which were
first declared as becoming _deprecated_, and then removed from later
versions.

This stems from the core principles of high-level languages like
Python or Java: that they evolve significantly on the scale of roughly 5
to 10 years. They are therefore useful when you want to solve a
short-term problem and you are ready to pay the high cost of keeping
your software up to date with all the changes in the language. This is
fine for private companies, but usually too expensive for scientific
projects that have limited funding for a fixed period. As a result, the
reproducibility of the result (ability to regenerate the result in the
future, which is a core principle of any scientific result) and
reusability of all the investments that went into the science software
will be lost to future generations! Rebuilding all the dependencies of
software written in an obsolete language is not easy, and may not even
be possible. Future-proof code (as long as current operating systems
will be used) is therefore written in C.

The portability of C is best demonstrated by the fact that C++, Java
and Python are part of the C-family of programming languages, which also
includes Julia, Perl, and many other languages. C libraries can be
immediately included in C++, and it is easy to write wrappers for them
in all C-family programming languages. This will allow other scientists
to benefit from C libraries using any C-family language that they
prefer. As a result, Gnuastro's library is already usable in C and C++,
and wrappers will be(10) added for higher-level languages like Python,
Julia and Java.
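
The sketch below shows the common idiom that lets the same declarations
be used from both C and C++. It is a generic illustration under our own
assumptions, not a copy of Gnuastro's headers, and the function
‘demo_sum’ is hypothetical. A C compiler skips the guard, while a C++
compiler sees the declarations with C linkage and can therefore call the
compiled C function directly.

```c
#include <assert.h>
#include <stddef.h>

/* The guard is only active when a C++ compiler reads this header. */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical library function (not part of Gnuastro). */
double demo_sum(const double *values, size_t n);

#ifdef __cplusplus
}
#endif

/* The definition would normally be in a separate '.c' file. */
double
demo_sum(const double *values, size_t n)
{
  size_t i;
  double sum=0.0;

  assert(values!=NULL || n==0);
  for(i=0;i<n;++i) sum+=values[i];
  return sum;
}
```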

The final reason is speed. This is another very important aspect of
C which is not independent of simplicity (first reason discussed above).
The abstractions provided by the higher-level languages (which also
make learning them harder for a newcomer) come at the cost of speed.
Since C is a low-level language(11) (closer to the hardware), it has a
direct access to the CPU(12), is generally considered as being faster in
its execution, and is much less complex for both the human reader _and_
the computer. The benefits of simplicity for a human were discussed
above. Simplicity for the computer translates into more efficient
(faster) programs. This creates a much closer relation between the
scientist/programmer (or their program) and the actual data and
processing. The GNU coding standards(13) also encourage the use of C
over all other languages when generality of usage and "high speed" is
desired.
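
As one concrete example of this closeness to the hardware, C exposes the
precision of its floating point types through the standard ‘float.h’
header. The sketch below (written for this section, with the
hypothetical name ‘extra_mantissa_bits’) reports how many more mantissa
bits ‘long double’ has than ‘double’ on the system where it is compiled.
The result depends on the platform: on x86-64 GNU/Linux the two types
typically have 64 and 53 bits, while on some systems they are identical.

```c
#include <assert.h>
#include <float.h>

/* Number of extra mantissa (significand) bits that 'long double'
 * provides over 'double' on the platform this is compiled for. */
static int
extra_mantissa_bits(void)
{
  /* The '*_MANT_DIG' macros count digits in base 'FLT_RADIX'. */
  assert(FLT_RADIX==2);
  return LDBL_MANT_DIG - DBL_MANT_DIG;
}
```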

---------- Footnotes ----------

(1) <https://isocpp.org/>

(2) <https://en.wikipedia.org/wiki/Java_(programming_language)>

(3) <https://www.python.org/>

(4) <https://julialang.org/>

(5) <https://en.wikipedia.org/wiki/C_(programming_language)>

(6) Brian Kernighan, Dennis Ritchie. _The C programming language_.
Prentice Hall, Inc., Second edition, 1988. It is also commonly known as
K&R and is based on the ANSI C and ISO C90 standards.

(7) Bjarne Stroustrup. _The C++ programming language_.
Addison-Wesley Professional; 4 edition, 2013.

(8) Note that Python is good for fast programming, not fast programs.

(9) For example see Jenness 2017 (https://arxiv.org/abs/1712.00461),
which describes how LSST is managing the transition.

(10) <http://savannah.gnu.org/task/?13786>

(11) Low-level languages are those that directly operate the hardware
like assembly languages. So C is actually a high-level language, but it
can be considered one of the lowest-level languages among all high-level
languages.

(12) For instance, the _long double_ numbers with at least 64-bit
mantissa are not accessible in Python or Java.

(13) <http://www.gnu.org/prep/standards/>

File: gnuastro.info, Node: Program design philosophy, Next: Coding conventions, Prev: Why C, Up: Developing

13.2 Program design philosophy
==============================

The core processing functions of each program (and all libraries) are
written mostly with the basic ISO C90 standard. We do make lots of use
of the GNU additions to the C language in the GNU C library(1), but
these functions are mainly used in the user interface functions (reading
your inputs and preparing them prior to or after the analysis). The
actual algorithms, which most scientists would be more interested in,
are much closer to ISO C90. For this reason, program source files
that deal with user interface issues and those doing the actual
processing are clearly separated, see *note Program source::. If
anything particular to the GNU C library is used in the processing
functions, it is explained in the comments in between the code.

All the Gnuastro programs provide very low level and modular
operations (modeled on GNU Coreutils). Almost all the basic
command-line programs like ‘ls’, ‘cp’ or ‘rm’ on GNU/Linux operating
systems are part of GNU Coreutils. This enables you to use shell
scripting languages (for example, GNU Bash) to operate on a large number
of files or do very complex things through the creative combinations of
these tools that the authors had never dreamed of. We have put a few
simple examples in *note Tutorials::.

For example, all the analysis output can be saved as ASCII tables
which can be fed into your favorite plotting program to inspect
visually. Python's Matplotlib is very useful for fast plotting of the
tables to immediately check your results. If you want to include the
plots in a document, you can use the PGFplots package within LaTeX; no
attempt is made to include such operations in Gnuastro. In short, Bash
can act as a glue to connect the inputs and outputs of all these various
Gnuastro programs (and other programs) in any fashion. Of course,
Gnuastro's programs are just front-ends to the main workhorse (*note
Gnuastro library::), allowing a user to create their own programs (for
example, with *note BuildProgram::). So once the functions within
programs become mature enough, they will be moved within the libraries
for even more general applications.

The advantage of this architecture is that the programs become small
and transparent: the starting and finishing point of every program is
clearly demarcated. For nearly all operations on a modern computer
(fast file input-output) with a modest level of complexity, the
read/write time is insignificant compared to the actual processing a
program does. Therefore the complexity which arises from sharing memory
in a large application is simply not worth the speed gain. Gnuastro's
design is heavily influenced by Eric Raymond's "The Art of Unix
Programming"(2) which beautifully describes the design philosophy and
practice which led to the success of Unix-based operating systems(3).

---------- Footnotes ----------

(1) Gnuastro uses many GNU additions to the C library. However,
thanks to the GNU Portability library (Gnulib) which is included in the
Gnuastro tarball, users of non-GNU/Linux operating systems can also
benefit from all these features when using Gnuastro.

(2) Eric S. Raymond, 2004, _The Art of Unix Programming_,
Addison-Wesley Professional Computing Series.

(3) KISS principle: Keep It Simple, Stupid!

File: gnuastro.info, Node: Coding conventions, Next: Program source, Prev: Program design philosophy, Up: Developing

13.3 Coding conventions
=======================

In Gnuastro, we try our best to follow the GNU coding standards. Added
to those, Gnuastro defines the following conventions. It is very
important for readability that the whole package follows the same
convention.

• The code must be easy to read by eye. So when the order of several
lines within a function does not matter (for example, when defining
variables at the start of a function), you should put the lines in
the order of increasing length and group the variables with similar
types such that this half-pyramid of declarations becomes most
visible. If the reader is interested, a simple search will show
them the variable they are interested in. However, this visual aid
greatly helps in general inspections of the code and helps the
reader get a grip of the function's processing.

• A function that cannot be fully displayed (vertically) in your
monitor is probably too long and may be more useful if it is broken
up into multiple functions. 40 lines is usually a good reference.
When the start and end of a function are clearly visible in one
glance, the function is much easier to understand. This is
most important for low-level functions (which usually define a lot
of variables). Low-level functions do most of the processing, so they
will also be the most interesting part of a program for an
inquiring astronomer. This convention is less important for higher
level functions that do not define too many variables and whose
only purpose is to run the lower-level functions in a specific
order and with checks.

In general you can be very liberal in breaking up the functions
into smaller parts: the GNU Compiler Collection (GCC) can
automatically compile small functions as inline functions when
optimizations are turned on. So you do not have to worry about
decreasing the speed. By default Gnuastro will compile with the
‘-O3’ optimization flag.

• All Gnuastro hand-written text files (C source code, Texinfo
documentation source, and version control commit messages) should
normally be no more than *75* characters per line. Monitors today
are certainly much wider, but with this limit, reading the
functions becomes much easier. Also for the developers, it
allows multiple files (or multiple views of one file) to be
displayed beside each other on wide monitors.

Emacs's buffers are excellent for this capability: setting a buffer
width of 80 with '<C-u 80 C-x 3>' will allow you to view and work
on several files or different parts of one file using the wide
monitors common today. Emacs buffers can also be used as a shell
prompt and to compile the program (with <M-x compile>), and 80
characters is the default width in most terminal emulators. If you
use Emacs, Gnuastro sets the 75 character ‘fill-column’ variable
automatically for you, see cartouche below.

For long comments you can press <Alt-q> in Emacs to break them
into separate lines automatically. For long literal strings,
you can use the fact that in C, two strings immediately after each
other are concatenated, for example, ‘"The first part, " "and the
second part."’. Note the space character at the end of the first
part. Since they are now separated, you can easily break a long
literal string into several lines and adhere to the maximum 75
character line length policy.

• The headers required by each source file (ending with ‘.c’) should
be defined inside of it. All the headers a complete program needs
should _not_ be stacked in another header to include in all source
files (for example ‘main.h’). Although most 'professional'
programmers choose this single header method, Gnuastro is primarily
written for professional/inquisitive astronomers (who are generally
amateur programmers). The list of header files included provides
valuable general information and helps the reader. ‘main.h’ may
only include the header file(s) that define types that the main
program structure needs, see ‘main.h’ in *note Program source::.
Those particular header files that are included in ‘main.h’ can of
course be ignored (not included) in separate source files.

• The headers should be classified (by an empty line) into separate
groups:

1. ‘#include <config.h>’: This must be the first code line (not
commented or blank) in each source file _within Gnuastro_. It
sets macros that the GNU Portability Library (Gnulib) will use
for a unified environment (GNU C Library), even when the user
is building on a system that does not use the GNU C library.

2. The C library header files, for example, ‘stdio.h’,
‘stdlib.h’, or ‘math.h’.
3. Installed library header files, including Gnuastro's installed
headers (for example ‘cfitsio.h’ or ‘gsl/gsl_rng.h’, or
‘gnuastro/fits.h’).
4. Gnuastro's internal headers (that are not installed), for
example ‘gnuastro-internal/options.h’.
5. For programs, the ‘main.h’ file (which is needed by the next
group of headers).
6. That particular program's header files, for example,
‘mkprof.h’, or ‘noisechisel.h’.

As much as order does not matter when you include the header of
each group, sort them by length, as described above.

• All function names, variables, etc., should be in lower case.
Macros and constant global ‘enum’s should be in upper case.

• For the naming of exported header files, functions, variables,
macros, and library functions, we adopt similar conventions to
those used by the GNU Scientific Library (GSL)(1). In particular,
in order to avoid clashes with the names of functions and variables
coming from other libraries the name-space '‘gal_’' is prefixed to
them. GAL stands for _G_NU _A_stronomy _L_ibrary.

• All installed header files should be in the ‘lib/gnuastro’
directory (under the top Gnuastro source directory). After
installation, they will be put in the ‘$prefix/include/gnuastro’
directory (see *note Installation directory:: for ‘$prefix’).
Therefore with this convention Gnuastro's headers can be included
in internal (to Gnuastro) and external (a library user) source
files with the same line
# include <gnuastro/headername.h>
Note that the GSL convention for header file names is
‘gsl_specialname.h’, so your include directive for a GSL header
must be something like ‘#include <gsl/gsl_specialname.h>’.
Gnuastro does not follow this GSL guideline because of the repeated
‘gsl’ in the include directive. It can be confusing and cause bugs
for beginners. All Gnuastro (and GSL) headers must be located
within a unique directory and will not be mixed with other headers.
Therefore the '‘gsl_’' prefix to the header file names is
redundant(2).

• All installed functions and variables should also include the
base-name of the file in which they are defined as prefix, using
underscores to separate words(3). The same applies to exported
macros, but in upper case. For example, in Gnuastro's top source
directory, the prototype of function ‘gal_box_border_from_center’
is in ‘lib/gnuastro/box.h’, and the macro ‘GAL_POLYGON_MAX_CORNERS’
is defined in ‘lib/gnuastro/polygon.h’.

This is necessary to give any user (who is not familiar with the
library structure) the ability to follow the code. This convention
does make the function names longer (a little harder to write), but
the extra documentation it provides plays an important role in
Gnuastro and is worth the cost.

• There should be no trailing white space in a line. To do this
automatically every time you save a file in Emacs, add the
following line to your ‘~/.emacs’ file.
(add-hook 'before-save-hook 'delete-trailing-whitespace)

• There should be no tabs in the indentation(4).

• Individual, contextually similar, functions in a source file are
separated by 5 blank lines to be easily seen to be related in a
group when parsing the source code by eye. In Emacs you can use
<CTRL-u 5 CTRL-o>.

• One group of contextually similar functions in a source file is
separated from another with 20 blank lines. In Emacs you can use
<CTRL-u 20 CTRL-o>. Each group of functions has a short descriptive
title of the functions in that group. This title is surrounded by
asterisks (<*>) to make it clearly distinguishable. Such
contextual grouping and clear title are very important for easily
understanding the code.

• Always read the comments before the patch of code under them.
Similarly, try to add as many comments as you can regarding every
patch of code. Effectively, we want someone to get a good feeling
of the steps, without having to read the C code and only by reading
the comments. This follows similar principles as Literate
programming (https://en.wikipedia.org/wiki/Literate_programming).

The last two conventions are not common and might benefit from a
short discussion here. With a good experience in advanced text editor
operations, the last two are redundant for a professional developer.
However, recall that Gnuastro aspires to be friendly to unfamiliar, and
inexperienced (in programming) eyes. In other words, as discussed in
*note Science and its tools::, we want the code to appear welcoming to
someone who is completely new to coding (and text editors) and only has
a scientific curiosity.

Newcomers to coding and development, who are curious enough to
venture into the code, will probably not be using (or have any knowledge
of) advanced text editors. They will see the raw code in the web page
or on a simple text editor (like Gedit) as plain text. Trying to learn
and understand a file with dense functions that are all spaced with one
or two blank lines can be very daunting for a newcomer. But when they
scroll through the file and see clear titles and meaningful spaces for
similar functions, we are helping them find and focus on the part they
are most interested in sooner and easier.

*GNU Emacs, the recommended text editor:* GNU Emacs is an extensible and
easily customizable text editor which many programmers rely on for
developing due to its countless features. Among them, it allows
specification of certain settings that are applied to a single file or
to all files in a directory and its sub-directories. In order to
harmonize code coming from different contributors, Gnuastro comes with a
‘.dir-locals.el’ file which automatically configures Emacs to satisfy
most of the coding conventions above when you are using it within
Gnuastro's directories. Thus, Emacs users can readily start hacking
into Gnuastro. If you are new to developing, we strongly recommend this
editor. Emacs was the first project released by GNU and is still one of
its flagship projects. Some resources can be found at:

Official manual
At <https://www.gnu.org/software/emacs/manual/emacs.html>. This is
a great and very complete manual which has been improved for over
30 years and is the best starting point to learn it. It just
requires a little patience and practice, but rest assured that you
will be rewarded. If you install Emacs, you also have access to
this manual on the command-line with the following command (see
*note Info::).

$ info emacs

A guided tour of emacs
At <https://www.gnu.org/software/emacs/tour/>. A short visual tour
of Emacs, officially maintained by the Emacs developers.

Unofficial mini-manual
At <https://tuhdo.github.io/emacs-tutor.html>. A shorter manual
which contains nice animated images of using Emacs.

---------- Footnotes ----------

(1) <https://www.gnu.org/software/gsl/design/gsl-design.html#SEC15>

(2) For GSL, this prefix has an internal technical application: GSL's
architecture mixes installed and not-installed headers in the same
directory. This prefix is used to identify their installation status.
Therefore this filename prefix in GSL is a technical internal issue (for
developers, not users).

(3) The convention to use underscores to separate words is called
"snake case" (or "snake_case"). This is also recommended by the GNU
coding standards.

(4) If you use Emacs, Gnuastro's ‘.dir-locals.el’ file will
automatically never use tabs for indentation. To make this a default in
all your Emacs sessions, you can add the following line to your
‘~/.emacs’ file: ‘(setq-default indent-tabs-mode nil)’

File: gnuastro.info, Node: Program source, Next: Documentation, Prev: Coding conventions, Up: Developing

13.4 Program source
===================

Besides the fact that all the programs share some functions that were
explained in *note Library::, everything else about each program is
completely independent. Recall that Gnuastro is written for an active
astronomer/scientist (not a passive one who just uses software). It
must thus be easily navigable. Hence there are fixed source files (that
contain fixed operations) that must be present in all programs. These
are discussed fully in *note Mandatory source code files::. To easily
understand the explanations in this section you can use *note The
TEMPLATE program:: which contains the bare minimum code for one working
program. This template can also be used to easily add new utilities:
just copy and paste the directory and change ‘TEMPLATE’ with your
program's name.

* Menu:

* Mandatory source code files:: Description of files common to all programs.
* The TEMPLATE program:: Template for easy creation of a new program.

File: gnuastro.info, Node: Mandatory source code files, Next: The TEMPLATE program, Prev: Program source, Up: Program source

13.4.1 Mandatory source code files
----------------------------------

Some programs might need lots of source files and if there is no fixed
convention, navigating them can become very hard for a new inquirer into
the code. The following source files exist in every program's source
directory (which is located in ‘bin/progname’). For small programs,
these files are enough. Larger programs will need more files and
developers are encouraged to define any number of new files. It is just
important that the following list of files exist and do what is
described here. When creating other source files, please choose
filenames that are a complete single word: do not abbreviate
(abbreviations are cryptic). For a minimal program containing all these
files, see *note The TEMPLATE program::.

‘main.c’
Each executable has a ‘main’ function, which is located in
‘main.c’. Therefore this file is the starting point when reading
any program's source code. No actual processing functions should be
defined in this file: the function(s) in this file are only meant
to connect the most high level steps of each program. Generally,
‘main’ will first call the top user interface function to read user
input and make all the preparations. Then it will pass control to
the top processing function for that program. The functions to do
both these jobs must be defined in other source files.

‘main.h’
All the major parameters which will be used in the program must be
stored in a structure which is defined in ‘main.h’. The name of
this structure is usually ‘prognameparams’, for example,
‘cropparams’ or ‘noisechiselparams’. So ‘#include "main.h"’ will
be a staple in all the source codes of the program. It is also
regularly the first (and only) argument of many of the program's
functions which greatly helps in readability.

Keeping all the major parameters of a program in this structure has
the major benefit that most functions will only need one argument:
a pointer to this structure. This will significantly facilitate
the job of the programmer, the inquirer and the computer. All the
programs in Gnuastro are designed to be low-level, small and
independent parts, so this structure should not get too large.

The main root structure of all programs contains at least one
instance of the ‘gal_options_common_params’ structure. This
structure will keep the values to all common options in Gnuastro's
programs (see *note Common options::). This top root structure is
conveniently called ‘p’ (short for parameters) by all the functions
in the programs and the common options parameters within it are
called ‘cp’. With this convention any reader can immediately
understand where to look for the definition of one parameter. For
example, you know that ‘p->cp.output’ is in the common parameters
while ‘p->threshold’ is in the program's parameters.

With this basic root structure, the source code of functions can
potentially become full of structure de-reference operators (‘->’)
which can make the code very unreadable. In order to avoid this,
whenever a structure element is used more than a couple of times in
a function, a local variable of the same type and with the same name
(so it can be searched) as the desired structure element should be
defined and initialized to that element's value at definition time.
Here is an example:

char *hdu=p->cp.hdu;
float threshold=p->threshold;

‘args.h’
The options particular to each program are defined in this file.
Each option is defined by a block of parameters in
‘program_options’. These blocks are all you should modify in this
file; leave the bottom group of definitions untouched. These are
fed directly into the GNU C library's Argp facilities and it is
recommended to have a look at that to better understand what is
going on, although this is not required here.

Each element of the block defining an option is described under
‘argp_option’ in ‘bootstrapped/lib/argp.h’ (from Gnuastro's top
source directory). Note that the last few elements of this structure
are Gnuastro additions (not documented in the standard Argp
manual). The values to these last elements are defined in
‘lib/gnuastro/type.h’ and ‘lib/gnuastro-internal/options.h’ (from
Gnuastro's top source directory).

‘ui.h’
Besides declaring the exported functions of ‘ui.c’, this header
also keeps the "key"s to every program-specific option. The first
class of keys is for the options that have a short-option version
(single letter, see *note Options::). The character that is
defined here is the option's short option name. The list of
available alphabet characters can be seen in the comments. Recall
that some common options also take some characters, for those, see
‘lib/gnuastro-internal/options.h’.

The second group of options are those that do not have a short
option alternative. Only the first in this group needs a value
(‘1000’), the rest will be given a value by C's ‘enum’ definition,
so the actual value is irrelevant and must never be used, always
use the name.

‘ui.c’
Everything related to reading the user input arguments and options,
checking the configuration files and checking the consistency of
the input parameters before the actual processing is run should be
done in this file. Most functions are the same, with only
the internal checks and structure parameters differing, so we
recommend going through the ‘ui.c’ of *note The TEMPLATE program::,
or several other programs for a better understanding.

The most high-level function in ‘ui.c’ is named
‘ui_read_check_inputs_setup’. It accepts the raw command-line
inputs and a pointer to the root structure for that program (see
the explanation for ‘main.h’). This is the function that ‘main’
calls. The basic idea of the functions in this file is that the
processing functions should need a minimum number of such checks.
With this convention an inquirer who wants to understand only
one part (mostly the processing part and not user input details and
sanity checks) of the code can easily do so in the later files. It
also makes all the errors related to input appear before the
processing begins which is more convenient for the user.

‘progname.c, progname.h’
The high-level processing functions in each program are in a file
named ‘progname.c’, for example, ‘crop.c’ or ‘noisechisel.c’. The
function within these files which ‘main’ calls is also named after
the program, for example:

void
crop(struct cropparams *p)

void
noisechisel(struct noisechiselparams *p)

In this manner, if an inquirer is interested in the processing
steps, they can immediately come and check this file for the first
processing step without having to go through ‘main.c’ and ‘ui.c’
first. In most situations, any failure in any step of the programs
will result in an informative error message and an immediate abort
in the program. So there is usually no need for return values.
Under more complicated situations where a return value might be
necessary, ‘void’ will be replaced with an ‘int’ in the examples
above. This value must be directly returned by ‘main’, so it has
to be an ‘int’.

‘authors-cite.h’
This header file keeps the global variable for the program authors
and its BibTeX record for citation. They are used in the outputs
of the common options ‘--version’ and ‘--cite’, see *note Operating
mode options::.

‘progname-complete.bash’
This shell script is used for implementing auto-completion features
when running Gnuastro's programs within GNU Bash. For more on the
concept of shell auto-completion and how it is managed in Gnuastro,
see *note Bash programmable completion::.

These files assume a set of common shell functions that have the
prefix ‘_gnuastro_autocomplete_’ in their name and are defined in
‘bin/complete.bash.in’ (of the source directory, and under version
control) and ‘bin/complete.bash.built’ (built during the building
of Gnuastro in the build directory). During Gnuastro's build, all
these Bash completion files are merged into one file that is
installed, and that the user can ‘source’ in their Bash startup
file, for example, see *note Quick start::.

File: gnuastro.info, Node: The TEMPLATE program, Prev: Mandatory source code files, Up: Program source

13.4.2 The TEMPLATE program
---------------------------

The extra creativity offered by libraries comes at a cost: you have to
actually write your ‘main’ function and get your hands dirty in managing
user inputs: are all the necessary parameters given a value? is the
input in the correct format? do the options and the inputs correspond?
and many other similar checks. So when an operation has well-defined
inputs and outputs and is commonly needed, it is much more worthwhile to
simply use all the great features that Gnuastro has already defined
for such operations.

To make it easier to learn/apply the internal program infrastructure
discussed in *note Mandatory source code files::, in the *note Version
controlled source::, Gnuastro ships with a template program. This
template program is not available in the Gnuastro tarball so it does not
confuse people using the tarball. The ‘bin/TEMPLATE’ directory in
Gnuastro's Git repository contains the bare minimum files necessary to
define a new program and all the basic/necessary files/functions are
pre-defined there.

Below you can see a list of initial steps to take for customizing
this template. We just assume that after cloning Gnuastro's history,
you have already bootstrapped Gnuastro, if not, please see *note
Bootstrapping::.

1. Select a name for your new program (for example, ‘myprog’).

2. Copy the ‘TEMPLATE’ directory to a directory with your program's
name:
$ cp -R bin/TEMPLATE bin/myprog

3. As with all source files in Gnuastro, all the files in the template
also have a copyright notice at their top. Open all the files and
correct these notices: 1) The first line contains a single-line
description of the program. 2) In the second line only the name of
your program needs to be fixed and 3) Add your name and email as a
"Contributing author". As your program grows, you will need to add
new files, do not forget to add this notice in those new files too,
just put your name and email under "Original author" and correct
the copyright years.

4. Open ‘configure.ac’ in the top Gnuastro source. This file manages
the operations that are done when a user runs ‘./configure’. Going
down the file, you will notice repetitive parts for each program.
You will notice that the program names follow an alphabetic
ordering in each part. There is also a commented line/patch for
the ‘TEMPLATE’ program in each part. You can copy one line/patch
(from the program above or below your desired name for example) and
paste it in the proper place for your new program. Then correct
the names of the copied program to your new program name. There
are multiple places where this has to be done, so be patient and go
down to the bottom of the file. Ultimately add
‘bin/myprog/Makefile’ to ‘AC_CONFIG_FILES’, only here the ordering
depends on the length of the name (it is not alphabetical).

5. Open ‘Makefile.am’ in the top Gnuastro source. Similar to the
previous step, add your new program similar to all the other
programs. Here there are only two places: 1) at the top where we
define the conditionals (three lines per program), and 2)
immediately under it as part of the value for ‘SUBDIRS’.

6. Open ‘doc/Makefile.am’ and similar to ‘Makefile.am’ (above), add
the proper entries for the man page of your program to be created
(here, the variable that keeps all the man pages to be created is
‘dist_man_MANS’). Then scroll down and add a rule to build the man
page similar to the other existing rules (in alphabetical order).
Do not forget to add a short one-line description here, it will be
displayed on top of the man page.

7. Change ‘TEMPLATE.c’ and ‘TEMPLATE.h’ to ‘myprog.c’ and ‘myprog.h’
in the file names:

$ cd bin/myprog
$ mv TEMPLATE.c myprog.c
$ mv TEMPLATE.h myprog.h

8. Correct all occurrences of ‘TEMPLATE’ in the input files to
‘myprog’ (in short or long format). You can get a list of all
occurrences with the following command. If you use Emacs, it will
be able to parse the Grep output and open the proper file and line
automatically. So this step can be very easy.

$ grep --color -nHi -e template *

9. Run the following commands to rebuild the configuration and build
system, and then to configure and build Gnuastro (which now
includes your exciting new program).
$ autoreconf -f
$ ./configure
$ make

10. You are done! You can now start customizing your new program to
do your special processing. When it is complete, just do not
forget to add checks also, so it can be tested at least once on a
user's system with ‘make check’, see *note Test scripts::.
Finally, if you would like to share it with all Gnuastro users,
inform us so we merge it into Gnuastro's main history.

File: gnuastro.info, Node: Documentation, Next: Building and debugging, Prev: Program source, Up: Developing

13.5 Documentation
==================

Documentation (this book) is an integral part of Gnuastro (see *note
Science and its tools::). Documentation is not considered a separate
project and must be written by its developers. Users can make
edits/corrections, but the initial writing must be by the developer.
So, no change is considered valid for implementation unless the
respective parts of the book have also been updated. The following
procedure can be a good suggestion to take when you have a new idea and
are about to start implementing it.

The steps below are not a requirement. The important thing is that
when you send your work to be included in Gnuastro, the book and the
code have to both be fully up-to-date and compatible, with the purpose
of the update very clearly explained. You can follow any strategy you
like; the following strategy is what we have found to be most useful
until now.

1. Edit the book and fully explain your desired change, such that your
idea is completely embedded in the general context of the book with
no sense of discontinuity for a first time reader. This will allow
you to plan the idea much more accurately and in the general
context of Gnuastro (a particular program or library). Later on,
when you are coding, this general context will significantly help
you as a road-map.

A very important part of this process is the program/library
introduction. These first few paragraphs explain the purposes of
the program or library and are fundamental to Gnuastro. Before
actually starting to code, explain your idea's purpose thoroughly
in the start of the respective/new section you wish to work on.
While actually writing its purpose for a new reader, you will
probably get some valuable and interesting ideas that you had not
thought of before. This has occurred several times during the
creation of Gnuastro.

If an introduction already exists, embed or blend your idea's
purpose with the existing introduction. We emphasize that doing
this is equally useful for you (as the programmer) as it is useful
for the user (reader). Recall that the purpose of a program is
very important, see *note Program design philosophy::.

As you have already noticed for every program/library, it is very
important that the basics of the science and technique be explained
in separate subsections prior to the 'Invoking Programname'
subsection. If you are writing a new program or your addition to
an existing program involves a new concept, also include such
subsections and explain the concepts so a person completely
unfamiliar with the concepts can get a general initial
understanding. You do not have to go deep into the details, just
enough to get an interested person (with absolutely no background)
started with some good pointers/links to where they can continue
studying if they are more interested. If you feel you cannot do
that, then you have probably not understood the concept yourself.
If you feel you do not have the time, then think about yourself as
the reader in one year: you will forget almost all the details, so
now that you have done all the theoretical preparations, add a few
more hours and document it. Therefore in one year, when you find a
bug or want to add a new feature, you do not have to prepare as
much. Have in mind that your only limitation in length is the
fatigue of the reader after reading a long text, nothing else. So
as long as you keep it relevant/interesting for the reader, there
is no page number limit/cost.

It might also help if you start discussing the usage of your idea
in the 'Invoking ProgramName' subsection (explaining the options
and arguments you have in mind) at this stage too. Actually
starting to write it here will really help you later when you are
coding.

2. After you have finished adding your initial intended plan to the
book, then start coding your change or new program within the
Gnuastro source files. While you are coding, you will notice that
some things should be different from what you wrote in the book
(your initial plan). So correct them as you are actually coding,
but do not worry too much about missing a few things (see the next
step).

3. After your work has been fully implemented, read the section
documentation from the start and check if you did not miss any
change in the coding. Also, ensure that the context is fairly
continuous for a first-time reader (who has not seen the book or
known Gnuastro before you made your change).

4. If the change is notable, also update the ‘NEWS’ file.

File: gnuastro.info, Node: Building and debugging, Next: Test scripts, Prev: Documentation, Up: Developing

13.6 Building and debugging
===========================

To build the various programs and libraries in Gnuastro, the GNU build
system is used which defines the steps in *note Quick start::. It
consists of GNU Autoconf, GNU Automake and GNU Libtool which are
collectively known as GNU Autotools. They provide a very portable
system to check the host's environment and compile Gnuastro based on
that. They also make installing everything in their standard places
very easy for the programmer. Most of the small caps files that you see
in the top source directory of the tarball are created by these three
tools (see *note Version controlled source::). To facilitate the
building and testing of your work during development, Gnuastro comes
with two useful scripts:

‘developer-build’
This is more fully described in *note Configure and build in RAM::.
During development, you will usually run this command only once (at
the start of your work).

‘tests/during-dev.sh’
This script is designed to be run each time you make a change and
want to test your work (with some possible input and output). The
script itself is heavily commented and thoroughly describes the
best way to use it, so we will not repeat it here. For a usage
example, see *note Forking tutorial::.

As a short summary: you specify the build directory, an output
directory (for the built program to be run in, which also contains
the inputs), the program's short name and the arguments and options
that it should be run with. This script will then build Gnuastro,
go to the output directory and run the built executable from there.
One option for the output directory might be your desktop, so you
can easily see the output files and delete them when you are
finished. The main purpose of these scripts is to keep your source
directory clean and facilitate your development.

By default all the programs are compiled with optimization flags for
increased speed. A side effect of optimization is that valuable
debugging information is lost. All the libraries are also linked as
shared libraries by default. Shared libraries further complicate the
debugging process and significantly slow down the compilation (the
‘make’ command). So during development it is recommended to configure
Gnuastro as follows:

$ ./configure --enable-debug

In ‘developer-build’ you can ask for this behavior through the ‘--debug’
option, see *note Separate build and source directories::.

In order to understand the building process, you can go through the
Autoconf, Automake and Libtool manuals, like all GNU manuals they
provide both a great tutorial and technical documentation. The "A small
Hello World" section in Automake's manual (in chapter 2) can be a good
starting guide after you have read the separate introductions.

File: gnuastro.info, Node: Test scripts, Next: Bash programmable completion, Prev: Building and debugging, Up: Developing

13.7 Test scripts
=================

As explained in *note Tests::, for every program some simple tests are
written to check the various independent features of the program. All
the tests are placed in the ‘tests/’ directory. The ‘tests/prepconf.sh’
script is the first 'test' that will be run. It will copy all the
configuration files from the various directories to a ‘tests/.gnuastro’
directory (which it will make) so the various tests can set the default
values. This script will also make sure the programs do not go
searching for user and system wide configuration files to avoid the
mixing of values with different Gnuastro versions on the system.

For each program, the tests are placed inside directories with the
program name. Each test is written as a shell script. The last line of
this script is the test which runs the program with certain parameters.
The return value of this script determines the fate of the test, see the
"Support for test suites" chapter of the Automake manual for a very nice
and complete explanation. In every script, two variables are defined at
first: ‘prog’ and ‘execname’. The first specifies the program name and
the second the location of the executable.

The most important thing to have in mind about all the test scripts
is that they are run from inside the ‘tests/’ directory in the "build
tree", which can be different from the directory they are stored in
(known as the "source tree")(1). This distinction is made by GNU
Autoconf and Automake (which configure, build and install Gnuastro) so
that you can install the program even if you do not have write access to
the directory keeping the source files. See the "Parallel build trees
(a.k.a VPATH builds)" in the Automake manual for a nice explanation.

Because of this, any necessary inputs that are distributed in the
tarball(2), for example, the catalogs necessary for checks in
MakeProfiles and Crop, must be identified with the ‘$topsrc’ prefix
instead of ‘../’ (for the top source directory that is unpacked). This
‘$topsrc’ variable points to the source tree where the script can find
the source data (it is defined in ‘tests/Makefile.am’). The executables
and other test products were built in the build tree (where they are
being run), so they do not need to be prefixed with that variable. This
is also true for images or files that were produced by other tests.
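
As a rough illustration of the points above, a test script could look
like the sketch below. It is not copied from Gnuastro's source: the
catalog and image names, the Crop invocation on the last line, and the
‘sketch_test’ wrapper function are hypothetical. The checks return 77
because Automake's test harness treats that exit status as a skipped
test; a real script would call ‘exit’ instead of ‘return’.

```shell
# Hypothetical sketch of a test script, wrapped in a function so it can
# be tried in an interactive shell without closing it.
sketch_test() {
    prog=crop
    execname=../bin/$prog/ast$prog          # executable: lives in the build tree
    cat=${topsrc:-..}/tests/$prog/cat.txt   # distributed input: source tree (made-up name)
    img=previous-test-output.fits           # product of another test: build tree (made-up name)

    # Skip the test (status 77) when a prerequisite is missing.
    if [ ! -f "$execname" ]; then echo "$execname not created."; return 77; fi
    if [ ! -f "$cat" ];      then echo "$cat does not exist.";   return 77; fi
    if [ ! -f "$img" ];      then echo "$img does not exist.";   return 77; fi

    # Last command: its exit status decides the fate of the test.
    # (Illustrative invocation only, not taken from a real test.)
    "$execname" --catalog="$cat" "$img"
}
```

Only the distributed catalog needs the ‘$topsrc’ prefix here; the
executable and the image are looked up relative to the build tree.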

---------- Footnotes ----------

(1) The ‘developer-build’ script also uses this feature to keep the
source and build directories separate (see *note Separate build and
source directories::).

(2) In many cases, the inputs of a test are outputs of previous
tests. This does not apply to that class of inputs, because all outputs
of previous tests are in the "build tree".

File: gnuastro.info, Node: Bash programmable completion, Next: Developer's checklist, Prev: Test scripts, Up: Developing

13.8 Bash programmable completion
=================================

*Under development:* While work on TAB completion is ongoing, it is not
yet fully ready, please see the notice at the start of *note Shell TAB
completion::.

Gnuastro provides programmable completion facilities in Bash. This
greatly helps users reach their desired result with minimal keystrokes,
and helps them spend less time figuring out the option names and
their acceptable values. Gnuastro's completion script not only
completes half-written commands, but also prints suggestions based
on previous arguments.

Imagine a scenario where we need to download three columns containing
the right ascension, declination, and parallax from the Gaia DR3
dataset. We first have to find out how these columns are abbreviated or
spelled. So we can call the command below, and store the column names
in a file such as ‘gaia-dr3-columns.txt’.

$ astquery gaia --information > gaia-dr3-columns.txt

Then we need to memorize or copy the column names of interest, and
specify an output FITS file name such as ‘gaia.fits’:

$ astquery gaia --dataset=dr3 --output=gaia.fits \
--column=ra,dec,parallax

However, this is much easier using the auto-completion feature:

$ astquery gaia --dataset=dr3 --output=gaia.fits --column=<[TAB]>

After pressing <[TAB]>, a full list of the Gaia DR3 dataset column names
will be displayed. Typing the first character of the desired column and
pressing <[TAB]> again will limit the displayed list to only the
matching ones until the desired column is found.

* Menu:

* Bash TAB completion tutorial:: Fast tutorial to get you started on concepts.
* Implementing TAB completion in Gnuastro:: How Gnuastro uses Bash auto-completion features.

File: gnuastro.info, Node: Bash TAB completion tutorial, Next: Implementing TAB completion in Gnuastro, Prev: Bash programmable completion, Up: Bash programmable completion

13.8.1 Bash TAB completion tutorial
-----------------------------------

When a user presses the <[TAB]> key while typing commands, Bash will
inspect the input to find a relevant "completion specification", or
‘compspec’. If available, the ‘compspec’ will generate a list of
possible suggestions to complete the current word. A custom ‘compspec’
can be generated for any command using Bash completion builtins(1) and
the Bash variables that start with the ‘COMP’ keyword(2).

First, let's see a quick example of how you can make a completion
script in just one line of code. With the command below, we are asking
Bash to give us three suggestions for ‘echo’: ‘foo’, ‘bar’ and ‘bAr’.
Please run it in your terminal for the next steps.

$ complete -W "foo bar bAr" echo

The possible completion suggestions are fed into ‘complete’ using the
‘-W’ option followed by a list of space delimited words. Let's see it
in action:

$ echo <[TAB][TAB]>
bar bAr foo

Nicely done! Just note that the strings are sorted alphabetically,
not in the original order. Also, an arbitrary number of space
characters are printed between them (based on the number of suggestions
and terminal size, etc.). Now, if you type ‘f’ and press <[TAB]>, Bash
will automatically figure out that you wanted ‘foo’ and it will be
completed right away:

$ echo f<[TAB]>
$ echo foo

However, nothing will happen if you type ‘b’ and press <[TAB]> only
once. This is because of the ambiguity: there is not enough information
to figure out which suggestion you want: ‘bar’ or ‘bAr’? So, if you
press <[TAB]> twice, it will print out all the options that start with
‘b’:

$ echo b<[TAB][TAB]>
bar bAr
$ echo ba<[TAB]>
$ echo bar
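
While experimenting, it can help to inspect or undo what has been
registered for a command. The short sketch below is an aside for
experimenting (not a step of Gnuastro's own completion scripts); it only
uses the ‘-p’ (print) and ‘-r’ (remove) options of Bash's ‘complete’
builtin.

```shell
complete -W "foo bar bAr" echo   # register the suggestions, as above
complete -p echo                 # print the compspec registered for 'echo'
complete -r echo                 # remove it again
```

After the last command, ‘echo’ has no completion specification until
‘complete’ is called for it again.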

Not bad for a simple program. But what if you need more control? By
passing the ‘-F’ option to ‘complete’ instead of ‘-W’, it will run a
function for generating the suggestions, instead of using a static
string. For example, let's assume that the expected value after ‘foo’
is the number of files in the current directory. Since the logic is
getting more complex, let's write and save the commands below into a
shell script with an arbitrary name such as ‘completion-tutorial.sh’:

$ cat completion-tutorial.sh
_echo(){
if [ "$3" == "foo" ]; then
COMPREPLY=( $(ls | wc -l) )
else
COMPREPLY=( $(compgen -W "foo bar bAr" -- "$2") )
fi
}
complete -F _echo echo

We will look at it in detail soon. But for now, let's ‘source’ the file
into your current terminal and check if it works as expected:

$ source completion-tutorial.sh
$ echo <[TAB][TAB]>
bar bAr foo
$ echo foo <[TAB]>
$ touch empty.txt
$ echo foo <[TAB]>

Success! As you see, this allows for setting up highly customized
completion scripts. Now let's have a closer look at the
‘completion-tutorial.sh’ completion script from above. First, the ‘-F’
option of the ‘complete’ command indicates that we want the shell to
execute the ‘_echo’ function whenever completion is requested for
‘echo’. As a convention, the function name should be the same as the
program name, but prefixed with an underscore (‘_’).

Within the ‘_echo’ function, we're checking if ‘$3’ is equal to
‘foo’. In Bash's auto-complete, ‘$3’ is the word before the word
currently being completed. In fact, these are the arguments that the
‘_echo’ function receives:

‘$1’
The name of the command, here it is ‘echo’.

‘$2’
The current word being completed (empty unless we are in the middle
of typing a word).

‘$3’
The word before the word being completed.

To tell the completion script what to reply with, we use the
‘COMPREPLY’ array. This array holds all the suggestions that Bash
will show to the user in the end. In the example above, when the
previous word is ‘foo’, we simply give it the output of ‘ls | wc -l’.

Finally, we have the ‘compgen’ command. According to the Bash
programmable completion builtins manual, the command ‘compgen [OPTION]
[WORD]’ generates possible completion matches for ‘[WORD]’ according to
‘[OPTION]’. Using the ‘-W’ option asks ‘compgen’ to generate a list of
words from an input string. This is known as Word Splitting(3).
‘compgen’ will automatically use the ‘$IFS’ variable to split the string
into a list of words. You can check the default delimiters by calling:

$ printf %q "$IFS"

The default value of ‘$IFS’ is ‘ \t\n’, in other words, the SPACE,
TAB, and New-line characters. Finally, notice the ‘-- "$2"’ in this
command:

COMPREPLY=( $(compgen -W "foo bar bAr" -- "$2") )

Here, the ‘--’ marks the end of the options given to ‘compgen’, and the
‘"$2"’ after it instructs ‘compgen’ to only reply with the words that
start with ‘$2’, i.e. the current word being completed. That is why
when you type the letter ‘b’, ‘compgen’ will reply only with its
matches (‘bar’ and ‘bAr’), and will exclude ‘foo’.
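
Since ‘compgen’ is an ordinary Bash builtin, this filtering can also be
tried directly on the command-line, without defining any completion
function. A minimal sketch with the same word list:

```shell
compgen -W "foo bar bAr" -- b
# bar
# bAr
compgen -W "foo bar bAr" -- f
# foo
```

Each match is printed on its own line, which is what the surrounding
‘$( ... )’ and parentheses turn into the elements of ‘COMPREPLY’.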

Let's get a little more realistic, and develop a very basic
completion script for one of Gnuastro's programs. Since the ‘--help’
option will list all the options available in Gnuastro's programs, we
are going to use its output and create a very basic TAB completion for
it. Note that the actual TAB completion in Gnuastro is a little more
complex than this and fully described in *note Implementing TAB
completion in Gnuastro::. But this is a good exercise to get started.

We will use ‘asttable’ as the demo, and the goal is to suggest all
options that this program has to offer. You can print all of them (with
a lot of extra information) with this command:

$ asttable --help

Let's write an ‘awk’ script that prints all of the long options.
When printing the option names we can safely ignore the short options
because if a user knows about the short options, s/he already knows
exactly what they want! Also, due to their single-character length,
they will be too cryptic without their descriptions.

One way to catch the long options is through ‘awk’ as shown below.
We only keep the lines that 1) start with an empty space, 2) have ‘-’
as their first non-white character, and 3) contain ‘--’ followed by any
number of letters or digits. Within those lines, if
the first word ends in a comma (‘,’), the first word is the short
option, so we want the second word (which is the long option).
Otherwise, the first word is the long option. But for options that take
a value, this will also include the format of the value (for example,
‘--column=STR’). So with a ‘sed’ command, we remove everything that is
after the equal sign, but keep the equal sign itself (to highlight to
the user that this option should have a value).

$ asttable --help \
| awk '/^ / && $1 ~ /^-/ && /--+[a-zA-Z0-9]*/ { \
if($1 ~ /,$/) name=$2; \
else name=$1; \
print name}' \
| sed -e's|=.*|=|'
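
To see what this filter keeps without having Gnuastro installed, the
same ‘awk’ and ‘sed’ steps can be applied to a few sample lines. In the
sketch below, the help text is made up to imitate the layout (it is not
real ‘asttable --help’ output), and the ‘sample_help’ and
‘long_options’ function names are hypothetical.

```shell
# Made-up sample that imitates the layout of a '--help' output.
sample_help() {
cat <<'EOF'
Usage: asttable [OPTION...] InputFile

 Input:
  -h, --hdu=STR/INT          Extension name or number of input data.
      --wcsfile=FITS         File to read the WCS from.

 Operating modes:
  -q, --quiet                Only report errors.
EOF
}

# Same filter as above, reading the help text from standard input.
long_options() {
    awk '/^ / && $1 ~ /^-/ && /--+[a-zA-Z0-9]*/ {
             if($1 ~ /,$/) name=$2;
             else name=$1;
             print name}' \
        | sed -e's|=.*|=|'
}

sample_help | long_options
# --hdu=
# --wcsfile=
# --quiet

# The list can be given to 'compgen' to keep only the matching options.
compgen -W "$(sample_help | long_options)" -- "--w"
# --wcsfile=
```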

If we wanted to show all the options to the user, we could simply
feed the values of the command above to ‘compgen’ and ‘COMPREPLY’
subsequently. But we need _smarter_ completions: we want to offer
suggestions based on the previous options that have already been typed
in. Just beware: sometimes the completion script might not act as you
expected. In that case, using debug messages can clear things up. You
can add an ‘echo’ command before the completion function ends, and check
all current variables. This can save a lot of headaches, since things
can get complex.
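
One possible way to add such debug messages is sketched below. Instead
of printing to the terminal (which mixes with the command being typed),
this variation appends to a log file. The ‘_echo_debug’ function name,
the ‘COMPLETION_DEBUG_LOG’ variable and the default log file name are
arbitrary choices for this example.

```shell
_echo_debug(){
    # Record the three arguments that Bash gives to a completion function.
    printf 'cmd=%s cur=%s prev=%s\n' "$1" "$2" "$3" \
           >> "${COMPLETION_DEBUG_LOG:-/tmp/completion-debug.log}"

    # Then reply as usual.
    COMPREPLY=( $(compgen -W "foo bar bAr" -- "$2") )
}
complete -F _echo_debug echo
```

After pressing <[TAB]> a few times, the log file can be inspected in
another terminal (for example with ‘tail -f’).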

Take the option ‘--wcsfile=’ for example. This option accepts a FITS
file. Usually, the user is trying to feed a FITS file from the current
directory. So it would be nice if we could help them and print only a
list of FITS files sitting in the current directory - or whatever
directory they have typed-in so far.

But there's a catch. When splitting the user's input line, Bash will
consider ‘=’ as a separate word. To avoid getting caught in changing
the ‘IFS’ or ‘COMP_WORDBREAKS’ values, we will simply check for ‘=’ and
act accordingly. That is, if the previous word is a ‘=’, we will ignore it
and take the word before that as the previous word. Also, if the
current word is a ‘=’, ignore it completely. Taking all of that into
consideration, the code below might serve well:

_asttable(){
if [ "$2" = "=" ]; then word=""
else word="$2"
fi

if [ "$3" = "=" ]; then prev="${COMP_WORDS[COMP_CWORD-2]}"
else prev="${COMP_WORDS[COMP_CWORD-1]}"
fi

case "$prev" in
--wcsfile)
COMPREPLY=( $(compgen -f -X "!*.[fF][iI][tT][sS]" -- "$word") )
;;
esac
}
complete -o nospace -F _asttable asttable

To test the code above, write it into ‘asttable-tutorial.sh’, and load
it into your running terminal with this command:

$ source asttable-tutorial.sh

If you then go to a directory that has at least one FITS file (with a
‘.fits’ suffix, among other files), you can check out the function by
typing the following command. You will see that only files ending in
‘.fits’ are shown, not any other file.

$ asttable --wcsfile=<[TAB][TAB]>

The code above first identifies the current and previous words. It
then checks if the previous word is equal to ‘--wcsfile’ and if so,
fills ‘COMPREPLY’ array with the necessary suggestions. We are using
‘case’ here (instead of ‘if’) because in a real scenario, we need to
check many more values and ‘case’ is far better suited for such cases
(cleaner and more efficient code).
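
The word handling can also be checked without pressing <[TAB]>, by
setting the ‘COMP_WORDS’ and ‘COMP_CWORD’ variables by hand and calling
a function directly. The sketch below keeps only the first two steps of
the function above (the ‘_sketch_words’ name is hypothetical), and
assumes that Bash has split ‘--wcsfile=ab’ into the three words
‘--wcsfile’, ‘=’ and ‘ab’, as described earlier.

```shell
_sketch_words(){
    local word prev
    if [ "$2" = "=" ]; then word=""
    else                    word="$2"
    fi

    if [ "$3" = "=" ]; then prev="${COMP_WORDS[COMP_CWORD-2]}"
    else                    prev="${COMP_WORDS[COMP_CWORD-1]}"
    fi
    echo "prev='$prev' word='$word'"
}

# Simulate the state for 'asttable --wcsfile=ab<[TAB]>'.
COMP_WORDS=(asttable --wcsfile = ab)
COMP_CWORD=3
_sketch_words asttable "ab" "="
# prev='--wcsfile' word='ab'
```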

The ‘-f’ option in ‘compgen’ indicates we're looking for a file. The
‘-X’ option _filters out_ the filenames that match the pattern after it
(a shell filename pattern, not a regular expression). Therefore we
should start the pattern with ‘!’ if we want to keep the files matching
it. The ‘-- "$word"’ component collects only filenames that match the
current word being typed. And last but not least, the ‘-o nospace’
option in the ‘complete’ command instructs the completion script to
_not_ append a white space after each suggestion. That is important
because in the long format of an option, the value is clearer when it
is attached to the option name with an ‘=’ sign.
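
The effect of the leading ‘!’ can be seen in a scratch directory. This
is only a sketch: the two file names are arbitrary, and a simpler
(case-sensitive) pattern is used than in the function above.

```shell
dir=$(mktemp -d)                     # scratch directory for the demo
touch "$dir/a.fits" "$dir/b.txt"

# With '!': names that do NOT match '*.fits' are removed.
(cd "$dir" && compgen -f -X '!*.fits' -- "")
# a.fits

# Without '!': names that match '*.fits' are removed.
(cd "$dir" && compgen -f -X '*.fits' -- "")
```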

You have now written a very basic and working TAB completion script
that can easily be generalized to include more options (and be good for
a single/simple program). However, Gnuastro has many programs that
share many similar things and the options are not independent. Also,
complex situations do often come up: for example, some people use a
‘.fit’ suffix for FITS files and others do not even use a suffix at all!
So in practice, things need to get a little more complicated, but the
core concept is what you learnt in this section. We just modularize the
process (breaking logically independent steps into separate functions to
use in different situations). In *note Implementing TAB completion in
Gnuastro::, we will review the generalities of Gnuastro's implementation
of Bash TAB completion.

---------- Footnotes ----------

(1)
<https://www.gnu.org/software/bash/manual/html_node/Programmable-Completion-Builtins.html>

(2)
<https://www.gnu.org/software/bash/manual/html_node/Bash-Variables.html>

(3)
<https://www.gnu.org/software/bash/manual/html_node/Word-Splitting.html>

File: gnuastro.info, Node: Implementing TAB completion in Gnuastro, Prev: Bash TAB completion tutorial, Up: Bash programmable completion

13.8.2 Implementing TAB completion in Gnuastro
----------------------------------------------

The basics of Bash auto-completion were reviewed in *note Bash TAB
completion tutorial::. Gnuastro is a very complex package of many
programs that have many similar features, so implementing those
principles in an easy to maintain manner requires a modular solution.
As a result, Bash's TAB completion is implemented as multiple files in
Gnuastro:

‘bin/completion.bash.built’ (in build directory, automatically created)
This file contains the values of all Gnuastro options or arguments
that take fixed strings as values (not file names). For example,
the names of Arithmetic's operators (see *note Arithmetic
operators::), or spectral line names (like ‘--obsline’ in *note
CosmicCalculator input options::).

This file is created automatically during the building of Gnuastro.
The recipe to build it is available in Gnuastro's top-level
‘Makefile.am’ (under the target ‘bin/completion.bash’). It parses
the respective Gnuastro source file that contains the necessary
user-specified strings. All the acceptable values are then
stored as shell variables (within a function).

‘bin/completion.bash.in’ (in source directory, under version control)
All the low-level completion functions that are common to all
programs are stored here. It thus contains functions that will
parse the command-line or files, or suggest the completion replies.

‘PROGNAME-complete.bash’ (in source directory, under version control)
All Gnuastro programs contain a ‘PROGNAME-complete.bash’ script
within their source (for more on the fixed files of each program,
see *note Program source::). This file contains the very
high-level (program-specific) Bash programmable completion features,
which almost always use functions defined in the Gnuastro-generic Bash
completion file (‘bin/completion.bash.in’).

The top-level function that is called by Bash should be called
‘_gnuastro_autocomplete_PROGNAME’ and its last line should be the
‘complete’ command of Bash which calls this function. The contents
of ‘_gnuastro_autocomplete_PROGNAME’ are almost identical for all
the programs, it is just a very high-level function that either
calls ‘_gnuastro_autocomplete_PROGNAME_arguments’ to manage
suggestions for the program's arguments or
‘_gnuastro_autocomplete_PROGNAME_option_value’ to manage
suggestions for the program's option values.

The scripts above follow these conventions. After reviewing the
list, please also look into the functions for examples of each point.
• No global shell variables in any completion script: the contents of
the files above are directly loaded into the user's environment.
So to keep the user's environment clean and avoid annoyance to the
users, everything should be defined as shell functions, and any
variable within the functions should be set as ‘local’.

• All the function names should start with
‘_gnuastro_autocomplete_’, again to avoid populating the user's
function name-space with possibly conflicting names.

• Outputs of functions should be written in the ‘local’ variables of
the higher-level functions that called them.
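
The three conventions can be sketched together as below. This is not
Gnuastro's actual code: the two ‘..._sketch...’ function names and the
option list are hypothetical. It relies on Bash's dynamic scoping: a
variable declared ‘local’ in a function is visible to (and can be
written by) the functions it calls, and disappears when it returns.

```shell
# Lower-level function: writes its output into 'options', which the
# caller must have declared as 'local'.
_gnuastro_autocomplete_sketch_options(){
    options="--hdu= --wcsfile= --quiet"
}

# Higher-level function: owns the variable and uses the result.
_gnuastro_autocomplete_sketch(){
    local options=""
    _gnuastro_autocomplete_sketch_options
    COMPREPLY=( $(compgen -W "$options" -- "$2") )
}

# Called by hand here; Bash would call it through 'complete -F'.
_gnuastro_autocomplete_sketch asttable --w asttable
echo "${COMPREPLY[@]}"
# --wcsfile=
```

After the call, no ‘options’ variable is left in the user's
environment.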

File: gnuastro.info, Node: Developer's checklist, Next: Gnuastro project webpage, Prev: Bash programmable completion, Up: Developing

13.9 Developer's checklist
==========================

This is a checklist of things to do after applying your
changes/additions in Gnuastro:

1. If the change is non-trivial, write test(s) in the
‘tests/progname/’ directory to test the change(s)/addition(s) you
have made. Then add their file names to ‘tests/Makefile.am’.

2. If your change involves a change in command-line behavior of a
Gnuastro program or script (for example, adding a new option or
argument), create or update the respective
‘PROGNAME-complete.bash’ file described under the *note Bash
programmable completion:: section.

3. Run ‘$ make check’ to make sure everything is working correctly.

4. Make sure the documentation (this book) is completely up to date
with your changes, see *note Documentation::.

5. Commit the change to your issue branch (see *note Production
workflow:: and *note Forking tutorial::). Afterwards, run
Autoreconf to generate the appropriate version number:

$ autoreconf -f

6. Finally, to make sure everything will be built, installed and
checked correctly, run the following command (after re-configuring
and rebuilding). To greatly speed up the process, use multiple
threads (8 in the example below, change it appropriately):

$ make distcheck -j8

This command will create a distribution file (ending with
‘.tar.gz’) and try to compile it in the most general cases, then it
will run the tests on what it has built in its own
mini-environment. If ‘$ make distcheck’ finishes successfully,
then you are safe to send your changes to us to implement or for
your own purposes. See *note Production workflow:: and *note
Forking tutorial::.

File: gnuastro.info, Node: Gnuastro project webpage, Next: Developing mailing lists, Prev: Developer's checklist, Up: Developing

13.10 Gnuastro project webpage
==============================

Gnuastro's central management hub
(https://savannah.gnu.org/projects/gnuastro/)(1) is located on GNU
Savannah (https://savannah.gnu.org/)(2). Savannah is the central
software development management system for many GNU projects. Through
this central hub, you can view the list of activities that the
developers are engaged in, their activity on the version controlled
source, and other things. Each defined activity in the development
cycle is known as an 'issue' (or 'item'). An issue can be a bug (see
*note Report a bug::), or a suggested feature (see *note Suggest new
feature::) or an enhancement or generally any _one_ job that is to be
done. In Savannah, issues are classified into three categories or
'trackers':

Support
This tracker is a way that (possibly anonymous) users can get in
touch with the Gnuastro developers. It is a complement to the
bug-gnuastro mailing list (see *note Report a bug::). Anyone can
post an issue to this tracker. The developers will not submit an
issue to this list. They will only reassign the issues in this
list to the other two trackers if they are valid(3). Ideally (when
the developers have time to put on Gnuastro, please do not forget
that Gnuastro is a volunteer effort), there should be no open items
in this tracker.

Bugs
This tracker contains all the known bugs in Gnuastro (problems with
the existing tools).

Tasks
The items in this tracker contain the future plans (or new
features/capabilities) that are to be added to Gnuastro.

All the trackers can be browsed by a (possibly anonymous) visitor, but
to edit and comment on the Bugs and Tasks trackers, you have to be
registered on Savannah. When posting an issue to a tracker, it is very
important to choose the 'Category' and 'Item Group' options accurately.
The first contains a list of all Gnuastro's programs along with
'Installation', 'New program' and 'Webpage'. The "Item Group" contains
the nature of the issue, for example, if it is a 'Crash' in the software
(a bug), or a problem in the documentation (also a bug) or a feature
request or an enhancement.

The set of horizontal links on the top of the page (Starting with
'Main' and 'Homepage' and finishing with 'News') are the easiest way to
access these trackers (and other major aspects of the project) from any
part of the project web page. Hovering your mouse over them will open a
drop down menu that will link you to the different things you can do on
each tracker (for example, 'Submit new' or 'Browse'). When you browse
each tracker, you can use the "Display Criteria" link above the list to
limit the displayed issues to what you are interested in. The
'Category' and 'Item Group' (explained above) are a good starting point.

Any new issue that is submitted to any of the trackers, or any
comments that are posted for an issue, is directly forwarded to the
gnuastro-devel mailing list
(<https://lists.gnu.org/mailman/listinfo/gnuastro-devel>, see *note
Developing mailing lists:: for more). This will allow anyone interested
to be up to date on the overall development activity in Gnuastro and
will also provide an alternative (to Savannah) archiving for the
development discussions. Therefore, it is not recommended to directly
post an email to this mailing list, but do all the activities (for
example add new issues, or comment on existing ones) on Savannah.

*Do I need to be a member in Savannah to contribute to Gnuastro?* No.

The full version controlled history of Gnuastro is available for
anonymous download or cloning. See *note Production workflow:: for a
description of Gnuastro's Integration-Manager Workflow. In short, you
can either send in patches, or make your own fork. If you choose the
latter, you can push your changes to your own fork and inform us. We
will then pull your changes and merge them into the main project.
Please see *note Forking tutorial:: for a tutorial.

---------- Footnotes ----------

(1) <https://savannah.gnu.org/projects/gnuastro/>

(2) <https://savannah.gnu.org/>

(3) Some of the issues registered here might be due to a mistake on
the user's side, not an actual bug in the program.

File: gnuastro.info, Node: Developing mailing lists, Next: Contributing to Gnuastro, Prev: Gnuastro project webpage, Up: Developing

13.11 Developing mailing lists
==============================

To keep the developers and interested users up to date with the activity
and discussions within Gnuastro, there are two mailing lists which you
can subscribe to:

‘gnuastro-devel@gnu.org’
(at <https://lists.gnu.org/mailman/listinfo/gnuastro-devel>)

All the posts made in the support, bugs and tasks discussions of
*note Gnuastro project webpage:: are also sent to this mailing
address and archived. By subscribing to this list you can stay up
to date with the discussions that are going on between the
developers before, during and (possibly) after working on an issue.
All discussions are either in the context of bugs or tasks which
are done on Savannah and circulated to all interested people
through this mailing list. Therefore it is not recommended to post
anything directly to this mailing list. Any mail that is sent
from Savannah to this list has a link under the title "Reply to
this item at:". That link will take you directly to the issue
discussion page, where you can read the discussion history or join
it.

While you are posting comments on the Savannah issues, be sure to
update the meta-data. For example, if the task/bug is not assigned
to anyone and you would like to take it, change the "Assigned to"
box, or if you want to report that it has been applied, change the
status and so on. All these changes will also be circulated with
the email very clearly.

‘gnuastro-commits@gnu.org’
(at <https://lists.gnu.org/mailman/listinfo/gnuastro-commits>)

This mailing list is defined to circulate all commits that are done
in Gnuastro's version controlled source, see *note Version
controlled source::. If you have any ideas, or suggestions on the
commits, please use the bug and task trackers on Savannah to
follow up the discussion; do not post to this list. All the commits
that are made for an already defined issue or task will state the
respective ID so you can find it easily.

File: gnuastro.info, Node: Contributing to Gnuastro, Prev: Developing mailing lists, Up: Developing

13.12 Contributing to Gnuastro
==============================

You have this great idea or have found a good fix to a problem which you
would like to implement in Gnuastro. You have also become familiar with
the general design of Gnuastro in the previous sections of this chapter
(see *note Developing::) and want to start working on and sharing your
new addition/change with the whole community as part of the official
release. This is great and your contribution is most welcome. This
section and the checklist in *note Developer's checklist:: are written in
the hope of making it as easy as possible for you to share your great
idea with the community.

In this section we discuss the final steps you have to take: legal
and technical. From the legal perspective, the copyright of any work
you do on Gnuastro has to be assigned to the Free Software Foundation
(FSF) and the GNU operating system, or you have to sign a disclaimer.
We do this to ensure that Gnuastro can remain free in the future, see
*note Copyright assignment::. From the technical point of view, in this
section we also discuss commit guidelines (*note Commit guidelines::)
and the general version control workflow of Gnuastro in *note Production
workflow::, along with a tutorial in *note Forking tutorial::.

Recall that before starting the work on your idea, be sure to
check out the bugs and tasks trackers in *note Gnuastro project webpage::
and announce your work there so you do not end up spending time on
something others have already worked on, and also to attract similarly
interested developers to help you.

* Menu:

* Copyright assignment:: Copyright has to be assigned to the FSF.
* Commit guidelines:: Guidelines for commit messages.
* Production workflow:: Submitting your commits (work) for inclusion.
* Forking tutorial:: Tutorial on workflow steps with Git.

File: gnuastro.info, Node: Copyright assignment, Next: Commit guidelines, Prev: Contributing to Gnuastro, Up: Contributing to Gnuastro

13.12.1 Copyright assignment
----------------------------

Gnuastro's copyright is owned by the Free Software Foundation (FSF) to
ensure that Gnuastro always remains free. The FSF has also provided a
Contributor FAQ (https://www.fsf.org/licensing/contributor-faq) to
further clarify the reasons, so we encourage you to read it. Professor
Eben Moglen, of the Columbia University Law School, has given a nice
summary of the reasons for this at
<https://www.gnu.org/licenses/why-assign>. Below we are copying it
verbatim for self consistency (in case you are offline or reading in
print).

Under US copyright law, which is the law under which most free
software programs have historically been first published, there are
very substantial procedural advantages to registration of
copyright. And despite the broad right of distribution conveyed by
the GPL, enforcement of copyright is generally not possible for
distributors: only the copyright holder or someone having
assignment of the copyright can enforce the license. If there are
multiple authors of a copyrighted work, successful enforcement
depends on having the cooperation of all authors.

In order to make sure that all of our copyrights can meet the
record keeping and other requirements of registration, and in order
to be able to enforce the GPL most effectively, FSF requires that
each author of code incorporated in FSF projects provide a
copyright assignment, and, where appropriate, a disclaimer of any
work-for-hire ownership claims by the programmer's employer. That
way we can be sure that all the code in FSF projects is free code,
whose freedom we can most effectively protect, and therefore on
which other developers can completely rely.

Please get in touch with the Gnuastro maintainer (currently Mohammad
Akhlaghi, mohammad -at- akhlaghi -dot- org) to follow the procedures.
It is possible to do this for each change (good for a single
contribution), and also more generally for all the changes/additions you
do in the future within Gnuastro. Note that even if you have already
assigned the copyright of your work on another GNU software to the FSF,
it should be done again for Gnuastro. The FSF has staff working on these legal
issues and the maintainer will get you in touch with them to do the
paperwork. The maintainer will just be informed in the end so your
contributions can be merged within the Gnuastro source code.

Gnuastro will gratefully acknowledge (see *note Acknowledgments::)
all the people who have assigned their copyright to the FSF and have
thus helped to guarantee the freedom and reliability of Gnuastro. The
Free Software Foundation will also acknowledge your copyright
contributions in the Free Software Supporter:
<https://www.fsf.org/free-software-supporter> which will circulate to a
very large community (225,910 people in July 2021). See the archives
for some examples and subscribe to receive interesting updates. The
very active code contributors (or developers) will also be recognized as
project members on the Gnuastro project web page (see *note Gnuastro
project webpage::) and can be given a ‘gnu.org’ email address. So your
very valuable contribution and copyright assignment will not be
forgotten and is highly appreciated by a very large community. If you
are reluctant to sign an assignment, a disclaimer is also acceptable.

*Do I need a disclaimer from my university or employer?* It depends on
the contract with your university or employer. From the FSF's
‘/gd/gnuorg/conditions.text’: "If you are employed to do programming, or
have made an agreement with your employer that says it owns programs you
write, we need a signed piece of paper from your employer disclaiming
rights to" Gnuastro. The FSF's copyright clerk will kindly help you
decide, please consult the following email address: "assign -at- gnu
-dot- org".

File: gnuastro.info, Node: Commit guidelines, Next: Production workflow, Prev: Copyright assignment, Up: Contributing to Gnuastro

13.12.2 Commit guidelines
-------------------------

To be able to cleanly integrate your work with the other developers,
*never commit on the ‘master’ branch* (see *note Production workflow::
for a complete discussion and *note Forking tutorial:: for a cookbook
example). In short, leave ‘master’ only for changes you fetch, or pull
from the official repository (see *note Synchronizing::).

In the Gnuastro commit messages, we strive to follow these standards.
Note that we were still experimenting in the early phases of Gnuastro's
development, so if you notice that earlier commits do not satisfy some
of the guidelines below, it is because they predate those guidelines.

Commit title
The commits have to start with one short descriptive title. The
title is separated from the body with one blank line. Run ‘git
log’ to see some of the most recent commit messages as an example.
In general, the title should satisfy the following conditions:

• It is best for the title to be short, about 60 (or even 50)
characters. Most terminal emulators are about 80 characters wide,
but the title must also leave room for the commit hashes printed by
‘git log --oneline’, and for the branch names or graph structure
that other commonly used ‘git log’ outputs add.

• The title should not finish with any full-stops or periods
('<.>').

Commit body
The body of the commit message is separated from the title by one
empty line. Recall that anyone who has subscribed to the
‘gnuastro-commits’ mailing list will get the commit in their email
after it has been pushed to ‘master’. People will also read them
when they synchronize with the main Gnuastro repository (see *note
Synchronizing::). Finally, the commit messages will later be used
to update the ‘NEWS’ file on each release. Therefore the commit
message body plays a very important role in the development of
Gnuastro, so please adhere to the following guidelines.

• The body should be very descriptive. Start the commit message
body by explaining what changes your commit makes from a
user's perspective (added, changed, or removed options, or
arguments to programs or libraries, or modified algorithms, or
new installation step, etc.).

• Try to explain the committed contents as best as you can.
Recall that the readers of your commit message do not
necessarily have your current background. After some time you
will also forget the context, so this request is not just for
others(1). Therefore be very descriptive and explain as much
as possible: what the bug/task was, justify the way you fixed
it and discuss other possible solutions that you might not
have included. For the last item, it is best to discuss them
thoroughly as comments in the appropriate section of the code,
but only give a short summary in the commit message. Note
that all added and removed source code lines will also be
circulated in the ‘gnuastro-commits’ mailing list.

• Like all of Gnuastro's other text files, the lines in the commit
body should not be longer than 75 characters, see *note Coding
conventions::. This is to ensure that on standard terminal
emulators (with 80 character width), the ‘git log’ output can
be cleanly displayed (note that the commit message is indented
in the output of ‘git log’). If you use Emacs, Gnuastro's
‘.dir-locals.el’ file will ensure that your commits satisfy
this condition (using <M-q>).

• When the commit is related to a task or a bug, please include
the respective ID (in the format of ‘bug/task #ID’, note the
space) in the commit message (from *note Gnuastro project
webpage::) so that interested people can follow up on the
discussion that took place there. If the commit fixes a bug
or finishes a task, the recommended way is to add a line after
the body with '‘This fixes bug #ID.’', or '‘This finishes task
#ID.’'. Do not assume that the reader has internet access to
check the bug's full description when reading the commit
message, so give a short introduction too.
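
The mechanical parts of the guidelines above (title length, no final
period, the empty line after the title, the 75 character limit and the
‘bug/task #ID’ format) can be checked automatically. The following
Python code is only a minimal sketch: the function names
‘check_commit_message’ and ‘referenced_ids’ are hypothetical and not
part of Gnuastro, and treating 60 characters as a hard limit for the
title is an assumption (the guideline only says "about 60").

```python
import re

TITLE_MAX = 60   # Assumed hard limit; the guideline says "about 60".
BODY_MAX = 75    # Maximum width of lines in the commit body.


def check_commit_message(message):
    """Return a list of guideline problems found in MESSAGE."""
    problems = []
    lines = message.rstrip("\n").split("\n")

    # The first line is the title: short and without a final period.
    title = lines[0]
    if len(title) > TITLE_MAX:
        problems.append("title is longer than %d characters" % TITLE_MAX)
    if title.endswith("."):
        problems.append("title ends with a period")

    # The title must be separated from the body by one empty line.
    if len(lines) > 1 and lines[1].strip() != "":
        problems.append("no empty line between title and body")

    # No line after the title should be wider than BODY_MAX.
    for number, line in enumerate(lines[1:], start=2):
        if len(line) > BODY_MAX:
            problems.append("line %d is longer than %d characters"
                            % (number, BODY_MAX))
    return problems


def referenced_ids(message):
    """Return (kind, ID) pairs for each 'bug #ID' or 'task #ID'."""
    return re.findall(r"\b(bug|task) #(\d+)", message)
```

An empty list from ‘check_commit_message’ only means that these
mechanical conditions are satisfied. It cannot judge whether the body
is descriptive enough, which remains the most important guideline.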

Below you can see a good commit message example (do not forget to
read it, it has tips for you). After reading this, please run ‘git log’
on the ‘master’ branch and read some of the recent commits for more
realistic examples.

The first line should be the title of the commit

An empty line is necessary after the title so Git does not confuse
lines. This top paragraph of the body of the commit usually describes
the reason this commit was done. Therefore it usually starts with
"Until now ...". It is very useful to explain the reason behind the
change, things that are not immediately obvious when looking into the
code. You do not need to list the names of the files, or what lines
have been changed, do not forget that the code changes are fully stored
within Git :-).

In the second paragraph (or any later paragraph!) of the body, we
describe the solution and why (not "how"!) the particular solution was
implemented. So we usually start this part of the commit body with
"With this commit ...". Again, you do not need to go into the details
that can be seen from the 'git diff' command (like the file names that
have been changed or the code that has been implemented). What matters
here are the things that are not immediately obvious from looking into
the code.

You can continue the explanation. You are encouraged to be as explicit
as possible about the "human factor" of the change, rather than its
technical details.

---------- Footnotes ----------

(1) <http://catb.org/esr/writings/unix-koans/prodigy.html>