Discussion:
POSIX Safety
Carlos O'Donell
2014-06-27 13:45:25 UTC
Michael,

I submit the following text to the Linux Kernel Man Pages project.
The goal being that we copy-edit this into a safety attributes
man page and thus harmonize the definition of thread safe,
async-cancel safe, and async-signal safe between glibc and the
linux kernel man page project.

Please feel free to use all, some, or none of this document. It is
included under GPLv2+_DOC_FULL for your use in the linux kernel man
pages project. It is presently formatted as info, please feel free
to reformat. For example the HURD parts of the document do not apply
since the man pages are intended for systems using the Linux
kernel e.g. GNU/Linux.

As always I look forward to continued harmonization between the
glibc manual and linux kernel man pages project :-)

Cheers,
Carlos.

---
.\" Copyright (c) 2014, Red Hat, Inc.
.\"
.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
.\" This is free documentation; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License as
.\" published by the Free Software Foundation; either version 2 of
.\" the License, or (at your option) any later version.
.\"
.\" The GNU General Public License's references to "object code"
.\" and "executables" are to be interpreted as the output of any
.\" document formatting or typesetting system, including
.\" intermediate and printed output.
.\"
.\" This manual is distributed in the hope that it will be useful,
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public
.\" License along with this manual; if not, see
.\" <http://www.gnu.org/licenses/>.
.\" %%%LICENSE_END

@node POSIX Safety Concepts, Unsafe Features, , POSIX
@subsubsection POSIX Safety Concepts
@cindex POSIX Safety Concepts

This manual documents various safety properties of @glibcadj{}
functions, in lines that follow their prototypes and look like:

@sampsafety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}

The properties are assessed according to the criteria set forth in the
POSIX standard for such safety contexts as Thread-, Async-Signal- and
Async-Cancel-Safety. Intuitive definitions of these properties,
attempting to capture the meaning of the standard definitions, follow.

@itemize @bullet

@item
@cindex MT-Safe
@cindex Thread-Safe
@code{MT-Safe} or Thread-Safe functions are safe to call in the presence
of other threads. MT, in MT-Safe, stands for Multi Thread.

Being MT-Safe does not imply a function is atomic, nor that it uses any
of the memory synchronization mechanisms POSIX exposes to users. It is
even possible that calling MT-Safe functions in sequence does not yield
an MT-Safe combination. For example, having a thread call two MT-Safe
functions one right after the other does not guarantee behavior
equivalent to atomic execution of a combination of both functions, since
concurrent calls in other threads may interfere in a destructive way.

Whole-program optimizations that could inline functions across library
interfaces may expose unsafe reordering, and so performing inlining
across the @glibcadj{} interface is not recommended. The documented
MT-Safety status is not guaranteed under whole-program optimization.
However, functions defined in user-visible headers are designed to be
safe for inlining.

@item
@cindex AS-Safe
@cindex Async-Signal-Safe
@code{AS-Safe} or Async-Signal-Safe functions are safe to call from
asynchronous signal handlers. AS, in AS-Safe, stands for Asynchronous
Signal.

Many functions that are AS-Safe may set @code{errno}, or modify the
floating-point environment, because their doing so does not make them
unsuitable for use in signal handlers. However, programs could
misbehave should asynchronous signal handlers modify this thread-local
state, and the signal handling machinery cannot be counted on to
preserve it. Therefore, signal handlers that call functions that may
set @code{errno} or modify the floating-point environment @emph{must}
save their original values, and restore them before returning.

@item
@cindex AC-Safe
@cindex Async-Cancel-Safe
@code{AC-Safe} or Async-Cancel-Safe functions are safe to call when
asynchronous cancellation is enabled. AC in AC-Safe stands for
Asynchronous Cancellation.

The POSIX standard defines only three functions to be AC-Safe, namely
@code{pthread_cancel}, @code{pthread_setcancelstate}, and
@code{pthread_setcanceltype}. At present @theglibc{} provides no
guarantees beyond these three functions, but does document which
functions are presently AC-Safe. This documentation is provided for use
by @theglibc{} developers.

Just like signal handlers, cancellation cleanup routines must configure
the floating point environment they require. The routines cannot assume
a floating point environment, particularly when asynchronous
cancellation is enabled. If the configuration of the floating point
environment cannot be performed atomically then it is also possible that
the environment encountered is internally inconsistent.

@item
@cindex MT-Unsafe
@cindex Thread-Unsafe
@cindex AS-Unsafe
@cindex Async-Signal-Unsafe
@cindex AC-Unsafe
@cindex Async-Cancel-Unsafe
@code{MT-Unsafe}, @code{AS-Unsafe}, @code{AC-Unsafe} functions are not
safe to call within the safety contexts described above. Calling them
within such contexts invokes undefined behavior.

Functions not explicitly documented as safe in a safety context should
be regarded as Unsafe.

@item
@cindex Preliminary
@code{Preliminary} safety properties are documented, indicating these
properties may @emph{not} be counted on in future releases of
@theglibc{}.

Such preliminary properties are the result of an assessment of the
properties of our current implementation, rather than of what is
mandated and permitted by current and future standards.

Although we strive to abide by the standards, in some cases our
implementation is safe even when the standard does not demand safety,
and in other cases our implementation does not meet the standard safety
requirements. The latter are most likely bugs; the former, when marked
as @code{Preliminary}, should not be counted on: future standards may
require changes that are not compatible with the additional safety
properties afforded by the current implementation.

Furthermore, the POSIX standard does not offer a detailed definition of
safety. We assume that, by ``safe to call'', POSIX means that, as long
as the program does not invoke undefined behavior, the ``safe to call''
function behaves as specified, and does not cause other functions to
deviate from their specified behavior. We have chosen to use its loose
definitions of safety, not because they are the best definitions to use,
but because choosing them harmonizes this manual with POSIX.

Please keep in mind that these are preliminary definitions and
annotations, and certain aspects of the definitions are still under
discussion and might be subject to clarification or change.

Over time, we envision evolving the preliminary safety notes into stable
commitments, as stable as those of our interfaces. As we do, we will
remove the @code{Preliminary} keyword from safety notes. As long as the
keyword remains, however, they are not to be regarded as a promise of
future behavior.

@end itemize
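
As a minimal sketch of the @code{errno} rule given for AS-Safe functions
above, a handler can save @code{errno} on entry and restore it before
returning. The names @code{on_signal}, @code{install_handler},
@code{notify_fd} and @code{got_signal} are hypothetical and chosen only
for this illustration; a handler that changes the floating-point
environment would need to treat it the same way.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical names, used only for this sketch.  */
static volatile sig_atomic_t got_signal = 0;
static int notify_fd = -1;   /* e.g. write end of a self-pipe; -1 if unset */

/* Handler that calls write, which is async-signal-safe but may set errno.  */
static void
on_signal (int signo)
{
  int saved_errno = errno;      /* save the interrupted code's errno */
  const char byte = 1;

  (void) signo;
  got_signal = 1;
  if (write (notify_fd, &byte, 1) < 0)
    {
      /* Nothing useful can be done here; errno may now be clobbered.  */
    }
  errno = saved_errno;          /* restore before returning */
}

/* Install on_signal for SIGNO.  Returns 0 on success, -1 on error.  */
static int
install_handler (int signo)
{
  struct sigaction sa;

  sa.sa_handler = on_signal;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;
  return sigaction (signo, &sa, NULL);
}
```

With @code{notify_fd} left at -1 the @code{write} call fails and sets
@code{errno} inside the handler, yet the interrupted code still sees the
value it had before the signal arrived.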

Other keywords that appear in safety notes are defined in subsequent
sections.

@node Unsafe Features, Conditionally Safe Features, POSIX Safety Concepts, POSIX
@subsubsection Unsafe Features
@cindex Unsafe Features

Functions that are unsafe to call in certain contexts are annotated with
keywords that document their features that make them unsafe to call.
AS-Unsafe features in this section indicate the functions are never safe
to call when asynchronous signals are enabled. AC-Unsafe features
indicate they are never safe to call when asynchronous cancellation is
enabled. There are no MT-Unsafe marks in this section.

@itemize @bullet

@item @code{lock}
@cindex lock

Functions marked with @code{lock} as an AS-Unsafe feature may be
interrupted by a signal while holding a non-recursive lock. If the
signal handler calls another such function that takes the same lock, the
result is a deadlock.

Functions annotated with @code{lock} as an AC-Unsafe feature may, if
cancelled asynchronously, fail to release a lock that would have been
released if their execution had not been interrupted by asynchronous
thread cancellation. Once a lock is left taken, attempts to take that
lock will block indefinitely.

@item @code{corrupt}
@cindex corrupt

Functions marked with @code{corrupt} as an AS-Unsafe feature may corrupt
data structures and misbehave when they interrupt, or are interrupted
by, another such function. Unlike functions marked with @code{lock},
these take recursive locks to avoid MT-Safety problems, but this is not
enough to stop a signal handler from observing a partially-updated data
structure. Further corruption may arise from the interrupted function's
failure to notice updates made by signal handlers.

Functions marked with @code{corrupt} as an AC-Unsafe feature may leave
data structures in a corrupt, partially updated state. Subsequent uses
of the data structure may misbehave.

@c A special case, probably not worth documenting separately, involves
@c reallocing, or even freeing pointers. Any case involving free could
@c be easily turned into an ac-safe leak by resetting the pointer before
@c releasing it; I don't think we have any case that calls for this sort
@c of fixing. Fixing the realloc cases would require a new interface:
@c instead of @code{ptr=realloc(ptr,size)} we'd have to introduce
@c @code{acsafe_realloc(&ptr,size)} that would modify ptr before
@c releasing the old memory. The ac-unsafe realloc could be implemented
@c in terms of an internal interface with this semantics (say
@c __acsafe_realloc), but since realloc can be overridden, the function
@c we call to implement realloc should not be this internal interface,
@c but another internal interface that calls __acsafe_realloc if realloc
@c was not overridden, and calls the overridden realloc with async
@c cancel disabled. --lxoliva

@item @code{heap}
@cindex heap

Functions marked with @code{heap} may call heap memory management
functions from the @code{malloc}/@code{free} family of functions and are
only as safe as those functions. This note is thus equivalent to:

@sampsafety{@asunsafe{@asulock{}}@acunsafe{@aculock{} @acsfd{} @acsmem{}}}

@c Check for cases that should have used plugin instead of or in
@c addition to this. Then, after rechecking gettext, adjust i18n if
@c needed.
@item @code{dlopen}
@cindex dlopen

Functions marked with @code{dlopen} use the dynamic loader to load
shared libraries into the current execution image. This involves
opening files, mapping them into memory, allocating additional memory,
resolving symbols, applying relocations and more, all of this while
holding internal dynamic loader locks.

The locks are enough for these functions to be AS- and AC-Unsafe, but
other issues may arise. At present this is a placeholder for all
potential safety issues raised by @code{dlopen}.

@c dlopen runs init and fini sections of the module; does this mean
@c dlopen always implies plugin?

@item @code{plugin}
@cindex plugin

Functions annotated with @code{plugin} may run code from plugins that
may be external to @theglibc{}. Such plugin functions are assumed to be
MT-Safe, AS-Unsafe and AC-Unsafe. Examples of such plugins are stack
@cindex NSS
unwinding libraries, name service switch (NSS) and character set
@cindex iconv
conversion (iconv) back-ends.

Although the plugins mentioned as examples are all brought in by means
of dlopen, the @code{plugin} keyword does not imply any direct
involvement of the dynamic loader or the @code{libdl} interfaces; those
are covered by @code{dlopen}. For example, if one function loads a
module and finds the addresses of some of its functions, while another
just calls those already-resolved functions, the former will be marked
with @code{dlopen}, whereas the latter will get the @code{plugin} mark.
When a single function takes all of these actions, it gets both marks.

@item @code{i18n}
@cindex i18n

Functions marked with @code{i18n} may call internationalization
functions of the @code{gettext} family and will be only as safe as those
functions. This note is thus equivalent to:

@sampsafety{@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @ascudlopen{}}@acunsafe{@acucorrupt{}}}

@item @code{timer}
@cindex timer

Functions marked with @code{timer} use the @code{alarm} function or
similar to set a time-out for a system call or a long-running operation.
In a multi-threaded program, there is a risk that the time-out signal
will be delivered to a different thread, thus failing to interrupt the
intended thread. Besides being MT-Unsafe, such functions are always
AS-Unsafe, because calling them in signal handlers may interfere with
timers set in the interrupted code, and AC-Unsafe, because there is no
safe way to guarantee an earlier timer will be reset in case of
asynchronous cancellation.

@end itemize
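
Because the AC-Unsafe features above are never safe with asynchronous
cancellation enabled, a thread that normally runs with asynchronous
cancellation can switch to deferred cancellation around such calls. The
following is only a sketch under that assumption; the helper name
@code{dup_string_deferred} is hypothetical, and @code{malloc} stands in
for any function carrying the @code{heap} mark.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy S while asynchronous cancellation is
   suspended.  Returns a malloc'ed copy, or NULL on allocation failure.  */
static char *
dup_string_deferred (const char *s)
{
  int old_type, ignored;
  char *copy;

  /* pthread_setcanceltype is one of the three functions POSIX defines
     as AC-Safe, so it may be called with async cancellation enabled.  */
  pthread_setcanceltype (PTHREAD_CANCEL_DEFERRED, &old_type);

  copy = malloc (strlen (s) + 1);   /* AC-Unsafe: may take heap locks */
  if (copy != NULL)
    strcpy (copy, s);

  /* Restore whatever cancellation type the caller was using.  */
  pthread_setcanceltype (old_type, &ignored);
  return copy;
}
```

This keeps the allocator from being interrupted while it holds its
locks, but the returned copy can still leak if cancellation is acted
upon after the previous type is restored and before the caller
registers the pointer with a cleanup handler.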

@node Conditionally Safe Features, Other Safety Remarks, Unsafe Features, POSIX
@subsubsection Conditionally Safe Features
@cindex Conditionally Safe Features

For some features that make functions unsafe to call in certain
contexts, there are known ways to avoid the safety problem other than
refraining from calling the function altogether. The keywords that
follow refer to such features, and each of their definitions indicates
how the whole program needs to be constrained in order to remove the
safety problem indicated by the keyword. Only when all the reasons that
make a function unsafe are observed and addressed, by applying the
documented constraints, does the function become safe to call in a
context.

@itemize @bullet

@item @code{init}
@cindex init

Functions marked with @code{init} as an MT-Unsafe feature perform
MT-Unsafe initialization when they are first called.

Calling such a function at least once in single-threaded mode removes
this specific cause for the function to be regarded as MT-Unsafe. If no
other cause for that remains, the function can then be safely called
after other threads are started.

Functions marked with @code{init} as an AS- or AC-Unsafe feature use the
internal @code{libc_once} machinery or similar to initialize internal
data structures.

If a signal handler interrupts such an initializer, and calls any
function that also performs @code{libc_once} initialization, it will
deadlock if the thread library has been loaded.

Furthermore, if an initializer is partially complete before it is
canceled or interrupted by a signal whose handler requires the same
initialization, some or all of the initialization may be performed more
than once, leaking resources or even resulting in corrupt internal data.

Applications that need to call functions marked with @code{init} as an
AS- or AC-Unsafe feature should ensure the initialization is performed
before configuring signal handlers or enabling cancellation, so that the
AS- and AC-Safety issues related with @code{libc_once} do not arise.

@c We may have to extend the annotations to cover conditions in which
@c initialization may or may not occur, since an initial call in a safe
@c context is no use if the initialization doesn't take place at that
@c time: it doesn't remove the risk for later calls.

@item @code{race}
@cindex race

Functions annotated with @code{race} as an MT-Safety issue operate on
objects in ways that may cause data races or similar forms of
destructive interference out of concurrent execution. In some cases,
the objects are passed to the functions by users; in others, they are
used by the functions to return values to users; in others, they are not
even exposed to users.

We consider access to objects passed as (indirect) arguments to
functions to be data race free. The assurance of data race free objects
is the caller's responsibility. We will not mark a function as
MT-Unsafe or AS-Unsafe if it misbehaves when users fail to take the
measures required by POSIX to avoid data races when dealing with such
objects. As a general rule, if a function is documented as reading from
an object passed (by reference) to it, or modifying it, users ought to
use memory synchronization primitives to avoid data races just as they
would should they perform the accesses themselves rather than by calling
the library function. @code{FILE} streams are the exception to the
general rule, in that POSIX mandates the library to guard against data
races in many functions that manipulate objects of this specific opaque
type. We regard this as a convenience provided to users, rather than as
a general requirement whose expectations should extend to other types.

In order to remind users that guarding certain arguments is their
responsibility, we will annotate functions that take objects of certain
types as arguments. We draw the line for objects passed by users as
follows: objects whose types are exposed to users, and that users are
expected to access directly, such as memory buffers, strings, and
various user-visible @code{struct} types, do @emph{not} give reason for
functions to be annotated with @code{race}. It would be noisy and
redundant with the general requirement, and not many would be surprised
by the library's lack of internal guards when accessing objects that can
be accessed directly by users.

As for objects that are opaque or opaque-like, in that they are to be
manipulated only by passing them to library functions (e.g.,
@code{FILE}, @code{DIR}, @code{obstack}, @code{iconv_t}), there might be
additional expectations as to internal coordination of access by the
library. We will annotate, with @code{race} followed by a colon and the
argument name, functions that take such objects but that do not take
care of synchronizing access to them by default. For example,
@code{FILE} stream @code{unlocked} functions will be annotated, but
those that perform implicit locking on @code{FILE} streams by default
will not, even though the implicit locking may be disabled on a
per-stream basis.

In either case, we will not regard as MT-Unsafe functions that may
access user-supplied objects in unsafe ways should users fail to ensure
the accesses are well defined. The notion prevails that users are
expected to safeguard against data races any user-supplied objects that
the library accesses on their behalf.

@c The above describes @mtsrace; @mtasurace is described below.

This user responsibility does not apply, however, to objects controlled
by the library itself, such as internal objects and static buffers used
to return values from certain calls. When the library doesn't guard
them against concurrent uses, these cases are regarded as MT-Unsafe and
AS-Unsafe (although the @code{race} mark under AS-Unsafe will be omitted
as redundant with the one under MT-Unsafe). As in the case of
user-exposed objects, the mark may be followed by a colon and an
identifier. The identifier groups all functions that operate on a
certain unguarded object; users may avoid the MT-Safety issues related
with unguarded concurrent access to such internal objects by creating a
non-recursive mutex related with the identifier, and always holding the
mutex when calling any function marked as racy on that identifier, as
they would have to should the identifier be an object under user
control. The non-recursive mutex avoids the MT-Safety issue, but it
trades one AS-Safety issue for another, so use in asynchronous signals
remains undefined.

When the identifier relates to a static buffer used to hold return
values, the mutex must be held for as long as the buffer remains in use
by the caller. Many functions that return pointers to static buffers
offer reentrant variants that store return values in caller-supplied
buffers instead. In some cases, such as @code{tmpnam}, the variant is
chosen not by calling an alternate entry point, but by passing a
non-@code{NULL} pointer to the buffer in which the returned values are
to be stored. These variants are generally preferable in multi-threaded
programs, although some of them are not MT-Safe because of other
internal buffers, also documented with @code{race} notes.

@item @code{const}
@cindex const

Functions marked with @code{const} as an MT-Safety issue non-atomically
modify internal objects that are better regarded as constant, because a
substantial portion of @theglibc{} accesses them without
synchronization. Unlike @code{race}, that causes both readers and
writers of internal objects to be regarded as MT-Unsafe and AS-Unsafe,
this mark is applied to writers only. Writers remain equally MT- and
AS-Unsafe to call, but the then-mandatory constness of objects they
modify enables readers to be regarded as MT-Safe and AS-Safe (as long as
no other reasons for them to be unsafe remain), since the lack of
synchronization is not a problem when the objects are effectively
constant.

The identifier that follows the @code{const} mark will appear by itself
as a safety note in readers. Programs that wish to work around this
safety issue, so as to call writers, may use a non-recursive
@code{rwlock} associated with the identifier, and guard @emph{all} calls
to functions marked with @code{const} followed by the identifier with a
write lock, and @emph{all} calls to functions marked with the identifier
by itself with a read lock. The non-recursive locking removes the
MT-Safety problem, but it trades one AS-Safety problem for another, so
use in asynchronous signals remains undefined.

@c But what if, instead of marking modifiers with const:id and readers
@c with just id, we marked writers with race:id and readers with ro:id?
@c Instead of having to define each instance of “id”, we'd have a
@c general pattern governing all such “id”s, wherein race:id would
@c suggest the need for an exclusive/write lock to make the function
@c safe, whereas ro:id would indicate “id” is expected to be read-only,
@c but if any modifiers are called (while holding an exclusive lock),
@c then ro:id-marked functions ought to be guarded with a read lock for
@c safe operation. ro:env or ro:locale, for example, seems to convey
@c more clearly the expectations and the meaning, than just env or
@c locale.

@item @code{sig}
@cindex sig

Functions marked with @code{sig} as an MT-Safety issue (that implies an
identical AS-Safety issue, omitted for brevity) may temporarily install
a signal handler for internal purposes, which may interfere with other
uses of the signal, identified after a colon.

This safety problem can be worked around by ensuring that no other uses
of the signal will take place for the duration of the call. Holding a
non-recursive mutex while calling all functions that use the same
temporary signal, blocking that signal before the call, and resetting
its handler afterwards are recommended.

There is no safe way to guarantee the original signal handler is
restored in case of asynchronous cancellation, therefore so-marked
functions are also AC-Unsafe.

@c fixme: at least deferred cancellation should get it right, and would
@c obviate the restoring bit below, and the qualifier above.

Besides the measures recommended to work around the MT- and AS-Safety
problem, in order to avert the cancellation problem, disabling
asynchronous cancellation @emph{and} installing a cleanup handler to
restore the signal to the desired state and to release the mutex are
recommended.

@item @code{term}
@cindex term

Functions marked with @code{term} as an MT-Safety issue may change the
terminal settings in the recommended way, namely: call @code{tcgetattr},
modify some flags, and then call @code{tcsetattr}; this creates a window
in which changes made by other threads are lost. Thus, functions marked
with @code{term} are MT-Unsafe. The same window enables changes made by
asynchronous signals to be lost. These functions are also AS-Unsafe,
but the corresponding mark is omitted as redundant.

It is thus advisable for applications using the terminal to avoid
concurrent and reentrant interactions with it, by not using it in signal
handlers or blocking signals that might use it, and holding a lock while
calling these functions and interacting with the terminal. This lock
should also be used for mutual exclusion with functions marked with
@code{@mtasurace{:tcattr(fd)}}, where @var{fd} is a file descriptor for
the controlling terminal. The caller may use a single mutex for
simplicity, or use one mutex per terminal, even if referenced by
different file descriptors.

Functions marked with @code{term} as an AC-Safety issue are supposed to
restore terminal settings to their original state, after temporarily
changing them, but they may fail to do so if cancelled.

@c fixme: at least deferred cancellation should get it right, and would
@c obviate the restoring bit below, and the qualifier above.

Besides the measures recommended to work around the MT- and AS-Safety
problem, in order to avert the cancellation problem, disabling
asynchronous cancellation @emph{and} installing a cleanup handler to
restore the terminal settings to the original state and to release the
mutex are recommended.

@end itemize
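
The mutex workaround described under @code{race} can be sketched as
follows, assuming a function such as @code{gmtime} that returns a
pointer to a static buffer. The names @code{tmbuf_lock} and
@code{gmtime_guarded} are hypothetical; the sketch copies the result out
while still holding the mutex, since the mutex must be held for as long
as the static buffer remains in use.

```c
#include <pthread.h>
#include <time.h>

/* Hypothetical mutex associated with the shared static result buffer.
   Every caller of a function using that buffer must take this same
   mutex for the workaround to be effective.  */
static pthread_mutex_t tmbuf_lock = PTHREAD_MUTEX_INITIALIZER;

/* Store the broken-down UTC time for T in *OUT.
   Returns 0 on success, -1 if gmtime fails.  */
static int
gmtime_guarded (time_t t, struct tm *out)
{
  int rc = -1;
  struct tm *p;

  pthread_mutex_lock (&tmbuf_lock);
  p = gmtime (&t);              /* result points to a static buffer */
  if (p != NULL)
    {
      *out = *p;                /* copy while the buffer is protected */
      rc = 0;
    }
  pthread_mutex_unlock (&tmbuf_lock);
  return rc;
}
```

Where a reentrant variant such as @code{gmtime_r} is available it is
generally the simpler choice, and the mutex does nothing for use in
signal handlers, which remains undefined.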

@node Other Safety Remarks, , Conditionally Safe Features, POSIX
@subsubsection Other Safety Remarks
@cindex Other Safety Remarks

Additional keywords may be attached to functions, indicating features
that do not make a function unsafe to call, but that may need to be
taken into account in certain classes of programs:

@itemize @bullet

@item @code{locale}
@cindex locale

Functions annotated with @code{locale} as an MT-Safety issue read from
the locale object without any form of synchronization. Functions
annotated with @code{locale} called concurrently with locale changes may
behave in ways that do not correspond to any of the locales active
during their execution, but an unpredictable mix thereof.

We do not mark these functions as MT- or AS-Unsafe, however, because
functions that modify the locale object are marked with
@code{const:locale} and regarded as unsafe. Being unsafe, the latter
are not to be called when multiple threads are running or asynchronous
signals are enabled, and so the locale can be considered effectively
constant in these contexts, which makes the former safe.

@c Should the locking strategy suggested under @code{const} be used,
@c failure to guard locale uses is not as fatal as data races in
@c general: unguarded uses will @emph{not} follow dangling pointers or
@c access uninitialized, unmapped or recycled memory. Each access will
@c read from a consistent locale object that is or was active at some
@c point during its execution. Without synchronization, however, it
@c cannot even be assumed that, after a change in locale, earlier
@c locales will no longer be used, even after the newly-chosen one is
@c used in the thread. Nevertheless, even though unguarded reads from
@c the locale will not violate type safety, functions that access the
@c locale multiple times may invoke all sorts of undefined behavior
@c because of the unexpected locale changes.

@item @code{env}
@cindex env

Functions marked with @code{env} as an MT-Safety issue access the
environment with @code{getenv} or similar, without any guards to ensure
safety in the presence of concurrent modifications.

We do not mark these functions as MT- or AS-Unsafe, however, because
functions that modify the environment are all marked with
@code{const:env} and regarded as unsafe. Being unsafe, the latter are
not to be called when multiple threads are running or asynchronous
signals are enabled, and so the environment can be considered
effectively constant in these contexts, which makes the former safe.

@item @code{hostid}
@cindex hostid

The function marked with @code{hostid} as an MT-Safety issue reads from
the system-wide data structures that hold the ``host ID'' of the
machine. These data structures cannot generally be modified atomically.
Since it is expected that the ``host ID'' will not normally change, the
function that reads from it (@code{gethostid}) is regarded as safe,
whereas the function that modifies it (@code{sethostid}) is marked with
@code{@mtasuconst{:@mtshostid{}}}, indicating it may require special
care if it is to be called. In this specific case, the special care
amounts to system-wide (not merely intra-process) coordination.

@item @code{sigintr}
@cindex sigintr

Functions marked with @code{sigintr} as an MT-Safety issue access the
@code{_sigintr} internal data structure without any guards to ensure
safety in the presence of concurrent modifications.

We do not mark these functions as MT- or AS-Unsafe, however, because
functions that modify this data structure are all marked with
@code{const:sigintr} and regarded as unsafe. Being unsafe, the latter
are not to be called when multiple threads are running or asynchronous
signals are enabled, and so the data structure can be considered
effectively constant in these contexts, which makes the former safe.

@item @code{fd}
@cindex fd

Functions annotated with @code{fd} as an AC-Safety issue may leak file
descriptors if asynchronous thread cancellation interrupts their
execution.

Functions that allocate or deallocate file descriptors will generally be
marked as such. Even if they attempted to protect the file descriptor
allocation and deallocation with cleanup regions, allocating a new
descriptor and storing its number where the cleanup region could release
it cannot be performed as a single atomic operation. Similarly,
releasing the descriptor and taking it out of the data structure
normally responsible for releasing it cannot be performed atomically.
There will always be a window in which the descriptor cannot be released
because it was not stored in the cleanup handler argument yet, or it was
already taken out before releasing it. It cannot be taken out after
release: an open descriptor could mean either that the descriptor still
has to be closed, or that it was already closed but the descriptor was
reallocated by another thread or signal handler.

Such leaks could be internally avoided, with some performance penalty,
by temporarily disabling asynchronous thread cancellation. However,
since callers of allocation or deallocation functions would have to do
this themselves, to avoid the same sort of leak in their own layer, it
makes more sense for the library to assume they are taking care of it
than to impose a performance penalty that is redundant when the problem
is solved in upper layers, and insufficient when it is not.

This remark by itself does not cause a function to be regarded as
AC-Unsafe. However, cumulative effects of such leaks may pose a
problem for some programs. If this is the case, suspending asynchronous
cancellation for the duration of calls to such functions is recommended.

@item @code{mem}
@cindex mem

Functions annotated with @code{mem} as an AC-Safety issue may leak
memory if asynchronous thread cancellation interrupts their execution.

The problem is similar to that of file descriptors: there is no atomic
interface to allocate memory and store its address in the argument to a
cleanup handler, or to release it and remove its address from that
argument, without at least temporarily disabling asynchronous
cancellation, which these functions do not do.

This remark does not by itself cause a function to be regarded as
generally AC-Unsafe. However, cumulative effects of such leaks may be
severe enough for some programs that disabling asynchronous cancellation
for the duration of calls to such functions may be required.

@item @code{cwd}
@cindex cwd

Functions marked with @code{cwd} as an MT-Safety issue may temporarily
change the current working directory during their execution, which may
cause relative pathnames to be resolved in unexpected ways in other
threads or within asynchronous signal or cancellation handlers.

This is not enough of a reason to mark so-marked functions as MT- or
AS-Unsafe, but when this behavior is optional (e.g., @code{nftw} with
@code{FTW_CHDIR}), avoiding the option may be a good alternative to
using full pathnames or file descriptor-relative (e.g. @code{openat})
system calls.
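
For comparison, a file descriptor-relative lookup that never changes the
working directory might look like the sketch below. The helper name
@code{open_in_dir} is hypothetical; only @code{open}, @code{openat} and
@code{close} are real interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open NAME relative to the directory DIRPATH
   without calling chdir, so relative pathnames used by other threads
   are resolved as before.  Returns a descriptor, or -1 on failure.  */
int
open_in_dir (const char *dirpath, const char *name)
{
  int dirfd = open (dirpath, O_RDONLY | O_DIRECTORY);
  if (dirfd < 0)
    return -1;

  /* NAME is resolved against DIRFD, not the current working directory.  */
  int fd = openat (dirfd, name, O_RDONLY);

  close (dirfd);
  return fd;
}
```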

@item @code{!posix}
@cindex !posix

This remark, as an MT-, AS- or AC-Safety note to a function, indicates
the safety status of the function is known to differ from the specified
status in the POSIX standard. For example, POSIX does not require a
function to be Safe, but our implementation is, or vice-versa.

For the time being, the absence of this remark does not imply the safety
properties we documented are identical to those mandated by POSIX for
the corresponding functions.

@item @code{:identifier}
@cindex :identifier

Annotations may sometimes be followed by identifiers, intended to group
several functions that, e.g., access the data structures in an unsafe way,
as in @code{race} and @code{const}, or to provide more specific
information, such as naming a signal in a function marked with
@code{sig}. It is envisioned that identifiers may be applied to
@code{lock} and @code{corrupt} as well in the future.

In most cases, the identifier will name a set of functions, but it may
name global objects or function arguments, or identifiable properties or
logical components associated with them, with a notation such as
@code{:buf(arg)} to denote a buffer associated with the argument
@var{arg}, or @code{:tcattr(fd)} to denote the terminal attributes of a
file descriptor @var{fd}.

The most common use for identifiers is to provide logical groups of
functions and arguments that need to be protected by the same
synchronization primitive in order to ensure safe operation in a given
context.
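
A minimal sketch of that grouping, assuming for illustration that
@code{gmtime} and its relatives share a static buffer and carry an
annotation with an identifier such as @code{tmbuf}: one mutex is
associated with the identifier, and every call to a function in the
group goes through it. The names @code{tmbuf_lock} and
@code{gmtime_guarded} are hypothetical, and the actual annotation of any
given function should be checked rather than taken from this example.

```c
#include <pthread.h>
#include <time.h>

/* One mutex per annotation identifier.  The identifier "tmbuf" is an
   assumption made for this illustration.  */
static pthread_mutex_t tmbuf_lock = PTHREAD_MUTEX_INITIALIZER;

/* Copy the broken-down UTC time for T into *OUT while holding the lock,
   so the shared static buffer is no longer in use once the lock is
   dropped.  Returns 1 on success and 0 on failure.  */
int
gmtime_guarded (time_t t, struct tm *out)
{
  int ok = 0;

  pthread_mutex_lock (&tmbuf_lock);
  struct tm *shared = gmtime (&t);
  if (shared != NULL)
    {
      *out = *shared;
      ok = 1;
    }
  pthread_mutex_unlock (&tmbuf_lock);
  return ok;
}
```

Where a reentrant variant such as @code{gmtime_r} is available, calling
it is usually simpler than adding a lock; the wrapper only shows how an
identifier maps onto a synchronization primitive.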

@item @code{/condition}
@cindex /condition

Some safety annotations may be conditional, in that they only apply if a
boolean expression involving arguments, global variables or even the
underlying kernel evaluates to true. Such conditions as
@code{/hurd} or @code{/!linux!bsd} indicate the preceding marker only
applies when the underlying kernel is the HURD, or when it is neither
Linux nor a BSD kernel, respectively. @code{/!ps} and
@code{/one_per_line} indicate the preceding marker only applies when
argument @var{ps} is NULL, or global variable @var{one_per_line} is
nonzero.

When all marks that render a function unsafe are adorned with such
conditions, and none of the named conditions hold, then the function can
be regarded as safe.
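
To make the @code{/!ps} case concrete, the sketch below keeps that
condition false by always passing a caller-owned conversion state.
It assumes, for illustration only, that @code{mbrtowc} is among the
functions whose unsafe mark is conditional on @var{ps} being NULL; the
helper name @code{count_chars} is hypothetical.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in the multibyte string S
   using a caller-owned conversion state instead of a NULL ps argument.
   Returns (size_t) -1 on an invalid or truncated sequence.  */
size_t
count_chars (const char *s)
{
  mbstate_t state;
  size_t count = 0;
  size_t left = strlen (s);

  memset (&state, 0, sizeof state);   /* initial conversion state */
  while (left > 0)
    {
      size_t used = mbrtowc (NULL, s, left, &state);
      if (used == (size_t) -1 || used == (size_t) -2)
        return (size_t) -1;
      if (used == 0)
        break;                        /* null character reached */
      s += used;
      left -= used;
      count++;
    }
  return count;
}
```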

@end itemize
---
Michael Kerrisk (man-pages)
2014-07-15 05:21:35 UTC
Hi Carlos,

My apologies for not replying sooner. Very limited time these days.
Post by Carlos O'Donell
Michael,

I submit the following text to the Linux Kernel Man Pages project.
The goal being that we copy-edit this into a safety attributes
man page and thus harmonize the definition of thread safe,
async-cancel safe, and async-signal safe between glibc and the
linux kernel man page project.

Please feel free to use all, some, or none of this document. It is
included under GPLv2+_DOC_FULL for your use in the linux kernel man
pages project. It is presently formatted as info, please feel free
to reformat. For example the HURD parts of the document do not apply
since the man pages are intended for systems using the Linux
kernel e.g. GNU/Linux.
Thanks very much for this. When I have some available time, I'll be
working this up into an attributes(7) page. (Probably will be a
few weeks away, unfortunately.)
Post by Carlos O'Donell
As always I look forward to continued harmonization between the
glibc manual and linux kernel man pages project :-)
Likewise. It's really a lot more pleasant working with the glibc
project these days!

Cheers,

Michael
Post by Carlos O'Donell
@node Other Safety Remarks, , Conditionally Safe Features, POSIX
@subsubsection Other Safety Remarks
@cindex Other Safety Remarks

Additional keywords may be attached to functions, indicating features
that do not make a function unsafe to call, but that may need to be
taken into account in certain classes of programs:

@itemize @bullet

@item @code{locale}
@cindex locale

Functions annotated with @code{locale} as an MT-Safety issue read from
the locale object without any form of synchronization. Functions
annotated with @code{locale} called concurrently with locale changes may
behave in ways that do not correspond to any of the locales active
during their execution, but an unpredictable mix thereof.

We do not mark these functions as MT- or AS-Unsafe, however, because
functions that modify the locale object are marked with
@code{const:locale} and regarded as unsafe. Being unsafe, the latter
are not to be called when multiple threads are running or asynchronous
signals are enabled, and so the locale can be considered effectively
constant in these contexts, which makes the former safe.

@c Should the locking strategy suggested under @code{const} be used,
@c failure to guard locale uses is not as fatal as data races in
@c general: unguarded uses will @emph{not} follow dangling pointers or
@c access uninitialized, unmapped or recycled memory. Each access will
@c read from a consistent locale object that is or was active at some
@c point during its execution. Without synchronization, however, it
@c cannot even be assumed that, after a change in locale, earlier
@c locales will no longer be used, even after the newly-chosen one is
@c used in the thread. Nevertheless, even though unguarded reads from
@c the locale will not violate type safety, functions that access the
@c locale multiple times may invoke all sorts of undefined behavior
@c because of the unexpected locale changes.

@item @code{env}
@cindex env

Functions marked with @code{env} as an MT-Safety issue access the
environment with @code{getenv} or similar, without any guards to ensure
safety in the presence of concurrent modifications.

We do not mark these functions as MT- or AS-Unsafe, however, because
functions that modify the environment are all marked with
@code{const:env} and regarded as unsafe. Being unsafe, the latter are
not to be called when multiple threads are running or asynchronous
signals are enabled, and so the environment can be considered
effectively constant in these contexts, which makes the former safe.

@item @code{hostid}
@cindex hostid

The function marked with @code{hostid} as an MT-Safety issue reads from
the system-wide data structures that hold the ``host ID'' of the
machine. These data structures cannot generally be modified atomically.
Since it is expected that the ``host ID'' will not normally change, the
function that reads from it (@code{gethostid}) is regarded as safe,
whereas the function that modifies it (@code{sethostid}) is marked with
@code{@mtasuconst{:@mtshostid{}}}, indicating it may require special
care if it is to be called. In this specific case, the special care
amounts to system-wide (not merely intra-process) coordination.

@item @code{sigintr}
@cindex sigintr

Functions marked with @code{sigintr} as an MT-Safety issue access the
@code{_sigintr} internal data structure without any guards to ensure
safety in the presence of concurrent modifications.

We do not mark these functions as MT- or AS-Unsafe, however, because
functions that modify this data structure are all marked with
@code{const:sigintr} and regarded as unsafe. Being unsafe, the latter
are not to be called when multiple threads are running or asynchronous
signals are enabled, and so the data structure can be considered
effectively constant in these contexts, which makes the former safe.

@item @code{fd}
@cindex fd

Functions annotated with @code{fd} as an AC-Safety issue may leak file
descriptors if asynchronous thread cancellation interrupts their
execution.

Functions that allocate or deallocate file descriptors will generally be
marked as such. Even if they attempted to protect the file descriptor
allocation and deallocation with cleanup regions, allocating a new
descriptor and storing its number where the cleanup region could release
it cannot be performed as a single atomic operation. Similarly,
releasing the descriptor and taking it out of the data structure
normally responsible for releasing it cannot be performed atomically.
There will always be a window in which the descriptor cannot be released
because it was not stored in the cleanup handler argument yet, or it was
already taken out before releasing it. It cannot be taken out after
release: an open descriptor could mean either that the descriptor still
has to be closed, or that it already did so but the descriptor was
reallocated by another thread or signal handler.

Such leaks could be internally avoided, with some performance penalty,
by temporarily disabling asynchronous thread cancellation. However,
since callers of allocation or deallocation functions would have to do
this themselves, to avoid the same sort of leak in their own layer, it
makes more sense for the library to assume they are taking care of it
than to impose a performance penalty that is redundant when the problem
is solved in upper layers, and insufficient when it is not.

This remark by itself does not cause a function to be regarded as
AC-Unsafe. However, cumulative effects of such leaks may pose a
problem for some programs. If this is the case, suspending asynchronous
cancellation for the duration of calls to such functions is recommended.

@item @code{mem}
@cindex mem

Functions annotated with @code{mem} as an AC-Safety issue may leak
memory if asynchronous thread cancellation interrupts their execution.

The problem is similar to that of file descriptors: there is no atomic
interface to allocate memory and store its address in the argument to a
cleanup handler, or to release it and remove its address from that
argument, without at least temporarily disabling asynchronous
cancellation, which these functions do not do.

This remark does not by itself cause a function to be regarded as
generally AC-Unsafe. However, cumulative effects of such leaks may be
severe enough for some programs that disabling asynchronous cancellation
for the duration of calls to such functions may be required.

@item @code{cwd}
@cindex cwd

Functions marked with @code{cwd} as an MT-Safety issue may temporarily
change the current working directory during their execution, which may
cause relative pathnames to be resolved in unexpected ways in other
threads or within asynchronous signal or cancellation handlers.

This is not enough of a reason to mark so-marked functions as MT- or
AS-Unsafe, but when this behavior is optional (e.g., @code{nftw} with
@code{FTW_CHDIR}), avoiding the option may be a good alternative to
using full pathnames or file descriptor-relative (e.g. @code{openat})
system calls.

@item @code{!posix}
@cindex !posix

This remark, as an MT-, AS- or AC-Safety note to a function, indicates
the safety status of the function is known to differ from the specified
status in the POSIX standard. For example, POSIX does not require a
function to be Safe, but our implementation is, or vice-versa.

For the time being, the absence of this remark does not imply the safety
properties we documented are identical to those mandated by POSIX for
the corresponding functions.

@item @code{:identifier}
@cindex :identifier

Annotations may sometimes be followed by identifiers, intended to group
several functions that e.g. access the data structures in an unsafe way,
as in @code{race} and @code{const}, or to provide more specific
information, such as naming a signal in a function marked with
@code{sig}. It is envisioned that it may be applied to @code{lock} and
@code{corrupt} as well in the future.

In most cases, the identifier will name a set of functions, but it may
name global objects or function arguments, or identifiable properties or
logical components associated with them, with a notation such as
@code{:buf(arg)} to denote a buffer associated with the argument
@var{arg}, or @code{:tcattr(fd)} to denote the terminal attributes of a
file descriptor @var{fd}.

The most common use for identifiers is to provide logical groups of
functions and arguments that need to be protected by the same
synchronization primitive in order to ensure safe operation in a given
context.

@item @code{/condition}
@cindex /condition

Some safety annotations may be conditional, in that they only apply if a
boolean expression involving arguments, global variables or even the
underlying kernel evaluates to true. Such conditions as
@code{/hurd} or @code{/!linux!bsd} indicate the preceding marker only
applies when the underlying kernel is the HURD, or when it is neither
Linux nor a BSD kernel, respectively. @code{/!ps} and
@code{/one_per_line} indicate the preceding marker only applies when
argument @var{ps} is NULL, or global variable @var{one_per_line} is
nonzero.

When all marks that render a function unsafe are adorned with such
conditions, and none of the named conditions hold, then the function can
be regarded as safe.

@end itemize
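The recommendation under @code{fd} and @code{mem}, to suspend
asynchronous cancellation for the duration of a call that allocates a
resource, might be sketched in C as below. This is only an illustration
under stated assumptions, not code from the library: the helper name
@code{open_without_async_cancel} is hypothetical, and it relies only on
the standard @code{pthread_setcanceltype} and @code{open} interfaces.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical helper: open PATH read-only while asynchronous
   cancellation is suspended, so the thread cannot be cancelled
   asynchronously between the allocation of the descriptor and the
   point where the caller receives it.  Returns the descriptor, or -1
   on failure.  */
int
open_without_async_cancel (const char *path)
{
  int old_type;
  int ignored;
  int fd;

  /* Switch to deferred cancellation and remember the previous type.  */
  int err = pthread_setcanceltype (PTHREAD_CANCEL_DEFERRED, &old_type);
  assert (err == 0);

  fd = open (path, O_RDONLY);

  /* Restore whatever cancellation type the caller was using.  */
  err = pthread_setcanceltype (old_type, &ignored);
  assert (err == 0);

  return fd;
}
```

A caller that runs with asynchronous cancellation enabled would still
have to store the returned descriptor where its own cleanup handler can
find it before re-entering any window in which it may be cancelled.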
---

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
Michael Kerrisk (man-pages)
2014-10-17 05:00:15 UTC
Post by Carlos O'Donell
Michael,

I submit the following text to the Linux Kernel Man Pages project.
The goal being that we copy-edit this into a safety attributes
man page and thus harmonize the definition of thread safe,
async-cancel safe, and async-signal safe between glibc and the
linux kernel man page project.

Please feel free to use all, some, or none of this document. It is
included under GPLv2+_DOC_FULL for your use in the linux kernel man
pages project. It is presently formatted as info, please feel free
to reformat. For example the HURD parts of the document do not apply
since the man pages are intended for systems using the Linux
kernel e.g. GNU/Linux.

As always I look forward to continued harmonization between the
glibc manual and linux kernel man pages project :-)

Hi Carlos,

I started editing up some of this material for inclusion in
a man page. As well as the Copyright, I do find it useful to
list the authors for the text. Is that you, Alexandre, or both?

Thanks,

Michael
Post by Carlos O'Donell
---
.\" Copyright (c) 2014, Red Hat, Inc.
.\"
.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
.\" This is free documentation; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License as
.\" published by the Free Software Foundation; either version 2 of
.\" the License, or (at your option) any later version.
.\"
.\" The GNU General Public License's references to "object code"
.\" and "executables" are to be interpreted as the output of any
.\" document formatting or typesetting system, including
.\" intermediate and printed output.
.\"
.\" This manual is distributed in the hope that it will be useful,
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public
.\" License along with this manual; if not, see
.\" <http://www.gnu.org/licenses/>.
.\" %%%LICENSE_END

@node POSIX Safety Concepts, Unsafe Features, , POSIX
@subsubsection POSIX Safety Concepts
@cindex POSIX Safety Concepts

This manual documents various safety properties of @glibcadj{}
functions, in lines that follow their prototypes and look like:

@sampsafety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}

The properties are assessed according to the criteria set forth in the
POSIX standard for such safety contexts as Thread-, Async-Signal- and
Async-Cancel-Safety. Intuitive definitions of these properties,
attempting to capture the meaning of the standard definitions, follow.

@itemize @bullet

@item
@cindex MT-Safe
@cindex Thread-Safe
@code{MT-Safe} or Thread-Safe functions are safe to call in the presence
of other threads. MT, in MT-Safe, stands for Multi Thread.

Being MT-Safe does not imply a function is atomic, nor that it uses any
of the memory synchronization mechanisms POSIX exposes to users. It is
even possible that calling MT-Safe functions in sequence does not yield
an MT-Safe combination. For example, having a thread call two MT-Safe
functions one right after the other does not guarantee behavior
equivalent to atomic execution of a combination of both functions, since
concurrent calls in other threads may interfere in a destructive way.

Whole-program optimizations that could inline functions across library
interfaces may expose unsafe reordering, and so performing inlining
across the @glibcadj{} interface is not recommended. The documented
MT-Safety status is not guaranteed under whole-program optimization.
However, functions defined in user-visible headers are designed to be
safe for inlining.

@item
@cindex AS-Safe
@cindex Async-Signal-Safe
@code{AS-Safe} or Async-Signal-Safe functions are safe to call from
asynchronous signal handlers. AS, in AS-Safe, stands for Asynchronous
Signal.

Many functions that are AS-Safe may set @code{errno}, or modify the
floating-point environment, because their doing so does not make them
unsuitable for use in signal handlers. However, programs could
misbehave should asynchronous signal handlers modify this thread-local
state, and the signal handling machinery cannot be counted on to
preserve it. Therefore, signal handlers that call functions that may
set @code{errno} or modify the floating-point environment @emph{must}
save their original values, and restore them before returning.

@item
@cindex AC-Safe
@cindex Async-Cancel-Safe
@code{AC-Safe} or Async-Cancel-Safe functions are safe to call when
asynchronous cancellation is enabled. AC in AC-Safe stands for
Asynchronous Cancellation.

The POSIX standard defines only three functions to be AC-Safe, namely
@code{pthread_cancel}, @code{pthread_setcancelstate}, and
@code{pthread_setcanceltype}. At present @theglibc{} provides no
guarantees beyond these three functions, but does document which
functions are presently AC-Safe. This documentation is provided for use
by @theglibc{} developers.

Just like signal handlers, cancellation cleanup routines must configure
the floating point environment they require. The routines cannot assume
a floating point environment, particularly when asynchronous
cancellation is enabled. If the configuration of the floating point
environment cannot be performed atomically then it is also possible that
the environment encountered is internally inconsistent.

@item
@cindex MT-Unsafe
@cindex Thread-Unsafe
@cindex AS-Unsafe
@cindex Async-Signal-Unsafe
@cindex AC-Unsafe
@cindex Async-Cancel-Unsafe
@code{MT-Unsafe}, @code{AS-Unsafe}, @code{AC-Unsafe} functions are not
safe to call within the safety contexts described above. Calling them
within such contexts invokes undefined behavior.

Functions not explicitly documented as safe in a safety context should
be regarded as Unsafe.

@item
@cindex Preliminary
@code{Preliminary} safety properties are documented, indicating these
properties may @emph{not} be counted on in future releases of
@theglibc{}.

Such preliminary properties are the result of an assessment of the
properties of our current implementation, rather than of what is
mandated and permitted by current and future standards.

Although we strive to abide by the standards, in some cases our
implementation is safe even when the standard does not demand safety,
and in other cases our implementation does not meet the standard safety
requirements. The latter are most likely bugs; the former, when marked
as @code{Preliminary}, should not be counted on: future standards may
require changes that are not compatible with the additional safety
properties afforded by the current implementation.

Furthermore, the POSIX standard does not offer a detailed definition of
safety. We assume that, by ``safe to call'', POSIX means that, as long
as the program does not invoke undefined behavior, the ``safe to call''
function behaves as specified, and does not cause other functions to
deviate from their specified behavior. We have chosen to use its loose
definitions of safety, not because they are the best definitions to use,
but because choosing them harmonizes this manual with POSIX.

Please keep in mind that these are preliminary definitions and
annotations, and certain aspects of the definitions are still under
discussion and might be subject to clarification or change.

Over time, we envision evolving the preliminary safety notes into stable
commitments, as stable as those of our interfaces. As we do, we will
remove the @code{Preliminary} keyword from safety notes. As long as the
keyword remains, however, they are not to be regarded as a promise of
future behavior.

@end itemize
=20
Other keywords that appear in safety notes are defined in subsequent
sections.

@node Unsafe Features, Conditionally Safe Features, POSIX Safety Concepts, POSIX
@subsubsection Unsafe Features
@cindex Unsafe Features

Functions that are unsafe to call in certain contexts are annotated with
keywords that document their features that make them unsafe to call.
AS-Unsafe features in this section indicate the functions are never safe
to call when asynchronous signals are enabled. AC-Unsafe features
indicate they are never safe to call when asynchronous cancellation is
enabled. There are no MT-Unsafe marks in this section.

@itemize @bullet

@item @code{lock}
@cindex lock

Functions marked with @code{lock} as an AS-Unsafe feature may be
interrupted by a signal while holding a non-recursive lock. If the
signal handler calls another such function that takes the same lock, the
result is a deadlock.

Functions annotated with @code{lock} as an AC-Unsafe feature may, if
cancelled asynchronously, fail to release a lock that would have been
released if their execution had not been interrupted by asynchronous
thread cancellation. Once a lock is left taken, attempts to take that
lock will block indefinitely.

@item @code{corrupt}
@cindex corrupt

Functions marked with @code{corrupt} as an AS-Unsafe feature may corrupt
data structures and misbehave when they interrupt, or are interrupted
by, another such function. Unlike functions marked with @code{lock},
these take recursive locks to avoid MT-Safety problems, but this is not
enough to stop a signal handler from observing a partially-updated data
structure. Further corruption may arise from the interrupted function's
failure to notice updates made by signal handlers.

Functions marked with @code{corrupt} as an AC-Unsafe feature may leave
data structures in a corrupt, partially updated state. Subsequent uses
of the data structure may misbehave.

@c A special case, probably not worth documenting separately, involves
@c reallocing, or even freeing pointers. Any case involving free could
@c be easily turned into an ac-safe leak by resetting the pointer before
@c releasing it; I don't think we have any case that calls for this sort
@c of fixing. Fixing the realloc cases would require a new interface:
@c instead of @code{ptr=realloc(ptr,size)} we'd have to introduce
@c @code{acsafe_realloc(&ptr,size)} that would modify ptr before
@c releasing the old memory. The ac-unsafe realloc could be implemented
@c in terms of an internal interface with this semantics (say
@c __acsafe_realloc), but since realloc can be overridden, the function
@c we call to implement realloc should not be this internal interface,
@c but another internal interface that calls __acsafe_realloc if realloc
@c was not overridden, and calls the overridden realloc with async
@c cancel disabled. --lxoliva

@item @code{heap}
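@cindex heap

Functions marked with @code{heap} may call heap memory management
functions from the @code{malloc}/@code{free} family of functions and are
only as safe as those functions. This note is thus equivalent to:

@sampsafety{@asunsafe{@asulock{}}@acunsafe{@aculock{} @acsfd{} @acsmem{}}}

@c Check for cases that should have used plugin instead of or in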
@c addition to this. Then, after rechecking gettext, adjust i18n if
@c needed.
@item @code{dlopen}
@cindex dlopen

Functions marked with @code{dlopen} use the dynamic loader to load
shared libraries into the current execution image. This involves
opening files, mapping them into memory, allocating additional memory,
resolving symbols, applying relocations and more, all of this while
holding internal dynamic loader locks.

The locks are enough for these functions to be AS- and AC-Unsafe, but
other issues may arise. At present this is a placeholder for all
potential safety issues raised by @code{dlopen}.

@c dlopen runs init and fini sections of the module; does this mean
@c dlopen always implies plugin?

@item @code{plugin}
@cindex plugin

Functions annotated with @code{plugin} may run code from plugins that
may be external to @theglibc{}. Such plugin functions are assumed to be
MT-Safe, AS-Unsafe and AC-Unsafe. Examples of such plugins are stack
@cindex NSS
unwinding libraries, name service switch (NSS) and character set
@cindex iconv
conversion (iconv) back-ends.

Although the plugins mentioned as examples are all brought in by means
of dlopen, the @code{plugin} keyword does not imply any direct
involvement of the dynamic loader or the @code{libdl} interfaces; those
are covered by @code{dlopen}. For example, if one function loads a
module and finds the addresses of some of its functions, while another
just calls those already-resolved functions, the former will be marked
with @code{dlopen}, whereas the latter will get the @code{plugin}. When
a single function takes all of these actions, then it gets both marks.

@item @code{i18n}
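@cindex i18n

Functions marked with @code{i18n} may call internationalization
functions of the @code{gettext} family and will be only as safe as those
functions. This note is thus equivalent to:

@sampsafety{@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @ascudlopen{}}@acunsafe{@acucorrupt{}}}

@item @code{timer}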
@cindex timer

Functions marked with @code{timer} use the @code{alarm} function or
similar to set a time-out for a system call or a long-running operation.
In a multi-threaded program, there is a risk that the time-out signal
will be delivered to a different thread, thus failing to interrupt the
intended thread. Besides being MT-Unsafe, such functions are always
AS-Unsafe, because calling them in signal handlers may interfere with
timers set in the interrupted code, and AC-Unsafe, because there is no
safe way to guarantee an earlier timer will be reset in case of
asynchronous cancellation.

@end itemize
=20
@node Conditionally Safe Features, Other Safety Remarks, Unsafe Features, POSIX
@subsubsection Conditionally Safe Features
@cindex Conditionally Safe Features

For some features that make functions unsafe to call in certain
contexts, there are known ways to avoid the safety problem other than
refraining from calling the function altogether. The keywords that
follow refer to such features, and each of their definitions indicates
how the whole program needs to be constrained in order to remove the
safety problem indicated by the keyword. Only when all the reasons that
make a function unsafe are observed and addressed, by applying the
documented constraints, does the function become safe to call in a
context.

@itemize @bullet

@item @code{init}
@cindex init

Functions marked with @code{init} as an MT-Unsafe feature perform
MT-Unsafe initialization when they are first called.

Calling such a function at least once in single-threaded mode removes
this specific cause for the function to be regarded as MT-Unsafe. If no
other cause for that remains, the function can then be safely called
after other threads are started.

Functions marked with @code{init} as an AS- or AC-Unsafe feature use the
internal @code{libc_once} machinery or similar to initialize internal
data structures.

If a signal handler interrupts such an initializer, and calls any
function that also performs @code{libc_once} initialization, it will
deadlock if the thread library has been loaded.

Furthermore, if an initializer is partially complete before it is
canceled or interrupted by a signal whose handler requires the same
initialization, some or all of the initialization may be performed more
than once, leaking resources or even resulting in corrupt internal data.

Applications that need to call functions marked with @code{init} as an
AS- or AC-Unsafe feature should ensure the initialization is performed
before configuring signal handlers or enabling cancellation, so that the
AS- and AC-Safety issues related with @code{libc_once} do not arise.

@c We may have to extend the annotations to cover conditions in which
@c initialization may or may not occur, since an initial call in a safe
@c context is no use if the initialization doesn't take place at that
@c time: it doesn't remove the risk for later calls.

@item @code{race}
@cindex race

Functions annotated with @code{race} as an MT-Safety issue operate on
objects in ways that may cause data races or similar forms of
destructive interference out of concurrent execution. In some cases,
the objects are passed to the functions by users; in others, they are
used by the functions to return values to users; in others, they are not
even exposed to users.

We consider access to objects passed as (indirect) arguments to
functions to be data race free. The assurance of data race free objects
is the caller's responsibility. We will not mark a function as
MT-Unsafe or AS-Unsafe if it misbehaves when users fail to take the
measures required by POSIX to avoid data races when dealing with such
objects. As a general rule, if a function is documented as reading from
an object passed (by reference) to it, or modifying it, users ought to
use memory synchronization primitives to avoid data races just as they
would should they perform the accesses themselves rather than by calling
the library function. @code{FILE} streams are the exception to the
general rule, in that POSIX mandates the library to guard against data
races in many functions that manipulate objects of this specific opaque
type. We regard this as a convenience provided to users, rather than as
a general requirement whose expectations should extend to other types.

In order to remind users that guarding certain arguments is their
responsibility, we will annotate functions that take objects of certain
types as arguments. We draw the line for objects passed by users as
follows: objects whose types are exposed to users, and that users are
expected to access directly, such as memory buffers, strings, and
various user-visible @code{struct} types, do @emph{not} give reason for
functions to be annotated with @code{race}. It would be noisy and
redundant with the general requirement, and not many would be surprised
by the library's lack of internal guards when accessing objects that can
be accessed directly by users.

As for objects that are opaque or opaque-like, in that they are to be
manipulated only by passing them to library functions (e.g.,
@code{FILE}, @code{DIR}, @code{obstack}, @code{iconv_t}), there might be
additional expectations as to internal coordination of access by the
library. We will annotate, with @code{race} followed by a colon and the
argument name, functions that take such objects but that do not take
care of synchronizing access to them by default. For example,
@code{FILE} stream @code{unlocked} functions will be annotated, but
those that perform implicit locking on @code{FILE} streams by default
will not, even though the implicit locking may be disabled on a
per-stream basis.

In either case, we will not regard as MT-Unsafe functions that may
access user-supplied objects in unsafe ways should users fail to ensure
the accesses are well defined. The notion prevails that users are
expected to safeguard against data races any user-supplied objects that
the library accesses on their behalf.

@c The above describes @mtsrace; @mtasurace is described below.

This user responsibility does not apply, however, to objects controlled
by the library itself, such as internal objects and static buffers used
to return values from certain calls. When the library doesn't guard
them against concurrent uses, these cases are regarded as MT-Unsafe and
AS-Unsafe (although the @code{race} mark under AS-Unsafe will be omitted
as redundant with the one under MT-Unsafe). As in the case of
user-exposed objects, the mark may be followed by a colon and an
identifier. The identifier groups all functions that operate on a
certain unguarded object; users may avoid the MT-Safety issues related
with unguarded concurrent access to such internal objects by creating a
non-recursive mutex related with the identifier, and always holding the
mutex when calling any function marked as racy on that identifier, as
they would have to should the identifier be an object under user
control. The non-recursive mutex avoids the MT-Safety issue, but it
trades one AS-Safety issue for another, so use in asynchronous signals
remains undefined.

When the identifier relates to a static buffer used to hold return
values, the mutex must be held for as long as the buffer remains in use
by the caller. Many functions that return pointers to static buffers
offer reentrant variants that store return values in caller-supplied
buffers instead. In some cases, such as @code{tmpnam}, the variant is
chosen not by calling an alternate entry point, but by passing a
non-@code{NULL} pointer to the buffer in which the returned values are
to be stored. These variants are generally preferable in multi-threaded
programs, although some of them are not MT-Safe because of other
internal buffers, also documented with @code{race} notes.

@item @code{const}
@cindex const

Functions marked with @code{const} as an MT-Safety issue non-atomically
modify internal objects that are better regarded as constant, because a
substantial portion of @theglibc{} accesses them without
synchronization. Unlike @code{race}, which causes both readers and
writers of internal objects to be regarded as MT-Unsafe and AS-Unsafe,
this mark is applied to writers only. Writers remain equally MT- and
AS-Unsafe to call, but the then-mandatory constness of objects they
modify enables readers to be regarded as MT-Safe and AS-Safe (as long as
no other reasons for them to be unsafe remain), since the lack of
synchronization is not a problem when the objects are effectively
constant.

The identifier that follows the @code{const} mark will appear by itself
as a safety note in readers. Programs that wish to work around this
safety issue, so as to call writers, may use a non-recursive
@code{rwlock} associated with the identifier, and guard @emph{all} calls
to functions marked with @code{const} followed by the identifier with a
write lock, and @emph{all} calls to functions marked with the identifier
by itself with a read lock. The non-recursive locking removes the
MT-Safety problem, but it trades one AS-Safety problem for another, so
use in asynchronous signals remains undefined.

@c But what if, instead of marking modifiers with const:id and readers
@c with just id, we marked writers with race:id and readers with ro:id?
@c Instead of having to define each instance of “id”, we'd have a
@c general pattern governing all such “id”s, wherein race:id would
@c suggest the need for an exclusive/write lock to make the function
@c safe, whereas ro:id would indicate “id” is expected to be read-only,
@c but if any modifiers are called (while holding an exclusive lock),
@c then ro:id-marked functions ought to be guarded with a read lock for
@c safe operation. ro:env or ro:locale, for example, seems to convey
@c more clearly the expectations and the meaning, than just env or
@c locale.

@item @code{sig}
@cindex sig

Functions marked with @code{sig} as an MT-Safety issue (that implies an
identical AS-Safety issue, omitted for brevity) may temporarily install
a signal handler for internal purposes, which may interfere with other
uses of the signal, identified after a colon.

This safety problem can be worked around by ensuring that no other uses
of the signal will take place for the duration of the call. Holding a
non-recursive mutex while calling all functions that use the same
temporary signal, blocking that signal before the call and resetting its
handler afterwards are recommended.

There is no safe way to guarantee the original signal handler is
restored in case of asynchronous cancellation, therefore so-marked
functions are also AC-Unsafe.

@c fixme: at least deferred cancellation should get it right, and would
@c obviate the restoring bit below, and the qualifier above.

Besides the measures recommended to work around the MT- and AS-Safety
problem, in order to avert the cancellation problem, disabling
asynchronous cancellation @emph{and} installing a cleanup handler to
restore the signal to the desired state and to release the mutex are
recommended.

@item @code{term}
@cindex term

Functions marked with @code{term} as an MT-Safety issue may change the
terminal settings in the recommended way, namely: call @code{tcgetattr},
modify some flags, and then call @code{tcsetattr}; this creates a window
in which changes made by other threads are lost. Thus, functions marked
with @code{term} are MT-Unsafe. The same window enables changes made by
asynchronous signals to be lost. These functions are also AS-Unsafe,
but the corresponding mark is omitted as redundant.

It is thus advisable for applications using the terminal to avoid
concurrent and reentrant interactions with it, by not using it in signal
handlers or blocking signals that might use it, and holding a lock while
calling these functions and interacting with the terminal. This lock
should also be used for mutual exclusion with functions marked with
@code{@mtasurace{:tcattr(fd)}}, where @var{fd} is a file descriptor for
the controlling terminal. The caller may use a single mutex for
simplicity, or use one mutex per terminal, even if referenced by
different file descriptors.

Functions marked with @code{term} as an AC-Safety issue are supposed to
restore terminal settings to their original state, after temporarily
changing them, but they may fail to do so if cancelled.

@c fixme: at least deferred cancellation should get it right, and would
@c obviate the restoring bit below, and the qualifier above.

Besides the measures recommended to work around the MT- and AS-Safety
problem, in order to avert the cancellation problem, disabling
asynchronous cancellation @emph{and} installing a cleanup handler to
restore the terminal settings to the original state and to release the
mutex are recommended.

@end itemize
=20
@node Other Safety Remarks, , Conditionally Safe Features, POSIX
@subsubsection Other Safety Remarks
@cindex Other Safety Remarks

Additional keywords may be attached to functions, indicating features
that do not make a function unsafe to call, but that may need to be
taken into account in certain classes of programs:

@itemize @bullet

@item @code{locale}
@cindex locale

Functions annotated with @code{locale} as an MT-Safety issue read from
the locale object without any form of synchronization. Functions
annotated with @code{locale} called concurrently with locale changes may
behave in ways that do not correspond to any of the locales active
during their execution, but an unpredictable mix thereof.

We do not mark these functions as MT- or AS-Unsafe, however, because
functions that modify the locale object are marked with
@code{const:locale} and regarded as unsafe. Being unsafe, the latter
are not to be called when multiple threads are running or asynchronou=
s
Post by Carlos O'Donell
signals are enabled, and so the locale can be considered effectively
constant in these contexts, which makes the former safe.
=20
@c Should the locking strategy suggested under @code{const} be used,
@c failure to guard locale uses is not as fatal as data races in
@c general: unguarded uses will @emph{not} follow dangling pointers or
@c access uninitialized, unmapped or recycled memory. Each access will
@c read from a consistent locale object that is or was active at some
@c point during its execution. Without synchronization, however, it
@c cannot even be assumed that, after a change in locale, earlier
@c locales will no longer be used, even after the newly-chosen one is
@c used in the thread. Nevertheless, even though unguarded reads from
@c the locale will not violate type safety, functions that access the
@c locale multiple times may invoke all sorts of undefined behavior
@c because of the unexpected locale changes.

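@item @code{env}
@cindex env

Functions marked with @code{env} as an MT-Safety issue access the
environment with @code{getenv} or similar, without any guards to ensure
safety in the presence of concurrent modifications.
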
We do not mark these functions as MT- or AS-Unsafe, however, because
functions that modify the environment are all marked with
@code{const:env} and regarded as unsafe. Being unsafe, the latter are
not to be called when multiple threads are running or asynchronous
signals are enabled, and so the environment can be considered
effectively constant in these contexts, which makes the former safe.

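One way a program might honor the "effectively constant" assumption for
the environment is to read what it needs once, before any thread is
started, and keep its own copy. The sketch below is illustrative only:
@code{struct settings}, @code{settings_init} and the @code{/tmp}
fallback are hypothetical choices, not anything prescribed by the
library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical settings captured once from the environment. */
struct settings {
    char tmpdir[256];
};

/* Intended to be called from main() before any thread is created.
   Any setenv()/putenv() calls belong in that same start-up phase;
   afterwards the environment is treated as read-only.
   Returns 0 on success, -1 if the value does not fit. */
int settings_init(struct settings *s)
{
    const char *v = getenv("TMPDIR");

    if (v == NULL || *v == '\0')
        v = "/tmp";                 /* assumed fallback */
    if (strlen(v) >= sizeof s->tmpdir)
        return -1;
    strcpy(s->tmpdir, v);           /* keep a copy, not getenv's pointer */
    return 0;
}
```

Threads started later would read @code{s->tmpdir} rather than calling
@code{getenv} again, so they do not depend on the environment staying
untouched.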
@item @code{hostid}
@cindex hostid

The function marked with @code{hostid} as an MT-Safety issue reads from
the system-wide data structures that hold the ``host ID'' of the
machine. These data structures cannot generally be modified atomically.
Since it is expected that the ``host ID'' will not normally change, the
function that reads from it (@code{gethostid}) is regarded as safe,
whereas the function that modifies it (@code{sethostid}) is marked with
@code{@mtasuconst{:@mtshostid{}}}, indicating it may require special
care if it is to be called. In this specific case, the special care
amounts to system-wide (not merely intra-process) coordination.

@item @code{sigintr}
@cindex sigintr

Functions marked with @code{sigintr} as an MT-Safety issue access the
@code{_sigintr} internal data structure without any guards to ensure
safety in the presence of concurrent modifications.

We do not mark these functions as MT- or AS-Unsafe, however, because
functions that modify this data structure are all marked with
@code{const:sigintr} and regarded as unsafe. Being unsafe, the latter
are not to be called when multiple threads are running or asynchronous
signals are enabled, and so the data structure can be considered
effectively constant in these contexts, which makes the former safe.

@item @code{fd}
@cindex fd

Functions annotated with @code{fd} as an AC-Safety issue may leak file
descriptors if asynchronous thread cancellation interrupts their
execution.

Functions that allocate or deallocate file descriptors will generally be
marked as such. Even if they attempted to protect the file descriptor
allocation and deallocation with cleanup regions, allocating a new
descriptor and storing its number where the cleanup region could release
it cannot be performed as a single atomic operation. Similarly,
releasing the descriptor and taking it out of the data structure
normally responsible for releasing it cannot be performed atomically.
There will always be a window in which the descriptor cannot be released
because it was not stored in the cleanup handler argument yet, or it was
already taken out before releasing it. It cannot be taken out after
release: an open descriptor could mean either that the descriptor still
has to be closed, or that it was already closed and then reallocated by
another thread or signal handler.

Such leaks could be internally avoided, with some performance penalty,
by temporarily disabling asynchronous thread cancellation. However,
since callers of allocation or deallocation functions would have to do
this themselves, to avoid the same sort of leak in their own layer, it
makes more sense for the library to assume they are taking care of it
than to impose a performance penalty that is redundant when the problem
is solved in upper layers, and insufficient when it is not.

This remark by itself does not cause a function to be regarded as
AC-Unsafe. However, cumulative effects of such leaks may pose a
problem for some programs. If this is the case, suspending asynchronous
cancellation for the duration of calls to such functions is recommended.

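A caller-side sketch of that recommendation, under stated assumptions:
@code{open_into_slot} and @code{close_slot} are hypothetical helpers,
and the "slot" is an @code{int} (initially -1) that a cleanup handler
already registered by the caller can inspect. The descriptor is
allocated and published while the thread is in deferred cancellation
mode, so asynchronous cancellation cannot strike between the two steps.

```c
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

/* Open 'path' and store the descriptor in *slot with asynchronous
   cancellation suspended.  'flags' must not include O_CREAT, since no
   mode argument is passed.  Returns 0 on success, -1 on failure. */
int open_into_slot(const char *path, int flags, int *slot)
{
    int caller_type, ignored;

    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &caller_type);
    *slot = open(path, flags);      /* allocate and publish together */
    pthread_setcanceltype(caller_type, &ignored);

    return *slot < 0 ? -1 : 0;
}

/* Cleanup handler: close the descriptor held in the slot, if any. */
void close_slot(void *arg)
{
    int *slot = arg;

    if (*slot >= 0) {
        close(*slot);
        *slot = -1;
    }
}
```

The caller would register @code{close_slot} with
@code{pthread_cleanup_push} before calling @code{open_into_slot}. Note
that @code{open} remains an ordinary (deferred) cancellation point in
this sketch.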
@item @code{mem}
@cindex mem

Functions annotated with @code{mem} as an AC-Safety issue may leak
memory if asynchronous thread cancellation interrupts their execution.

The problem is similar to that of file descriptors: there is no atomic
interface to allocate memory and store its address in the argument to a
cleanup handler, or to release it and remove its address from that
argument, without at least temporarily disabling asynchronous
cancellation, which these functions do not do.

This remark does not by itself cause a function to be regarded as
generally AC-Unsafe. However, cumulative effects of such leaks may be
severe enough for some programs that disabling asynchronous cancellation
for the duration of calls to such functions may be required.

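The same idea applied to memory might look like the sketch below. The
names @code{free_slot} and @code{with_copy} are hypothetical; the
allocation and the matching release each happen with the thread
switched to deferred cancellation, while the caller-supplied
@code{use} callback runs under whatever cancellation type the caller
had.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Cleanup handler: free whatever the slot currently holds. */
static void free_slot(void *arg)
{
    char **slot = arg;

    free(*slot);                    /* free(NULL) is a no-op */
    *slot = NULL;
}

/* Duplicate 's', pass the copy to 'use', release it, and return
   whatever 'use' returned (0 if the allocation failed). */
size_t with_copy(const char *s, size_t (*use)(const char *))
{
    char *buf = NULL;
    int caller_type, ignored;
    size_t result = 0;

    /* Allocate and publish the address with async cancellation off. */
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &caller_type);
    pthread_cleanup_push(free_slot, &buf);
    buf = strdup(s);
    pthread_setcanceltype(caller_type, &ignored);

    if (buf != NULL)
        result = use(buf);

    /* Release and clear the slot with async cancellation off again. */
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &ignored);
    pthread_cleanup_pop(1);
    pthread_setcanceltype(caller_type, &ignored);
    return result;
}
```

For instance, @code{with_copy("hello", strlen)} returns 5 and leaves no
allocation behind.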
@item @code{cwd}
@cindex cwd

Functions marked with @code{cwd} as an MT-Safety issue may temporarily
change the current working directory during their execution, which may
cause relative pathnames to be resolved in unexpected ways in other
threads or within asynchronous signal or cancellation handlers.

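This is not enough of a reason to mark so-marked functions as MT- or
AS-Unsafe, but when this behavior is optional (e.g., @code{nftw} with
@code{FTW_CHDIR}), avoiding the option may be a good alternative to
using full pathnames or file descriptor-relative (e.g. @code{openat})
system calls.
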
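Code that must stay robust against a temporarily changed working
directory can resolve names against a directory descriptor instead of
the current directory. A minimal sketch, assuming a hypothetical helper
named @code{open_in_dir}; only the standard @code{open},
@code{openat} and @code{close} calls are used.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Open 'name' relative to directory 'dir' without calling chdir().
   'flags' must not include O_CREAT, since no mode argument is passed.
   Returns the new descriptor, or -1 on failure. */
int open_in_dir(const char *dir, const char *name, int flags)
{
    int dirfd = open(dir, O_RDONLY | O_DIRECTORY);
    int fd;

    if (dirfd < 0)
        return -1;
    fd = openat(dirfd, name, flags);  /* resolved against dirfd, not cwd */
    close(dirfd);
    return fd;
}
```

Passing an absolute pathname for @code{dir} makes the lookup
independent of the working directory that happens to be in effect at
the time of the call.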
@item @code{!posix}
@cindex !posix

This remark, as an MT-, AS- or AC-Safety note to a function, indicates
the safety status of the function is known to differ from the specified
status in the POSIX standard. For example, POSIX does not require a
function to be Safe, but our implementation is, or vice-versa.

For the time being, the absence of this remark does not imply the safety
properties we documented are identical to those mandated by POSIX for
the corresponding functions.

@item @code{:identifier}
@cindex :identifier

Annotations may sometimes be followed by identifiers, intended to group
several functions that e.g. access the data structures in an unsafe way,
as in @code{race} and @code{const}, or to provide more specific
information, such as naming a signal in a function marked with
@code{sig}. It is envisioned that it may be applied to @code{lock} and
@code{corrupt} as well in the future.

In most cases, the identifier will name a set of functions, but it may
name global objects or function arguments, or identifiable properties or
logical components associated with them, with a notation such as
@code{:buf(arg)} to denote a buffer associated with the argument
@var{arg}, or @code{:tcattr(fd)} to denote the terminal attributes of a
file descriptor @var{fd}.

The most common use for identifiers is to provide logical groups of
functions and arguments that need to be protected by the same
synchronization primitive in order to ensure safe operation in a given
context.

@item @code{/condition}
@cindex /condition

Some safety annotations may be conditional, in that they only apply if a
boolean expression involving arguments, global variables or even the
underlying kernel evaluates to true. Such conditions as
@code{/hurd} or @code{/!linux!bsd} indicate the preceding marker only
applies when the underlying kernel is the HURD, or when it is neither
Linux nor a BSD kernel, respectively. @code{/!ps} and
@code{/one_per_line} indicate the preceding marker only applies when
argument @var{ps} is NULL, or global variable @var{one_per_line} is
nonzero.

When all marks that render a function unsafe are adorned with such
conditions, and none of the named conditions hold, then the function can
be regarded as safe.

@end itemize
---

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
Carlos O'Donell
2014-10-17 14:40:38 UTC
Permalink
Post by Michael Kerrisk (man-pages)
Hi Carlos,
I started editing up some of this material for inclusion in
a man page. As well as the Copyright, I do find it useful to
list the authors for the text. Is that you, Alexandre, or both?

Alexandre is the author of the text.

Cheers,
Carlos.
