This article is meant for kata authors and translators who would like to create new content in C. It explains how to create and organize such content in a way that conforms to the authoring guidelines, and it shows the most common pitfalls and how to avoid them.
This article is not a standalone tutorial on creating kata or translations. It is meant as a complementary, C-specific part of a more general set of HOWTOs and guidelines related to content authoring. If you are going to create a C translation or a new C kata from scratch, please familiarize yourself with those general authoring documents first.
All technical information related to the C setup on Codewars (language versions, available libraries, and the setup of the code runner) can be found on the C reference page.
In kata descriptions, C code blocks can be inserted as the C-specific part of sequential code blocks.
C-specific paragraphs can be inserted using language conditional rendering.
- C is much lower-level than many other popular languages available on Codewars. For this reason, many kata, even if their task can be translated to C directly, can turn out much harder in C than in the original language. Many kata were originally created as very easy and beginner-friendly (for example, 8 kyu), but after being translated into C, with added aspects like memory management or two-dimensional C arrays, they are not so easy anymore, and newcomers complain that a kata ranked 8 kyu is too difficult for them while it should be an entry-level task.
- C is statically typed, so any task that depends on dynamic typing can be difficult to translate into C, and attempts to force a C kata to reflect a dynamically typed interface can result in a really bad design.
- Only a few additional libraries are available for the C runner, so almost everything has to be implemented manually by the author or the user. Kata that take advantage of additional packages installed for other languages become much more difficult in C.
Unlike, for example, Python or Java, C has no single code style guide, or even a set of naming conventions, that is widely adopted and agreed upon by C programmers. Traditional naming conventions use `snake_case`, Win32 API naming conventions use `PascalCase`, and there are GNU guidelines, Microsoft guidelines, and Google guidelines, some of which contradict each other. Just use whatever set of guidelines you like, but use it consistently.
Missing includes are not as much of a problem in C as they are in C++, but still, C authors often forget to include required header files, or leave them out deliberately because "it works" without them. This happens mostly for the following reasons:
- When the compiler encounters a call to a function that was not declared, it may provide an implicit declaration of that function. However, implicit function declarations were removed from the C standard in C99, and compilers that still accept them do so only as a non-standard extension, usually with a warning. You need to explicitly include the header files for the library functions you use, or declare them in some other way.
- Some header files include other header files indirectly. For example, file `foo.h` might contain `#include <bar.h>`, which might appear to make an explicit include of `bar.h` unnecessary. It is not, though: `foo.h` might change one day, or its contents might depend on compiler settings or command line options, and after some changes to the configuration of the C runner, `bar.h` might no longer be included there. That's why every file (i.e. code snippet) of a kata should explicitly include all required header files declaring the functions used in it.
- The author might think that header files for the testing framework are included automatically by the code runner. That is not the case though, and test suites need to include the Criterion header files explicitly.
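For illustration, here is a minimal solution snippet for a hypothetical kata (the function `duplicate_string` is not taken from any real kata) that includes a header for every library function it calls, instead of relying on indirect includes:

```c
#include <stdlib.h> /* malloc */
#include <string.h> /* strlen, memcpy */

/* Hypothetical kata function: returns a heap-allocated copy of `source`,
   or NULL if the allocation fails. The caller is responsible for freeing it. */
char *duplicate_string(const char *source)
{
    size_t length = strlen(source);
    char *copy = malloc(length + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, source, length + 1); /* also copies the terminating '\0' */
    return copy;
}
```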
The warning-related compiler options used by the C runner are somewhat strict, and getting code to compile cleanly requires some discipline.
In particular, `-Wextra` may cause numerous warnings, and some of them are very pedantic. However, the code of C kata should still compile cleanly, without any warnings logged to the console. Even when a warning does not cause any problem with the tests, users get distracted by it and blame it for failing tests.
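A small sketch of two warnings that are easy to trigger, assuming the runner compiles with both `-Wall` and `-Wextra` (the function names are hypothetical): a signed/unsigned comparison in a loop condition, reported through `-Wsign-compare`, and an unused parameter of a function whose signature is imposed by some interface, reported through `-Wunused-parameter`. Both can be avoided without disabling any warnings:

```c
#include <stddef.h>

/* Counts the positive elements of an array.
   With `int i`, the condition `i < length` would compare a signed int with
   a size_t, which -Wextra reports through -Wsign-compare; a size_t index avoids it. */
int count_positive(const int *values, size_t length)
{
    int count = 0;
    for (size_t i = 0; i < length; ++i)
        if (values[i] > 0)
            ++count;
    return count;
}

/* Hypothetical callback with an imposed signature: `context` is not needed,
   so it is cast to void to silence -Wunused-parameter (-Wall with -Wextra). */
int negate(int value, void *context)
{
    (void)context;
    return -value;
}
```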
Unlike many modern, high-level languages, C does not manage memory automatically. Manual memory management is a vast and complex topic, with many possible approaches depending on the specific case, and with many caveats and pitfalls.
Data behind pointers can be arranged in many possible ways. Whenever a kata passes a pointer to the user's solution, or requires the solution to return or manipulate a pointer or the data it references, it should explicitly and clearly provide all information necessary to carry out the operation correctly. This information can be put in one or more of the following places:
- The code itself. Specifying that a pointer points to `const` data can serve as a hint that the data has not been allocated dynamically and won't be freed. Size hints for array parameters can help in understanding how arrays are organized, etc. Correctly specified types can be very helpful, but they are not always sufficient.
- A language-specific paragraph in the kata description.
- A comment in the "Solution setup" snippet.
- The sample tests, which, when necessary, should present an example of how data is composed, passed to the user solution, fetched from it, worked on, and cleaned up afterwards.
When the structure, layout, or allocation scheme of the pointed-to data is not described, users cannot know how to implement the requirements without causing either a crash or a memory leak. Authors can choose the ownership strategy their kata uses: memory can be managed by the test suite, by the user, or by both. However, they should be aware of the advantages and disadvantages of each strategy, and of which one fits a given case best.
Possible ways of handling memory management are described in the Memory Management in C kata article.
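As a sketch of what "explicitly and clearly" can look like in the "Solution setup" snippet, the hypothetical functions below use two different ownership strategies, each documented in a comment right next to its definition:

```c
#include <stddef.h>
#include <stdlib.h>

/* Strategy 1: the caller (the test suite) owns all memory.
   `result` points to a caller-allocated buffer of at least `length` ints;
   the solution only writes into it and never frees anything. */
void squares_into(const int *values, size_t length, int *result)
{
    for (size_t i = 0; i < length; ++i)
        result[i] = values[i] * values[i];
}

/* Strategy 2: the solution allocates, the caller frees.
   Returns an array obtained from malloc (NULL on failure) and reports the
   number of its elements through `*result_length`; the caller must free() it. */
int *squares_alloc(const int *values, size_t length, size_t *result_length)
{
    *result_length = 0;
    int *result = malloc(length * sizeof *result);
    if (result == NULL)
        return NULL;
    for (size_t i = 0; i < length; ++i)
        result[i] = values[i] * values[i];
    *result_length = length;
    return result;
}
```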
One consequence of unmanaged memory is that returning strings from C functions is strongly discouraged, especially when translating kata from other languages. Returning a string is not a problem in other languages, but in C it always raises the questions of who should allocate it and how. Consider replacing the string with some simpler data type, possibly given a descriptive alias (for example, an enumeration constant like `NONE` instead of the string `"NONE"`). If the author decides to keep raw C strings as part of the kata interface, they should clearly specify the required allocation scheme.
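For instance, a hypothetical kata that originally returns one of the strings `"UP"`, `"DOWN"` or `"NONE"` could use an enumeration in C, which requires no allocation at all. A test suite can still map the values back to names with a static lookup table to produce readable assertion messages:

```c
/* Hypothetical interface: an enumeration replaces the returned strings,
   so nothing has to be allocated or freed by anyone. */
enum direction { NONE, UP, DOWN };

enum direction compare_levels(int before, int after)
{
    if (after > before)
        return UP;
    if (after < before)
        return DOWN;
    return NONE;
}

/* Test-side helper: names of the constants, for assertion messages only. */
static const char *direction_name(enum direction d)
{
    static const char *const names[] = { "NONE", "UP", "DOWN" };
    return names[d];
}
```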
C kata use the Criterion testing framework to implement and execute tests. Read its reference page to find out how to structure tests into groups and test cases, what assertions are available, etc.
Criterion supports many features that can be very helpful but, unfortunately, are not commonly used by C authors. It allows for parameterized tests, setting up additional data, test fixtures with setup and teardown, custom descriptions, etc.
Note that the report hooks used by the Codewars test runner produce one test output entry per assertion, so the test output panel can get very noisy.
Unlike some other languages, C does not provide many means of generating random numbers for random tests. The `stdlib.h` header provides the `rand` function which, while quite simple, satisfies the majority of needs, but can sometimes be tricky to use correctly.
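Before `rand` is called for the first time, it should be seeded with `srand`; otherwise it produces the same sequence on every run, as if it had been seeded with `srand(1)`. A call to `srand` should be performed only once, in the setup phase of the random tests. `srand` is usually called with the current time as the seed, so authors need to include `time.h` before using the `time` function.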
`rand` can return integers only up to `RAND_MAX`. There is no standard function that generates random values of other types, such as `double`, so authors who would like to generate random values outside of the domain of `rand` have to craft them manually.
Additionally, the value of `RAND_MAX` might differ between platforms, or even change. For the current Codewars setup it's 2^31 - 1 (2147483647), but on some common platforms `RAND_MAX` is as small as 2^15 - 1 (32767). This makes code using `rand` even less portable, and while portability might not be a big concern for Codewars kata, it could turn out to be an issue for users trying to reproduce random tests locally.
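The seeding and range rules above can be sketched with a few helpers. The names `seed_random_once`, `random_int`, and `random_double` are hypothetical, and the modulo-based mapping assumes that `max - min` is smaller than `RAND_MAX` and accepts a slight bias, which is usually fine for random tests:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

static bool random_seeded = false;

/* Seeds rand() with the current time, at most once per process:
   reseeding within the same second would restart the same sequence. */
void seed_random_once(void)
{
    if (!random_seeded) {
        srand((unsigned)time(NULL));
        random_seeded = true;
    }
}

/* Random int in [min, max]; assumes min <= max and max - min < RAND_MAX.
   The modulo introduces a slight bias towards smaller values. */
int random_int(int min, int max)
{
    return min + rand() % (max - min + 1);
}

/* Random double in [min, max] (up to rounding at the upper end), built from
   rand(); it can produce at most RAND_MAX + 1 distinct values. */
double random_double(double min, double max)
{
    return min + (double)rand() / RAND_MAX * (max - min);
}
```

Because the resolution of `random_double` depends on `RAND_MAX`, the same seed can yield different values on platforms with a different `RAND_MAX`.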
An alternative to `rand` could be reading from random devices, like `/dev/urandom`. This way of generating random numbers could partially alleviate the issue of `rand` being capped at `RAND_MAX`, but it could also inflate the amount of boilerplate code and cause additional portability problems.
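A sketch of reading from `/dev/urandom`, assuming a POSIX-like system where the device exists (the function name `urandom_u64` is hypothetical). Even this minimal version needs its own error handling, which illustrates the extra boilerplate:

```c
#include <stdint.h>
#include <stdio.h>

/* Reads a 64-bit random value from /dev/urandom. Returns 1 on success and 0
   on failure (device missing or short read); `*out` is written only on success. */
int urandom_u64(uint64_t *out)
{
    FILE *device = fopen("/dev/urandom", "rb");
    if (device == NULL)
        return 0;
    uint64_t value;
    size_t items_read = fread(&value, sizeof value, 1, device);
    fclose(device);
    if (items_read != 1)
        return 0;
    *out = value;
    return 1;
}
```

Opening the device on every call keeps the sketch short; a real test suite might open it once in the setup phase instead.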
If the test suite happens to use a reference solution to calculate expected values (which should be avoided when possible), or some kind of reference data like precalculated arrays, it must not be possible for the user to call it, redefine it, overwrite it, or directly access its contents. To prevent this, it should be defined as `static` in the test implementation file.
The reference solution or data must not be defined in the Preloaded code.
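A minimal sketch of a test-file fragment with a hypothetical reference solution and a hypothetical table of precalculated data, both given internal linkage with `static`:

```c
#include <stddef.h>

/* Reference solution: `static` gives it internal linkage, so the user's
   solution (a separate translation unit) cannot call it by name, and a
   same-named function in the solution does not replace it. */
static long long reference_sum(const int *values, size_t length)
{
    long long sum = 0;
    for (size_t i = 0; i < length; ++i)
        sum += values[i];
    return sum;
}

/* Precalculated reference data gets the same treatment. */
static const int reference_primes[] = { 2, 3, 5, 7, 11, 13 };
```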
General guidelines for submission tests contain a section related to input mutation and how to prevent users from abusing it to work around kata requirements. Since C does not have reference semantics, it might appear that C kata are not affected by this problem, but that's not completely true. Data passed to the user solution by value indeed cannot be modified by it. However, when data is passed indirectly, through a pointer or as an array, it can be modified even when it's marked as `const`: the user can forcefully cast away the constness of a function argument and then modify values passed as `const T*` or as elements of `const T[]`. This is usually not a problem in "real world" C programming, but on Codewars, users can take advantage of vulnerable test suites and modify their behavior this way. After calling a user solution, tests should not rely on the state of such values, and should consider them potentially modified by the user.
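A sketch of a defensive check, with hypothetical names throughout. The misbehaving `cheating_max` would live in the user's solution file and is placed here only for demonstration. The expected value is computed from the original input before the call, and the solution receives a private copy, so casting `const` away and overwriting the input cannot influence the check:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A misbehaving "solution": it casts const away and zeroes its input, so that
   a test recomputing the expected value after the call would expect 0 too. */
int cheating_max(const int *values, size_t length)
{
    int *writable = (int *)values;
    for (size_t i = 0; i < length; ++i)
        writable[i] = 0;
    return 0;
}

/* Reference computation, private to the test file; assumes length > 0. */
static int reference_max(const int *values, size_t length)
{
    int max = values[0];
    for (size_t i = 1; i < length; ++i)
        if (values[i] > max)
            max = values[i];
    return max;
}

/* Returns 1 if `solve` gives the right answer for `input` (length > 0).
   The expected value is computed first, and the solution only sees a copy. */
static int check_solution(int (*solve)(const int *, size_t),
                          const int *input, size_t length)
{
    int expected = reference_max(input, length);
    int *copy = malloc(length * sizeof *copy);
    if (copy == NULL)
        return 0;
    memcpy(copy, input, length * sizeof *copy);
    int actual = solve(copy, length);
    free(copy);
    return actual == expected;
}
```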
Criterion provides a set of useful assertions, but when used incorrectly, they can cause a series of problems:
- Stack traces of a crashing user solution can reveal details that should not be visible,
- Use of an assertion not suitable for the given case may lead to incorrect test results,
- Incorrectly used assertions may produce confusing or unhelpful messages.
To avoid the above problems, calls to assertion functions should respect the following rules:
- The expected value should be calculated before invoking an assertion. The `expected` argument passed to the assertion should not be a function call expression, but a value calculated directly beforehand.
- Appropriate assertion functions should be used for each given test. `cr_assert_eq` is not suitable in all situations: use `cr_assert_float_eq` for floating point comparisons, `cr_assert` for tests on boolean values, `cr_assert_str_*` to test strings, and `cr_assert_arr_*` to test arrays.
- Some additional attention should be paid to the order of arguments passed to assertion macros. It differs between assertion libraries, and authors quite often confuse it, mixing up `actual` and `expected` in assertion messages. For the C testing framework, the order is `actual, expected`.
- To avoid unexpected crashes in tests, it's recommended to perform some additional assertions before assuming that the answer returned by the user solution has a particular form or size. For example, if the solution returns a pointer (possibly pointing to an array), an explicit assertion should check that the returned pointer is valid and not, for example, `NULL`; the size of the returned array, potentially reported through an output parameter, should be verified before accessing an element that could turn out to be located outside of its bounds.
- Default messages produced by assertion macros are often confusing, so authors should provide custom messages for failed assertions.
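A sketch of two fixed tests following these rules, for hypothetical user functions `average` and `filter_even` (only declared here; the user implements them). Both values are computed before each assertion, `actual` comes first, every assertion has a custom message, and the pointer and size are checked before any element is read. Note that `cr_assert_arr_eq` compares raw memory, so its size argument is a number of bytes:

```c
#include <criterion/criterion.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user solutions under test. */
double average(const int *values, size_t length);
int *filter_even(const int *values, size_t length, size_t *result_length);

Test(fixed_tests, average_of_three)
{
    const int input[] = { 1, 2, 4 };
    double expected = 7.0 / 3.0;
    double actual = average(input, 3);
    cr_assert_float_eq(actual, expected, 1e-9,
                       "Incorrect average of {1, 2, 4}: expected %g, but got %g",
                       expected, actual);
}

Test(fixed_tests, filter_mixed_values)
{
    const int input[] = { 1, 2, 3, 4, 6 };
    const int expected[] = { 2, 4, 6 };
    const size_t expected_length = sizeof expected / sizeof expected[0];

    size_t actual_length = 0;
    int *actual = filter_even(input, sizeof input / sizeof input[0], &actual_length);

    /* Validate the pointer and the reported size before reading elements. */
    cr_assert_not_null(actual, "filter_even returned NULL for {1, 2, 3, 4, 6}");
    cr_assert_eq(actual_length, expected_length,
                 "Incorrect number of elements: expected %zu, but got %zu",
                 expected_length, actual_length);
    cr_assert_arr_eq(actual, expected, expected_length * sizeof expected[0],
                     "Incorrect elements returned for {1, 2, 3, 4, 6}");
    free(actual); /* the solution allocates, the tests free */
}
```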
In C, not everything can be easily tested. It's not possible to reliably verify the size or bounds of a returned buffer, or the validity of a returned pointer. It's difficult to test for conditions that result in a crash or undefined behavior. Nor can it be reliably verified that there are no memory leaks and that all allocated memory was correctly released. Sometimes the only options are to skip some checks or to let the tests crash.
As C is quite a low-level language, non-trivial tests, checks, and assertions often require some boilerplate code. It can be tempting to put code common to the sample tests and submission tests in the Preloaded snippet, but this approach sometimes proves problematic, and it can cause headaches for users who want to train on the kata locally, check how the user solution is called, etc. Using the Preloaded code to share code between test snippets is strongly discouraged if it would hide information or implementation details that are interesting to the user.
Below you can find an example test suite that covers most of the common scenarios mentioned in this article. Note that it does not present all possible techniques, so actual test suites can use a different structure, as long as they keep to established conventions and do not violate authoring guidelines.
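A compact sketch of such a suite for a hypothetical kata (the task, `sum_of_multiples`, `reference_sum_of_multiples`, and `do_test` are all illustrative assumptions). The reference solution is `static`, each check computes the actual value before asserting, assertions take `actual` before `expected` and carry custom messages, and `srand` is called once before the random tests use `rand`:

```c
#include <criterion/criterion.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical user solution: the sum of all natural numbers below `limit`
   that are multiples of 3 or 5. */
unsigned long long sum_of_multiples(unsigned limit);

/* Reference solution: static, so the user's code cannot call or replace it. */
static unsigned long long reference_sum_of_multiples(unsigned limit)
{
    unsigned long long sum = 0;
    for (unsigned n = 1; n < limit; ++n)
        if (n % 3 == 0 || n % 5 == 0)
            sum += n;
    return sum;
}

/* Calls the user solution once, then asserts with a custom message. */
static void do_test(unsigned limit, unsigned long long expected)
{
    unsigned long long actual = sum_of_multiples(limit);
    cr_assert_eq(actual, expected,
                 "Incorrect answer for limit = %u: expected %llu, but got %llu",
                 limit, expected, actual);
}

Test(fixed_tests, small_limits)
{
    do_test(0, 0);
    do_test(1, 0);
    do_test(10, 23);  /* 3 + 5 + 6 + 9 */
    do_test(16, 60);  /* 3 + 5 + 6 + 9 + 10 + 12 + 15 */
}

Test(random_tests, random_limits)
{
    srand((unsigned)time(NULL)); /* seeded once, before the first rand() call */
    for (int i = 0; i < 100; ++i) {
        unsigned limit = (unsigned)(rand() % 10000);
        unsigned long long expected = reference_sum_of_multiples(limit);
        do_test(limit, expected);
    }
}
```

Since every `cr_assert_eq` call produces its own output entry, the number of random iterations directly affects how noisy the test output panel gets.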