Julien Jorge's Personal Website

So You Asked for Feedback: stevensStringLib

Sun Dec 31, 2023

On December 21st, 2023, u/bucephalus posted in Reddit’s r/cpp’s show and tell thread to tell us about his stevensStringLib, described on the GitHub page as “An easy-to-use library of C++ string functions”. This post is my attempt to provide constructive criticism for this project.

Foreword

First of all I think I should give some context about why I am doing this. It is not the first public review I do and there is still this idea of reading code from others to improve my own skills, but there is also an intent to help people to improve their own skills. Since the library under attention was shared on the show and tell thread from Reddit’s r/cpp, I assume that its author welcomes feedback.

As any programmer you know, I did my own share of bad code (like everything I wrote until today but, for sure, next time I will write perfect code for all circumstances…) and if there is one thing that triggered a level-up on my side, every time, it was confronting my code to other programmers. Not necessarily good coders, for whatever definitions of good, but anyone that could tell how they would feel if they had to use my lib. This post is an attempt to give back by providing constructive comments to my pairs.

Another observation is that along time I tend to follow and listen to some kind of elite, to people who go to conferences to show amazing things. By doing so I feel like I am drifting away from regular, day-to-day programmers. This post is an attempt to confront myself to non-over-the-top programming.

First impression

As I look into the repository, a couple of things catch my attention:

  • We have a testing folder, so there is at least an attempt to validate the library. Good!
  • There is a license file and the license is MIT. It’s nice, I need to know if and how I can reuse a library.
  • There is a readme, which contains a short demo and an how-to-install section. This is great.
  • There is a docs folder, which suggests an attempt to document the library. There is also a Doxyfile, so I’m afraid that the documentation may be nothing more than the files generated by doxygen.
  • I see a file named stevensStringLib.h This is the single source file in the repository outside the tests. Are we in the trend of the single-header, header-only, libraries?

At this point I can tell that there is a couple of missing things, the first one is a way to build the project. There is a source file in the testing/ folder which uses the main header, so there must be a way to build the test program. The fact that there is none forces me to install the header somewhere, to hand-build the program, to guess the C++ standard in use, and to deal with the dependencies. Not cool..

I can also see that there is no formatting instructions (e.g. a .clang-format file). Consequently I suppose that the code is either not formatted or that it is formatted by hand/via the author’s IDE. This is unfortunate as the former case tends to show a lack of attention to details and the latter case means that I will have to guess the formatting if I want to contribute. Not cool.

Finally, this is a library focused on algorithms for strings, and I see no benchmark. It is not very reassuring.

That being said, let’s look at the tests.

The tests

The testing folder contains a program, in binary form, named “test”, a file test.cpp, and a folder test_string_files. I guess that the program is obtained by compiling the source file, but why is it in the repository? Generated programs should be excluded from the repository.

The source file begins with a comment describing how to build the program. Moreover the instructions inform the reader that the GoogleTest library must be installed, but it does not tell which version should be installed. This should not be in a comment in the source file, it should be in a recipe from the build scripts.

Let’s see what happens if I try to build the program using the command shown in the comment:

$ g++ -std=c++23 ./testing/test.cpp -lgtest -o test
In file included from ./testing/test.cpp:9:
./testing/../stevensStringLib.h: In function ‘std::string stevensStringLib::toUpper(std::string&)’:
./testing/../stevensStringLib.h:177:16: error: ‘input’ was not declared in this scope; did you mean ‘int’?
  177 |         return input;
      |                ^~~~~
      |                int

There’s test code but it seems that changes in the library are pushed without running the tests ;) This indirectly informs us that there is no CI for this project. Not cool.

After fixing the error the program compiles correctly. Then I run it:

$ ./test
terminate called after throwing an instance of 'std::invalid_argument'
  what():  Error, could not file testing_string_frankenstein.txt
Abandon

It seems that the test cannot be executed from anywhere. After looking at the code I can see that there is no way to pass the directory for the sample files to the program. Not cool. If the author had used a build system he would have had to either provide a way to pass the test folder to the program, or at the very least he would have been able to define a test target which would have run the program from the expected working directory.

Finally, I switch to the testing directory before launching the tests:

$ cd testing && ../test
# …
[----------] Global test environment tear-down
[==========] 94 tests from 23 test suites ran. (44 ms total)
[  PASSED  ] 89 tests.
[  FAILED  ] 5 tests, listed below:
[  FAILED  ] isFloat.too_precise_of_number
[  FAILED  ] countFileLines.load_empty_file
[  FAILED  ] wrapToWidth.wrap_to_width_3
[  FAILED  ] wrapToWidth.wrap_to_width_0
[  FAILED  ] wrapToWidth.wrap_to_width_5

Ouch! 6 failing tests, it does not provide a good impression of the project.

I won’t dig into the detail of the source of the test program but I would like to point one specific thing: There are many instances of this pattern in the test program:

bool result = some_function();
ASSERT_TRUE(result);

When the assert fails, the output of the program looks like this:

[ RUN      ] isFloat.too_precise_of_number
./testing/test.cpp:348: Failure
Value of: result
  Actual: true
Expected: false

This message provides no useful information about the expression that failed. Now, if the author had put the function call in the assert:

ASSERT_FALSE(isFloat(string));

Then the output would have been more informative:

[ RUN      ] isFloat.too_precise_of_number
./testing/test.cpp:348: Failure
Value of: isFloat(string)
  Actual: true
Expected: false

Here we can see the call of the function that provided the unexpected result. Even better, we can get the value of the string:

ASSERT_FALSE(isFloat(string)) << "string=" << string;

Displays:

[ RUN      ] isFloat.too_precise_of_number
./testing/test.cpp:348: Failure
Value of: isFloat(string)
  Actual: true
Expected: false
string=.12341231231231231231231231231231233123123123123123123

Now we have the name of the test, the file, the line, the function, and the parameters. Context goes a long way into helping to understand failing tests.

Last point about the test program: the comment suggests to use C++23 but the program compiles perfectly with C++17. If there is no need for the most recent features, the compilation should be done with the minimum required version of the language.

The library

First of all, you should know that there are many great libraries providing efficient string algorithms, but for the sake of the exercise I will suppose that the author has good reasons to implement them. For the same reason I will not discuss too much about performance except for the most obvious missed opportunities.

The library fits in a single file of barely 900 lines and the functions are short, so it’s quite easy to grasp. All the code is in the header, all functions have external linkage (there’s no inline nor static). This implies that the library cannot be used as is: including the header in two compilation units would lead to an ODR violation. The author should have split the function declarations and definitions in a header and implementation file, respectively.

Since the file is quite short I have integrated review comments directly into it. You can find them all in my fork by searching for the text JJO:. The main observations can be summarized as follows.

There are many uses of std::stringstream in the library, copying strings in and out. This is something I expect not to see in a library dedicated to algorithms on strings. If the author targets C++23 he should use std::ispanstream, but even then the calls to std::getline make me worried.

There are many string copies, especially via function arguments passed by value when they should have been const references. Interestingly, fundamental types are sometimes passed as const references instead of by value.

There are many pessimized implementations, for example searching a single character string in a larger string instead of searching for a character directly. There are also many occurrences of multiple scans of the same string when a single scan would have been enough.

There is literally useless code, like checking that a string is convertible to an integer before calling std::from_chars on it. The latter already does the test and will inform the caller if the conversion could not be done. Also, initializing variables with the same value that would have been set by the default constructor, this is superfluous and makes the code harder to read.

Conclusion

There were many things to say on this library. I did not dig into the documentation but as I supposed before it is just the Doxygen documentation of the code. Note that all functions have a human-friendly description, which is nice.

To summarize what we learned today:

  • Test your code before publishing.
  • Use descriptive assertions in your tests.
  • Target the minimum required version of the language.
  • Don’t expect your program to be executed from a specific directory.
  • Consider implementing benchmarks for your algorithms.
  • Pay attention to the copies, especially in the argument lists.
  • Test for limits. If you intentionally want to ignore unexpected cases (e.g. a negative index in a string), at least put an assertion on it.