Chapter 9: Strings

Of all the structures in the C programming language, strings are perhaps the most paradoxical. They are extremely useful, yet their use can lead to some of the most convoluted code in C. They are necessary for writing programs, but using them can be extremely annoying. They are conceptually easy and practically difficult.

This chapter will work through the ease and the difficulty that strings represent. We will start off with the easy part: the idea and usefulness of strings. And we will end with the difficult part: the messy code that using strings can generate.

The Basics of Strings

Strings are simply sequences of characters. The word "sequence" should remind you that arrays are sequences of identically typed elements. Therefore, we can think of strings as arrays of characters.

String literals are sequences of characters, surrounded by double quotes. For example:

"Hello, World!"
"Nice to see you."
"Tea.  Earl Grey.  Hot!"

These are all string literals. In C, unlike some languages, the quotes are not interchangeable. Single quotes are used to depict single character literals, not strings. Strings require double quotes.

Strings are Arrays and Pointers

As we stated above, we can think of strings as arrays of characters. In fact, there is no "string" data type in C; we declare strings exactly as we would declare an array of characters. For example,

char title[40];

This declares an array that contains 40 characters, which can hold a string of up to 39 characters; as we will see below, one element is always needed for the string's terminator.

Since strings are arrays, we can initialize strings the same way we initialize arrays. For example,

char title[40] = 
    { 'E', 'n', 'c', 'o', 'u', 'n', 't', 'e', 'r', ' ', 
      'a', 't', ' ', 'F', 'a', 'r', 'p', 'o', 'i', 'n', 't' };  
char registry[10] = { 'N', 'C', 'C', '-', '1', '7', '0', '1' };

This will initialize the character array title to contain the string Encounter at Farpoint and the character array registry to contain the string NCC-1701. Note that, in the last declaration, numbers can be characters (here, for example, the character '1' has the integer value 49, not actually 1).
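Because digits in a string are characters rather than numbers, converting one back to its numeric value takes a small step. The sketch below uses a hypothetical helper, digit_value, that relies on the C guarantee that the codes for '0' through '9' are consecutive; on ASCII-based systems '1' is 49, so '1' - '0' is 1.

```c
/* Converts a digit character such as '7' into the number it represents.
   digit_value is a hypothetical helper for illustration. The C standard
   guarantees that '0' through '9' have consecutive codes, so subtracting
   '0' works in any character set. */
int digit_value(char c) {
    return c - '0';
}
```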

While this method of initializing works, it's pretty tedious. C also allows a more convenient initialization of strings:

char title[40] = "Encounter at Farpoint";
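To see that the two forms really are equivalent, the sketch below declares one array each way and compares all 40 bytes with the standard memcmp function. The array names and the titles_match helper are hypothetical, chosen only for this illustration.

```c
#include <string.h>

/* Two ways to initialize the same 40-character array. In both cases the
   elements after the last listed character are set to 0. */
char title_braces[40] =
    { 'E', 'n', 'c', 'o', 'u', 'n', 't', 'e', 'r', ' ',
      'a', 't', ' ', 'F', 'a', 'r', 'p', 'o', 'i', 'n', 't' };
char title_literal[40] = "Encounter at Farpoint";

/* Returns 1 if all 40 bytes of the two arrays are identical, else 0. */
int titles_match(void) {
    return memcmp(title_braces, title_literal, sizeof title_literal) == 0;
}
```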

From Chapter 8, we know that array and pointer notation are interchangeable. So strings can be manipulated with pointer notation as well as with array notation. We can do the following:

char quote[40] = "Make it ";
char *select = quote;

select += 8;
*select = 's';
*(select + 1) = 'o';
*(select + 2) = '!';
*(select + 3) = '\0';
select = quote;

It should be clear that we can manipulate strings using array and pointer notation in ways to which we are, by now, accustomed. Note as well the assignment *(select + 3) = '\0', which places a null character marker at the end of the modified string, and the final line, which resets select to the start of the array. We will examine such terminators in the next section.
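The same edit can be written with array indexing instead of pointer arithmetic, since quote[8] and *(quote + 8) name the same element. This is a minimal sketch; finish_quote is a hypothetical name, and it assumes the array already holds the 8 characters "Make it " and has room for at least 12 characters.

```c
/* Appends "so!" to an array holding "Make it ", using indexing.
   finish_quote is a hypothetical helper; quote must have at least
   12 elements. quote[8] is the same element as *(quote + 8). */
void finish_quote(char quote[]) {
    quote[8]  = 's';
    quote[9]  = 'o';
    quote[10] = '!';
    quote[11] = '\0';   /* terminate the longer string */
}
```

Calling finish_quote on char quote[40] = "Make it "; leaves the array holding "Make it so!", exactly as the pointer version does.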

One warning needs to be made about string assignment. We can initialize strings, but we cannot assign strings through the assignment operator. Consider this code:

char quote[40] = "Make it so";
quote = "Engage!";

The first line works because it is an initialization within a declaration. The second line is an error, because we cannot assign arrays to each other. To make assignment of strings work, you must copy one string to another character-by-character. There are functions that we can use for this; see the "String Functions" section below for a description of the string copy functions.

Finally, remember from Chapter 7 that we can use unsized arrays. With strings, this makes sense because the compiler can figure out the size of the array from the initialization. For example:

char another[] = "Engage!";

Here, the array another is implicitly sized at 8 characters: 7 for the characters of "Engage!" and 1 for the null terminator (see the next section).
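The sizeof operator makes the compiler's count visible. In this short sketch, another_size is a hypothetical name used only to record the result.

```c
#include <stddef.h>

/* An unsized array: the compiler counts the 7 characters of "Engage!"
   and adds 1 for the null terminator. */
char another[] = "Engage!";
const size_t another_size = sizeof another;   /* 8 */
```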

Strings are Null Terminated

If we are going to work with a string, we are going to have to know the length of a string. In C, even though we use arrays of a fixed length for strings, the length of the array represents a maximum length. The string stored in an array may be shorter than the full length of the array. We can't actually encode the length of a string into the string itself, so we place a marker at the end of a string. By knowing what the marker looks like, we can count the characters in a string and compute its length.

In C, strings are terminated by a character whose integer value is 0, called the "null" character and written '\0'.

This is a judicious choice for a termination character. Consider how array initializations are made. When we initialize a string with a value shorter than the full size of an array, like with title in the example above, C semantics dictate that the remainder of the array is initialized to the value 0. This is convenient, and it means that all the string examples we have given are automatically terminated with a null character. Even in the pointer manipulation example, the part of the array that was neither initialized nor changed is a collection of 0 values, which terminates whatever string is created.

This choice for a termination character also makes certain computations about strings very easy. Consider how we could compute the length of the string stored in title:

int size = 0;
while (title[size] != '\0') {
    size++;
}

Eventually, title[size] will have the value 0 when the computation is at the end of the string. With pointers, we can abbreviate this code to

int size = 0;
for (select = title; *select; size++,select++);

This is essentially the computation that the strlen function performs to compute the length of a string; this function will be looked at in the "String Functions" section below.
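Packaged as a function, the pointer loop looks like the sketch below. The name my_strlen is hypothetical, chosen so it does not clash with the library's strlen; the length is the distance between the start of the string and its terminator.

```c
#include <stddef.h>

/* Counts characters up to (not including) the null terminator.
   my_strlen is a hypothetical name to avoid clashing with strlen. */
size_t my_strlen(const char *s) {
    const char *p = s;
    while (*p != '\0') {
        p++;                    /* advance until we reach the 0 value */
    }
    return (size_t)(p - s);     /* distance from start to terminator */
}
```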

Using a null character for termination also means that we have to be careful when manipulating the array of characters that makes up a string. For example, we can truncate a string easily, almost by accident, this way:

char title[40] = "Encounter at Farpoint";
title[9] = '\0';

Now the string contained in the array title has the value "Encounter", because the character position after the "r", which held a space, now holds a 0 value. There are actually 2 strings now in the title array, each terminated with a 0 value: "Encounter", starting at title[0], and "at Farpoint", starting at title[10].
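The sketch below performs the truncation and returns a pointer to the second string that is still stored in the array. truncate_title is a hypothetical helper written only for this illustration.

```c
char title[40] = "Encounter at Farpoint";

/* Writes a terminator over the space at index 9 and returns a pointer
   to the second string left in the array, which starts at index 10. */
char *truncate_title(void) {
    title[9] = '\0';     /* title now reads "Encounter" */
    return title + 10;   /* "at Farpoint" is still here */
}
```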

We also have to be careful about filling the array up to capacity; we have to remember that one element must hold the terminating 0 value. This means, for example, that a string stored in an array of size 40 can be at most 39 characters long.

Strings are Not Objects

It's important to emphasize here that strings in C are just character arrays. In other languages, they are specialized datatypes or data objects. In Java, for instance, a "String" is a built-in class, and you can work with objects from that class in many built-in ways. Length is determined by calling a method built into the class, and various other operations are also built in: concatenation, searching, and upper- and lower-casing.

In C, all we have is a character array, with the string terminated with a null character. We have to compute the size and do operations like concatenation and reversal by manipulating the characters in the actual array. The next section details functions that do this computation and manipulation.
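As an example of doing such work by hand, the standard C library has no reverse function, so reversing a string means swapping characters in the array. This is a minimal sketch; reverse_string is a hypothetical helper that swaps characters from both ends toward the middle.

```c
#include <string.h>

/* Reverses a null-terminated string in place. reverse_string is a
   hypothetical helper; the terminator stays where it is. */
void reverse_string(char *s) {
    size_t len = strlen(s);
    if (len < 2) {
        return;                  /* nothing to swap */
    }
    char *left = s;
    char *right = s + len - 1;   /* last character before the '\0' */
    while (left < right) {
        char tmp = *left;
        *left++ = *right;
        *right-- = tmp;
    }
}
```

For example, applying reverse_string to char s[] = "Engage!"; leaves s holding "!egagnE".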

String Functions

C provides a number of functions that work with strings. They are part of the standard C library, declared in the header file <string.h>, and can be used from any program that includes that header.

One of the most common functions is a string copy. Let's say you have two strings, declared as below:

char quote1[60] = "I will take your word for it. This is very amusing.";
char quote2[60];

As we have noted before, we cannot simply assign one array to the other; that is a compile-time error. (Assigning the address of one array to a pointer would not help either: the pointer would just refer to the same data, without copying it.) To make a copy, we must copy each individual character from one array to the other. Again, the choice of a 0 value to terminate a string was a judicious one, because we can write code like this:

int i = 0;
while (quote1[i] != '\0') {
    quote2[i] = quote1[i];
    i++;
}

This is rather clunky, but it copies each character from quote1 to quote2 until the code reaches the terminator. However, the terminator is not copied, so this is not correct. We can condense this code, and make it correct, in the following way:

char *dst = quote2;
char *src = quote1;
while ((*dst++ = *src++) != '\0')
    ;   /* empty body: the copying happens in the condition */

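Here, we use pointer dereferencing and pointer arithmetic to copy the strings. We need the two pointers dst and src because array names like quote1 and quote2 cannot themselves be incremented. Each character, including the terminator, is copied before the loop tests it, so the loop stops only after the terminator has been copied. It's more cryptic than you might normally use and it relies on operations we have advised against before, but it works.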
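A pointer-based copy like this works when it runs on pointer variables rather than on the arrays themselves, and it fits naturally into a function. In this sketch, my_strcpy is a hypothetical name; like the library's strcpy, it returns the destination and assumes the destination is large enough.

```c
/* Copies src, including its terminator, into dest and returns dest.
   my_strcpy is a hypothetical helper; dest must be large enough. */
char *my_strcpy(char *dest, const char *src) {
    char *start = dest;                  /* remember where dest began */
    while ((*dest++ = *src++) != '\0')
        ;                                /* copy, then test the copied char */
    return start;
}
```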

We could have just used the strcpy function. The prototype for this function looks like:

char *strcpy(char *dest, const char *src);

And we could have used it with our example as follows:

strcpy(quote2, quote1);

That looks a little simpler to use. Note that strcpy works with string literals as well. Outside of initialization in declaration, we cannot use assignment to set up strings. We need something like this:

strcpy(quote2, "Tell him he is a pretty cat.");

There are several other common functions that are very useful.

  • size_t strlen(const char *string);
    This function counts the number of characters in a string, not including the terminator, thus returning its length. strlen(quote1) returns 51. The size_t data type is an unsigned integer type, which fits here because lengths cannot be negative.
  • char *strcat(char *string1, const char *string2);
    This function appends the contents of string2 to the end of string1, overwriting string1's terminator, and returns string1. It does not create a new string, so the array holding string1 must be large enough for both strings plus the terminator. Consider this example:

    char day1[60] = "I would be chasing an untamed ";
    char day2[30] = "ornithoid without cause.";
    char *days = strcat(day1, day2);

    After this call, day1 contains "I would be chasing an untamed ornithoid without cause." (54 characters), describing a wild goose chase, and days points to day1 itself.

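  • int strcmp(const char *string1, const char *string2);
    This function compares string1 to string2 lexicographically, character by character, and returns an integer: a negative value if string1 precedes string2, 0 if the strings are equal, and a positive value if string1 follows string2. The comparison uses character codes, so it is alphabetical only for strings of the same case; in ASCII, for example, every uppercase letter precedes every lowercase letter.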
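Because strcat works in place, the destination array has to be sized for the combined result. The sketch below runs the wild goose chase example with day1 at 60 characters and day2 at 30, enough for 30 + 24 characters plus the terminator; join_days is a hypothetical wrapper added only to make the result easy to inspect.

```c
#include <string.h>

/* day1 must hold both strings plus the terminator: 30 + 24 + 1 = 55. */
char day1[60] = "I would be chasing an untamed ";
char day2[30] = "ornithoid without cause.";

/* Appends day2 to day1 and returns what strcat returns: day1 itself. */
char *join_days(void) {
    return strcat(day1, day2);
}
```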

There are "n" versions of these functions: strncpy, strncat, and strncmp. These "n" versions take an extra size_t value, n, as the last parameter and process at most n characters. For example,

  char reg1[10] = "1701 C";
  char reg2[20] = "1701 E";
  int cmp = strncmp(reg1, reg2, 4);

In this code, cmp would have the value 0, because the first 4 characters of each string are the same. If n is larger than the strings, the comparison simply stops at the terminator, as strcmp would. Calling strncmp(reg1, reg2, 15) would have returned a negative value, because 'C' precedes 'E'.
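Since only the sign of the value returned by strcmp and strncmp is guaranteed, portable code tests for "< 0", "== 0", or "> 0" rather than for -1 or 1. The sketch below reuses the registry strings and adds a hypothetical helper, sign_of, that reduces a comparison result to -1, 0, or 1; the "Zebra" case assumes ASCII codes, where 'Z' is 90 and 'a' is 97.

```c
#include <string.h>

char reg1[10] = "1701 C";
char reg2[20] = "1701 E";

/* Reduces a strcmp-style result to -1, 0, or 1. sign_of is a
   hypothetical helper; only the sign of strcmp's result is meaningful. */
int sign_of(int result) {
    return (result > 0) - (result < 0);
}
```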

Safe String Functions

The "n" versions of the string functions are the safer versions. In fact, since you have a choice of which string functions to use, you should always prefer these "n" versions.

This is especially true when you are working with strings of unequal length. Copying a string of, say, length 20 into an array of length 10 with strcpy writes past the end of the destination array, which is undefined behavior and can corrupt other variables. Using strncpy allows the programmer to pass the size of the destination as the maximum number of characters to copy, so no characters are written past its end. Be aware, though, that strncpy does not add a terminator when the source is too long; see "Common Pitfalls with Strings" below.
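One common way to get both bounded copying and guaranteed termination is to copy at most one character less than the destination's size and then set the last element explicitly. This is a sketch under that approach; copy_bounded is a hypothetical helper, not a standard library function, and dest_size is assumed to be the full size of the destination array.

```c
#include <string.h>

/* Copies at most dest_size - 1 characters of src into dest and always
   terminates dest. copy_bounded is a hypothetical helper. */
void copy_bounded(char *dest, const char *src, size_t dest_size) {
    if (dest_size == 0) {
        return;                          /* no room even for '\0' */
    }
    strncpy(dest, src, dest_size - 1);   /* may stop without a terminator */
    dest[dest_size - 1] = '\0';          /* so add one ourselves */
}
```

With char small[10], copy_bounded(small, "The need for more research", sizeof small) leaves small holding the 9 characters "The need " followed by a terminator.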

Unsafe functions are included in the Pebble C library for completeness, but you are highly encouraged to use the safer "n" versions of string functions.

There are several other functions in the C string library. Here is a good listing of them. While this is a reference for C++, the list of C functions is identical.

Common Pitfalls with Strings

Because strings are closely related to both arrays and pointers, we have to be careful when handling them. There are many ways to fall down when using strings. Here are a few bits to guide your string handling.

  • Beware accidentally handling the null character. Placing a 0 value in the middle of a string will truncate it.
  • Because strings are character arrays, you have to be careful with boundaries. When you work with string values rather than individual characters and indexes, it's easy to make out of bounds references. For example:

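    char data[40] = "The need for more research is clearly indicated.";

    This initialization does not fit in the data array: the string has 48 characters plus its terminator, but the array holds only 40. C requires the compiler to report this, but some compilers, such as GCC, only warn and then truncate the string, leaving data without a null terminator. The same mistake made at run time, for example by copying this string into data with strcpy, writes past the end of the array, and C does not define how such an out-of-bounds write affects program code and/or other variables. Note that this is also a great place to use unsized array declarations: declaring char data[] = ... lets the compiler count the characters for us.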

  • Be aware that using strncpy to copy a fixed number of characters may not work as expected. If the source string has n or more characters, strncpy fills the n destination characters but does not add the null terminator. If you are routinely working with strings of differing lengths, use strncpy, but explicitly set the last character of the destination array to the null character ('\0').
  • Be aware that string functions like strlen scan the whole string every time they are called. This means that code like this:

    for (size_t i = 0; i < strlen(data); i++)
        do_something_with(data[i]);

    scans the entire data string again for every character it processes. However, this code:

    size_t i, len = strlen(data);
    for (i = 0; i < len; i++)
        do_something_with(data[i]);

    scans the data string once to find its length, then processes each character. For long strings, the second version is much faster: the work done by the first grows with the square of the string's length, while the work done by the second grows only linearly.
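Here is the faster pattern as a complete function. count_spaces is a hypothetical example; it computes the length once, before the loop, and then examines each character exactly once.

```c
#include <string.h>

/* Counts the spaces in data. strlen is called once, before the loop,
   rather than on every iteration. count_spaces is hypothetical. */
size_t count_spaces(const char *data) {
    size_t count = 0;
    size_t len = strlen(data);        /* computed once */
    for (size_t i = 0; i < len; i++) {
        if (data[i] == ' ') {
            count++;
        }
    }
    return count;
}
```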

Project Exercises

Project 9.1

The starter code for Project 9.1 is here. Copy the project and run the code. The starter code displays a cursor that is positioned underneath letters in a word. The "select" button will move between letters: a press will move right and a long press will move left. In the starter project, the letters spell out "Hello World".

You are to fill in code for up_handler and down_handler, code that handles presses of the "up" and "down" buttons, respectively. Each up press should advance the letter the cursor is on to the next letter of the alphabet; each down press should move the letter to the previous letter in the alphabet. In either case, the string must be rebuilt and displayed again on the screen.

Note that you could simply redisplay each letter as it is changed. But that is not good enough for this project. Each string needs to be rebuilt using string functions and redisplayed on the Pebble screen.

An answer to this project can be found here.

Note that this project is coded with a monospace font, courtesy of 1001 Fonts. You can get that font at http://www.1001fonts.com/source-code-pro-font.html.

Project 9.2

This project creates a "word calculator". The starter code, which can be found here, uses the buttons on a Pebble smartwatch to cycle numbers and operators. The "select" button moves to the next position. As the numbers in the calculation change, you are to display the words associated with all the numbers. An example is given in the figure below.


Figure 9.1: An example word calculator.

You are to write a function num2words that will take an integer parameter and produce a string that expresses the value in words. For example, if you call the function like this

char *words = num2words(52);

then words will point to the value "fifty two". You may assume that the largest value passed to the function is 100.

You are also to use this function to replace the numbers that a user puts on the Pebble screen with words. As the calculated numbers change, erase the string that was just added and replace it with the words from the function. Then redisplay the current computation "sentence".

You will have a few issues here to work out. How will you wrap your "sentence" around the Pebble screen? Which operators will you allow? And what happens when the "sentence" is too long for the screen?

You can find an answer here. Note that this answer uses snprintf statements to construct the string in num2words. This is a template-driven solution rather than a copy-based solution, but it's just as valid and even a bit easier to understand.
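For readers who have not used snprintf, here is a small template-driven example, separate from num2words itself. format_sum and its wording are hypothetical. snprintf never writes more than size bytes, always terminates the output when size is greater than 0, and returns the length the full result would have had.

```c
#include <stdio.h>

/* Builds a calculator "sentence" such as "3 plus 4 is 7" into buf.
   format_sum is a hypothetical helper. Returns the full length the
   sentence needs, even if buf was too small to hold all of it. */
int format_sum(char *buf, size_t size, int a, int b) {
    return snprintf(buf, size, "%d plus %d is %d", a, b, a + b);
}
```

Comparing the return value with the buffer size is one way to detect that a sentence did not fit, which is relevant to the question of what happens when the "sentence" is too long for the screen.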

Extra Challenge: Extend your words substitution to operators. Replace operators, like "+", with words on the screen (e.g., "plus").

Extra Extra Challenge: Don't even use numbers. Cycle through words that represent numbers when a user presses the "up" and "down" buttons.

Project 9.3

A madlib is a word game where you choose random words and insert them in a sentence, filling in the blanks in the sentence to make funny new sentences. This project gets you to create madlibs.

Get the starter code for this project here. Examine the code; it uses three files to read madlibs, nouns, and verbs. Each madlib looks like this: " drove to the and it." You get random madlibs, nouns, and verbs using the functions random_madlib, random_noun, and random_verb, respectively.

You are to generate a random madlib, generate 2 nouns and a verb, then replace the occurrences of "" with the first noun, "" with the second noun, and "" with the verb. You are to then display the resulting madlib. Write a function with the header

char *replace_words(char *sentence, char *original, char *replacement)

This function should replace occurrences of original with replacement in sentence and return the sentence as the return value of the function.

Once you have your madlib, you may display it on the Pebble screen using the function display_madlib.

This can be a little convoluted, so make sure you insert comments to explain your logic. Also claim the code with your name and an explanation of what it does.

You can find an answer to this Project here.

Extra Challenge #1: Write replace_words so that the function does replacement in place, without a second string.

Extra Challenge #2: Write replace_words so that it uses the C function strchr. This function has the header

char *strchr(const char *s, int c)

It returns a pointer to the first occurrence of the character c in the string s, or NULL if c does not occur. Use it to find the first character of original; you will then need to verify that the rest of original is present at that position.

Project 9.4

Get the starter code for this project here. Read through the code, paying attention to the functions defined.

Among the functions in the starter code that run the watch app, there are three functions that tap into the sensors on the watch. get_compass gets information from the compass sensor. get_accelerometer gets data from the accelerometer. get_light gets light level data. Each function returns a "buffer" that has been filled by a callback function, called when the respective sensor updates. These callback functions have code to get the data from their respective sensors.

You are to fill in each of the three functions to convert the data derived from the sensor to a string.

  • compass() will put a string into compass_buffer from the struct data. Use data.true_heading as the integer that gives the heading.
  • accel() will put a string into accel_buffer that will depict the acceleration in three directions. Use the struct data in data[0]: .x, .y, and .z. Form a string that can interpret these and can be displayed.
  • light() will put a string into light_buffer based on the ambient light level. Use values of level in an if/else chain or a switch statement to set a string depicting the light level.

Remember that strings can be depicted as arrays or pointers. In each function, you need to dynamically allocate a string using malloc and return that as the char * return type from the function.
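The general pattern of returning a dynamically allocated string looks like the sketch below. make_label is a hypothetical helper, not part of the starter code: it allocates one extra byte for the terminator, copies the text, and leaves it to the caller to free the result.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of label, or NULL if malloc fails.
   make_label is hypothetical; the caller must free() the result. */
char *make_label(const char *label) {
    size_t len = strlen(label);
    char *copy = malloc(len + 1);      /* +1 for the null terminator */
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, label, len + 1);      /* copies the terminator too */
    return copy;
}
```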

You can find a solution for this project here. Like 9.2, this solution was done using snprintf statements.
