Chapter 16: Standard C File I/O

This chapter takes a look at one of the most difficult features to include in a programming language: file input and output. File I/O is such a problem because it is highly dependent on the operating system of the computer that is executing your program. The C language is supposed to remain the same across operating systems, but there is a wide variety in the way different operating systems implement file handling.

To accommodate this, C takes file I/O out of the language. That is to say that there is no syntax that drives file I/O. Rather, it is handled in C by functions, implemented in the standard C library that accompanies implementations on different operating systems. This makes sense because functions can hide the actual implementation of file I/O in each operating system while providing the programmer with a consistent interface.

We will examine file I/O functions in C in this chapter. We cover basic, general file I/O here and look at file I/O on a Pebble smartwatch in the next chapter.

A Little Perspective

File I/O in C owes much to its history. The way file I/O is done is heavily based on the roots of the C language in Unix, the operating system on which C first ran. Unix and C were developed concurrently, and the way file I/O is implemented is an artifact of this concurrent development.

To reiterate, there are no native, syntax-driven components of the C programming language that are connected to file I/O. Instead, all file I/O is done through function calls. The implementation of these functions is part of a standard C library, called "stdio" (the functions are declared in the header stdio.h); this library is linked in when C programs are compiled and the resulting executable is generated. This C library is standardized across operating systems; it is based on a specific model of file contents that is consistent in all implementations. This model is a Unix model, but is implemented for all operating systems on which C runs.

Finally, note that file I/O needs a file system. Some of the features we will discuss are connected to "standard input" and "standard output", i.e., the keyboard and the screen. But other features are based on nonvolatile storage that exists in a file system on the computer on which the C code executable is running. On a Pebble smartwatch, this can be an issue. While we can pinpoint a "screen" or "output", there is no keyboard input to a Pebble smartwatch and there is no file system. This means that, while Pebble's implementation of file I/O includes screen output, most other standard file I/O operations are not supported. There are file I/O operations for Pebble programs; they are discussed in the next chapter.

A General Overview

In general, C considers a file to be a container of bytes. This container has no specific structure other than that the bytes are in a sequential order. This is different than some other file implementations; some operating systems have predefined structures for files. But C is modeled after the Unix view of file systems, which is very simple and abstract.

In order to use a file, the file must first be opened. Once opened, the file contents can be read from the file or written to the file in a sequential manner. When a program is done with a file, that file should be closed. Files are maintained with the concept of a positional pointer, that is, a specific position within the sequential bytes that make up a file. When a read or a write operation is performed, that operation happens at the location of the positional pointer and that pointer is moved to the next byte. That position can be changed by an operation called seeking, which is the act of explicitly moving the positional pointer to a specific place in the file.
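The open, read, seek, and close cycle described above can be sketched with the standard functions fseek() and ftell(), which move and report the positional pointer. The helper names byte_at() and position_after_reading(), the sample contents "ABCDEF", and the use of tmpfile() for scratch storage are illustrative assumptions, not part of the chapter's examples.

```c
#include <stdio.h>

/* Hypothetical helper: stores "ABCDEF" in a temporary file, moves the
   positional pointer to 'offset', and returns the byte found there
   (or EOF if the seek fails or the offset is at the end of the file). */
int byte_at(long offset)
{
    FILE *fp = tmpfile();   /* temporary binary file, opened for update */
    int c = EOF;

    if (fp == NULL)
        return EOF;

    fputs("ABCDEF", fp);    /* writes six bytes: the pointer is now at 6 */

    /* Seeking: explicitly move the pointer, counted from the start. */
    if (fseek(fp, offset, SEEK_SET) == 0)
        c = fgetc(fp);      /* read at the pointer, which then advances */

    fclose(fp);             /* closing also removes the temporary file */
    return c;
}

/* Hypothetical helper: reports where the positional pointer sits after
   reading 'count' bytes from the start of the same six-byte file. */
long position_after_reading(int count)
{
    FILE *fp = tmpfile();
    long pos;

    if (fp == NULL)
        return -1L;

    fputs("ABCDEF", fp);
    rewind(fp);             /* back to position 0 */
    while (count-- > 0 && fgetc(fp) != EOF)
        ;                   /* each successful read advances the pointer */
    pos = ftell(fp);
    fclose(fp);
    return pos;
}
```

With these helpers, byte_at(3) seeks past "ABC" and reads 'D', and position_after_reading(4) reports position 4. Asking for more bytes than the file holds leaves the pointer at the end of the file, position 6.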

C and Random Access Files

Often, in other operating systems and other languages, there is special treatment given to random access files, or special files where the positional pointer can be changed at any time. In keeping with simplicity and abstractness, C does not have special treatment for random access files. Indeed, every file is a random access file and functions that adjust the positional pointer work on all files.

C programs view file contents in two different ways. One way is as a container of text: all data in a file is considered to be 8-bit or 16-bit characters (depending on the character encoding scheme used). The other way is as a container of raw binary data: all file data is a set of bytes with no particular interpretation connected to them. There are sets of functions that apply to each of these categories. If a file has text, then there are likely concepts that build on text, like words and lines, that file functions can work with. If a file has raw binary data, then the only assumption that can be made is it is made up of bytes, and file functions work with that assumption.

Opening and Closing a File

Before the contents of a file can be used, the file must be opened. We open files with the function fopen(). When our use of a file is completed, files should be closed. We close files with the function fclose().

The fopen() function has the following prototype:

FILE *fopen( const char *filename, const char *mode );

The function takes two parameters: filename, which is the name of the file to open, and mode, which is a string of characters that describes the way the file will be accessed. Both parameters are const char* parameters, so literal strings can be used as actual parameters. The mode specifier can be a combination of the following characters:

"r" specifies the file is to be used for reading
"w" specifies the file is to be used for writing, starting with an empty file
"a" means the file contents will be appended: used for writing but without emptying the file first
"b" specifies a binary (raw, byte-wise) file rather than a text file
"+" specifies that the file is used for both reading and writing. "w+" starts with an empty file and "a+" appends to the existing contents; when used as "r+", the file must exist first.

Consider some examples. "rb" would indicate opening a file for reading bytes. "w" would open an empty text file for writing. "r+b" will open an existing binary file for reading and writing without emptying it, while "w+b" will open an empty binary file for reading and writing.
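To make the difference between the modes concrete, here is a minimal sketch that writes to the same file with different mode strings and then measures the result. The helpers write_with_mode() and file_size() are hypothetical names introduced for illustration. Measuring size with fseek() to the end followed by ftell() works on common systems, but the C standard does not guarantee it for every implementation.

```c
#include <stdio.h>

/* Hypothetical helper: opens 'filename' with the given mode string,
   writes 'text', and closes the file. Returns 0 on success, -1 on failure. */
int write_with_mode(const char *filename, const char *mode, const char *text)
{
    FILE *fp = fopen(filename, mode);
    int ok;

    if (fp == NULL)
        return -1;              /* e.g. "r+" on a file that does not exist */

    ok = (fputs(text, fp) != EOF);
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: returns the size of a file in bytes, or -1 on failure.
   Assumes the platform supports seeking to the end of a binary file. */
long file_size(const char *filename)
{
    FILE *fp = fopen(filename, "rb");
    long size = -1L;

    if (fp != NULL) {
        if (fseek(fp, 0L, SEEK_END) == 0)
            size = ftell(fp);
        fclose(fp);
    }
    return size;
}
```

Writing "hello" with "w" and then " world" with "a" leaves an 11-byte file. Writing "hi" with "w" afterward empties the file first and leaves 2 bytes. Opening a missing file with "r+" fails, so write_with_mode() returns -1.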

Note that the function returns a pointer to a value of the datatype FILE. FILE is a type, typically a struct, that describes the file being opened; this chapter calls the pointer a file descriptor (the C standard calls it a stream). The value returned must be used in the subsequent function calls that pertain to manipulating the file.

For example, let's say we wanted to write some text to a file. We might do it this way:

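// Assumes <stdio.h> (fopen, fwrite, fclose) and <string.h> (strlen) are included.
const char *filename = "threebears.txt";
const char *mytext = "Once upon a time there were three bears.";
FILE *storyfd = fopen(filename, "wb");
if (storyfd) {
    // Write every character of the string, without its terminating '\0'.
    fwrite(mytext, sizeof(char), strlen(mytext), storyfd);
    fclose(storyfd);
}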

This example opens a file using the mode "wb", which indicates the file is used for writing raw bytes. Note that if there was an error opening the file, fopen() would return NULL; this is checked in the example by the "if" statement.

The example also calls fclose(), which has the following prototype:

int fclose(FILE *descriptor);

This function closes the file referenced by the descriptor, freeing up all system resources allocated to that file and rendering the descriptor invalid.

You can change the way a file is handled by using the freopen() function. It has the prototype below:

FILE *freopen(const char *filename, const char *mode, FILE *fd);

This reassociates the descriptor fd with the given filename and file mode. This function works like an fclose() followed by an fopen(), and it returns NULL if the file cannot be reopened.
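As a small illustration of freopen(), the sketch below writes to a file and then reuses the same FILE pointer to read the file back. The helper name write_then_reopen() is a hypothetical one chosen for this example.

```c
#include <stdio.h>

/* Hypothetical helper: writes 'text' to 'filename', then uses freopen()
   to switch the same FILE pointer to read mode. Returns the first
   character read back, or EOF on failure. */
int write_then_reopen(const char *filename, const char *text)
{
    FILE *fp = fopen(filename, "w");
    int first = EOF;

    if (fp == NULL)
        return EOF;

    fputs(text, fp);

    /* freopen() closes the file (flushing what was written) and then
       opens it again with the new mode. On failure it returns NULL and
       the original descriptor is no longer usable. */
    fp = freopen(filename, "r", fp);
    if (fp != NULL) {
        first = fgetc(fp);
        fclose(fp);
    }
    return first;
}
```

Calling write_then_reopen("notes.txt", "story") would return 's', the first character written before the mode was changed.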

Direct, Non-specific File I/O

Reading and writing raw byte-wise data from and to a file is the most generic way to work with files. Both text and binary data can be written with this method.

The function fread() will read bytes from an opened file. It has the prototype below:

size_t fread(void *buffer, size_t size, size_t num, FILE *fd);

This function needs space in memory (buffer), the size of a single unit in that space (size), the number of units in that space (num), and the file descriptor of the file to read from (fd). The file needs to have been previously opened. The function attempts to read up to num units of size bytes each, starting at the current file position. It returns the number of complete units read, which is smaller than num if the end of the file or an error was encountered; when size is 1, this is the number of bytes read.

The fwrite() function works much the same way, except it (naturally) writes instead of reads. It has the following prototype:

size_t fwrite(const void *buffer, size_t size, size_t num, FILE *fd);

The parameters match those of fread(). Again, the file needs to have been previously opened.

Consider the example above. The fwrite() call looked like this:

fwrite(mytext, sizeof(char), strlen(mytext), storyfd);

mytext is a pointer to a string. Each unit in the string is a character, whose size is sizeof(char), and the number of units is the string's length, strlen(mytext). The terminating null character is not counted by strlen(), so it is not written. Finally, storyfd is the file descriptor that was returned by a previous fopen() call.
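The size and num parameters matter most when the units are larger than one byte. The following sketch writes a number of int values to a temporary file and reads some back; the value returned by fread() counts units (here, ints), not bytes. The helper name ints_read_back() and the MAX_ITEMS limit are assumptions made for this illustration.

```c
#include <stdio.h>

#define MAX_ITEMS 16

/* Hypothetical helper: writes 'written' ints to a temporary file, then
   asks fread() for 'requested' ints. Returns how many ints fread()
   actually delivered (0 on any failure). */
size_t ints_read_back(size_t written, size_t requested)
{
    int data[MAX_ITEMS] = {0};      /* values to write (all zero here) */
    int buffer[MAX_ITEMS];          /* space to read into */
    size_t nread = 0;
    FILE *fp;

    if (written > MAX_ITEMS || requested > MAX_ITEMS)
        return 0;

    fp = tmpfile();
    if (fp == NULL)
        return 0;

    /* fwrite() returns the number of units written. */
    if (fwrite(data, sizeof(int), written, fp) == written) {
        rewind(fp);                 /* move back to the start of the file */
        /* fread() returns the number of complete units read. */
        nread = fread(buffer, sizeof(int), requested, fp);
    }
    fclose(fp);
    return nread;
}
```

If three ints were written and eight are requested, fread() returns 3, because the end of the file is reached first. If five were written and two are requested, it returns 2.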

Finally, the feof() function can be used with binary data files to test if the file position pointer is at the end of the file. The prototype for this function is:

int feof(FILE *fd);

It returns a boolean-style indicator: nonzero if the end-of-file flag is set for the file described by the file descriptor, and zero otherwise. Note that the end of a file is not detected until a program tries to read past the end of file. This means that the last valid read of file data will not detect the end of file.

Detecting The End of a File

Detecting the end of a file can be a tricky procedure. The I/O library maintains an "end of file" flag for each open file that indicates if the end of the file has been reached. However, as we stated above, the end of file is not reached until a file read has gone past the end of the file. This means that this code is not correct:

   fd = fopen(myfilename, "r");
   while (!feof(fd)) {
       // get a character from the file
       c = getc(fd);
       // do something with it
       putchar(c);
   }

The code to get the character is fine. If it reads past the end of file, it will return a special value, designated with the constant EOF. But this value is not a valid, printable character, so the loop above passes it to putchar() once before feof() finally reports the end of the file. In order to properly work with the end of a file, you need to test the value returned by the read before doing anything with it.
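The difference can be measured by counting how many times each style of loop body runs. In the sketch below, the helper names temp_file_with(), iterations_feof_first(), and iterations_read_first() are hypothetical, and a temporary file stands in for a real one.

```c
#include <stdio.h>

/* Hypothetical helper: creates a temporary file holding 'text' and
   positions it at the start, ready for reading. */
static FILE *temp_file_with(const char *text)
{
    FILE *fp = tmpfile();

    if (fp != NULL) {
        fputs(text, fp);
        rewind(fp);
    }
    return fp;
}

/* The flawed pattern: test feof() before reading. */
int iterations_feof_first(const char *text)
{
    FILE *fp = temp_file_with(text);
    int iterations = 0;

    if (fp == NULL)
        return -1;

    while (!feof(fp)) {
        int c = getc(fp);       /* the final call returns EOF... */
        (void)c;
        iterations++;           /* ...but the body still runs for it */
    }
    fclose(fp);
    return iterations;
}

/* The usual pattern: read first, then test the value that was returned. */
int iterations_read_first(const char *text)
{
    FILE *fp = temp_file_with(text);
    int iterations = 0;
    int c;

    if (fp == NULL)
        return -1;

    while ((c = getc(fp)) != EOF)
        iterations++;           /* runs only for real characters */
    fclose(fp);
    return iterations;
}
```

For a file containing "abc", the first loop body runs four times (three characters plus one EOF value), while the second runs exactly three times.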

Text File I/O

Text file I/O functions build on binary data I/O functions to view files as collections of text objects: characters, strings, and lines. These functions reflect these text objects.

The simplest function that reads text objects from a file is fgetc(), whose prototype is below:

int fgetc(FILE *fd);

This function reads a character from the file described by the file descriptor fd. The function returns the character read as an int; at the end of the file or in case of an error, it returns EOF.

Alongside fgetc() is fgets(), which gets a string of a specified length from a file. The prototype for fgets() is below:

char *fgets(char *buffer, int bufsize, FILE *fd);

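This function reads up to bufsize-1 characters from the file described by fd, stopping early after a newline character or at the end of the file. It copies this string into the buffer buffer and appends a null character ('\0') to terminate the string. It returns buffer on success and NULL if the end of the file or an error occurs before any characters are read.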

Let's consider an example. Let's say we want to print all the lines of a file to the screen. We could use code like this:

int c;
FILE *fd;

fd = fopen("example.txt", "r");

if (fd) {
    while ((c = fgetc(fd)) != EOF)
        putchar(c);
    fclose(fd);
}

This example demonstrates several things. As we saw before, the fopen() function opens the file requested, returning either the file descriptor or a NULL (0). Here, we put the assignment in the "while" loop, testing for the end of file. Note the end of the file is denoted with the special value EOF that we can test for; c is declared as an int, not a char, so that it can hold EOF as well as every valid character. We use the function putchar() to print the character to the screen. Finally, fclose() closes out the file when we are done.

Is This Bad Code?

We have stated that using an assignment statement in another statement is very bad form. However, this way of processing data from files just fits: any other way of processing the character would be unwieldy and less readable. It is probably the only situation where an assignment statement could be used in a while conditional.

The above code is nice because it does not require us to use an array or dynamic memory to store a line of text. However, we could try something different and process the file using strings and an allocated buffer like this:

#define CHUNK 256
char buf[CHUNK];
FILE *fd;

fd = fopen("example.txt", "r");
if (fd) {
    while (fgets(buf, sizeof(buf), fd) != NULL)
        fputs(buf, stdout);
    fclose(fd);
}

Here we process the file in a series of strings. Each successive value of buf contains one line from the text file (or the next CHUNK-1 characters of a line that is too long to fit), including the terminating line feed character, ended with a null terminator. We stop the loop when the fgets() call returns a NULL pointer, indicating that no more characters could be read (note that we are not looking for the EOF value here). There are some issues with this approach, the biggest one being the waste of space in the large array declared for the buffer.
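One detail worth seeing in action is that fgets() delivers at most bufsize-1 characters per call, so a line longer than the buffer arrives in several pieces. The sketch below counts the calls needed to read some text; count_fgets_calls() and MAX_CHUNK are hypothetical names, and a temporary file stands in for example.txt.

```c
#include <stdio.h>

#define MAX_CHUNK 256

/* Hypothetical helper: stores 'text' in a temporary file and counts how
   many successful fgets() calls it takes to read it all back with a
   buffer of 'bufsize' bytes. Returns -1 on failure. */
int count_fgets_calls(const char *text, int bufsize)
{
    char buf[MAX_CHUNK];
    int calls = 0;
    FILE *fp;

    if (bufsize < 2 || bufsize > MAX_CHUNK)
        return -1;

    fp = tmpfile();
    if (fp == NULL)
        return -1;

    fputs(text, fp);
    rewind(fp);

    /* Each call stops after a newline, after bufsize-1 characters,
       or at the end of the file, whichever comes first. */
    while (fgets(buf, bufsize, fp) != NULL)
        calls++;

    fclose(fp);
    return calls;
}
```

For the text "ab\ncdefgh\n", a 64-byte buffer needs two calls, one per line. A 4-byte buffer needs four calls: "ab\n", "cde", "fgh", and "\n".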

Note that we could have just processed the file in chunks of text like this:

#define CHUNK 256
char buf[CHUNK];
FILE *fd;
size_t nread;

fd = fopen("example.txt", "r");
if (fd) {
    while ((nread = fread(buf, 1, sizeof buf, fd)) > 0)
        fwrite(buf, 1, nread, stdout);
    fclose(fd);
}

This code uses fread() that we saw in the previous section and it points out that even text files can be processed as raw files of byte data. Note that this use of fread() ends when the number of characters read is zero, which happens at the end of the file or on an error; fread() never returns a negative value, so ferror() can be called afterward to tell the two cases apart. Note, too, the use of the file descriptor stdout. This indicates the screen, which we will review in the next section.

Writing text files works in analogous ways, using analogous functions. Writing characters to a file can be done with the fputc() function, whose prototype is below:

int fputc(int c, FILE *fd);

This writes a single character, as an unsigned char or byte, to the file indicated by the file descriptor fd.

The function fputs() writes a string to a file, without the string's null terminator. Its prototype is below:

int fputs(const char *s, FILE *fd);

This function writes the sequence of characters indicated by the pointer to the file.

Let's take another example. We can write code to copy the contents of one file to another in a number of ways. First, we could do it character-by-character:

int c;
FILE *srcfd;
FILE *destfd;

srcfd = fopen("example.txt", "r");
destfd = fopen("example2.txt", "w");

if (srcfd && destfd) {
    while ((c = fgetc(srcfd)) != EOF)
        fputc(c, destfd);
}
// Close whichever files were opened, even if the other fopen() failed.
if (srcfd)
    fclose(srcfd);
if (destfd)
    fclose(destfd);

This code reads characters from a source file, writing them to a destination file.

We could do this by strings in a similar way:

#define CHUNK 256
char buf[CHUNK];
FILE *srcfd;
FILE *destfd;

srcfd = fopen("example.txt", "r");
destfd = fopen("example2.txt", "w");

if (srcfd && destfd) {
    while (fgets(buf, sizeof(buf), srcfd) != NULL)
        fputs(buf, destfd);
}
// Close whichever files were opened, even if the other fopen() failed.
if (srcfd)
    fclose(srcfd);
if (destfd)
    fclose(destfd);

File Constants and Variables

The "stdio" system defines several constants and variables that can be used by the functions in the I/O system.

There are three predefined file descriptors, describing input and output features in a fixed manner:

stdin is a file descriptor, describing a "default" input source of data. In most computer systems, this usually means the computer's keyboard.
stdout is a file descriptor that describes the "default" destination for output of data. For most computer systems, this usually means the computer's screen.
stderr is a file descriptor that depicts the "default" destination for error messages and other diagnostic output. In most systems, this also usually means the computer screen.

Using these definitions, we can expand our selection of file I/O functions. For example, the function described by this prototype:

char *gets(char *s);

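behaves much like fgets() reading from stdin, except that it takes no buffer size and discards the newline. Because it cannot limit how much it reads, gets() can overflow the buffer; it was removed from the C standard in C11, and fgets(s, size, stdin) should be used instead. Likewise, there is an output function: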

int puts(const char *s);

which is equivalent to fputs(s, stdout), except that it also writes a newline after the string. In addition, there are functions:

int getchar(void);
int putchar(int c);

which are the same as fgetc(stdin) and fputc(c, stdout).

Predefined File Descriptors Reflect Unix

The use of these predefined file descriptors reflects C's beginnings on Unix. Unix, and now Linux, treats device I/O in file I/O terms: sending data to a device uses file descriptors for the device and file I/O functions for manipulating the device. It makes sense, then, that sending data to the computer screen or getting data from the keyboard would involve writing and reading the "files" that represented the screen and keyboard devices.

The use of predefined file descriptors like "stdin" and "stdout" has stayed with C implementations, even on non-Unix operating systems.

We have already mentioned another predefined constant: the value that defines the end of a file. The "EOF" constant represents a value that can be used to detect the end of a file. When a character-reading function returns the "EOF" value, the end of a file has been reached or an error has occurred.

The Bottom Level

We have discussed file-oriented function calls that implement file I/O for C programs. It is useful to briefly mention that these functions are not at the bottom of the I/O chain; there is one more set of more general functions on which file I/O functions are based.

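The read() and write() functions available to C programs on Unix-like systems represent the most general functions used to read from and write to various devices in a computer. They are part of the operating system interface (POSIX) rather than the standard C library, and they are used generally for interfacing C programs with devices in a computer system.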

The prototype for the read() function is below:

ssize_t read(int fd, void *buf, size_t count);

Note that, like file I/O functions, it uses a file descriptor (fd), although here the descriptor is a small integer rather than a FILE pointer. This reflects the Unix model of interfacing with devices as if they were files. The buffer to read data into is the second parameter (buf) and the size of this buffer (count) is given as the third parameter. The function returns the number of bytes read, zero at the end of the file, or -1 on an error.

The prototype for the write() function is below:

ssize_t write(int fd, const void *buf, size_t count);

Again, this function uses a file descriptor (fd), reflecting the Unix device model. The buffer that contains data to write (buf) and the size of this buffer (count) are also given as parameters.

These functions are used for many device I/O applications, including file I/O. Using the higher-level functions allows programmers to focus on using files rather than devices.
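For comparison with the stdio functions, here is a minimal sketch of a write and read round trip using the lower-level calls. It assumes a POSIX system (Linux, macOS, or similar), where open(), lseek(), close(), and unlink() accompany read() and write(); none of this is available on a Pebble smartwatch. The helper name posix_roundtrip() is hypothetical.

```c
#include <fcntl.h>      /* open() and the O_* flags */
#include <string.h>     /* strlen(), memcmp() */
#include <unistd.h>     /* read(), write(), lseek(), close(), unlink() */

/* Hypothetical helper: writes 'text' to 'filename' with write(), reads it
   back with read(), and deletes the file. Returns the number of bytes
   read back, or -1 if anything fails or the contents do not match. */
long posix_roundtrip(const char *filename, const char *text)
{
    char buf[128];
    size_t len = strlen(text);
    ssize_t nread = -1;
    int fd;                         /* an integer descriptor, not a FILE * */

    if (len > sizeof buf)
        return -1;

    fd = open(filename, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Write the bytes, then move back to the start before reading. */
    if (write(fd, text, len) == (ssize_t)len &&
        lseek(fd, 0, SEEK_SET) == 0) {
        nread = read(fd, buf, sizeof buf);
        if (nread != (ssize_t)len || memcmp(buf, text, len) != 0)
            nread = -1;
    }

    close(fd);
    unlink(filename);               /* remove the scratch file */
    return (long)nread;
}
```

A call such as posix_roundtrip("scratch.bin", "hello") returns 5 on a system where the file can be created. The shape is the same as the fopen(), fwrite(), fread(), fclose() sequence, but nothing is buffered by the library and errors are reported as -1.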

Functions That are Useful for Pebble: printf and sprintf

Before we conclude our look at C file I/O, we should describe some file I/O functions that actually have an application on a Pebble smartwatch. The printf() function and its variants can be used on a Pebble smartwatch and have some very flexible ways to generate output.

Let's look at printf() first; its variants work very much like this first function. printf() has the following prototype:

int printf(const char *format, ...);

This function produces output to stdout in the format specified by the string format, given as the first parameter. The format string contains characters to write to the output combined with zero or more directives. Directives are placeholders where the values of variables will be inserted in the format string. These directives specify the way to construct the output.

Let's illustrate with an example. Let's say we are writing a table of data, where each row contains a country name as a string, a floating point percentage, and an integer number. We might use a printf() function like the one below to print a row:

printf("%s | %f%% | %d", country_name, percent, population);

The format of the output line is given as a string which contains characters to be output and directives. Each directive begins with a "%" character. In the above example, there are three directives: a string (%s), a floating point number (%f) and an integer (%d). The format string is followed by three parameters, whose values are inserted into the string as the directive describes. country_name will be inserted into a string field; percent will be fit into the floating point field; and population will be fit into an integer field. Note the %% sequence; this is the way to represent a "%" character.

However, for a neat table output, the above printf() might not be enough. The columns will probably not be vertically straight. For this example, the exact number of characters for a string will be inserted. For country names of varying length, this will not result in straight columns. If we can fix the width of fields, we can have a nicer output.

We can fix field widths in our example like this:

printf("%25s | %5.2f%% | %15d", country_name, percent, population);

Here, we put the field width after the percent sign but before the specifier. The country name will be right-justified and left-padded by spaces in a field that is 25 characters wide; the percentage will be output in a field 5 characters wide with 2 digits right of the decimal point; and the population will be output in a field that is 15 characters wide. A field width is a minimum: a value that needs more characters is still printed in full.
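The effect of the field widths can be checked by formatting a row into a string and inspecting it. The sketch below uses snprintf() with the same format string as above; format_row() and format_row_left() are hypothetical helpers, and the "-" flag in the second one (left-justify within the field) is a standard printf() flag that the example above does not use.

```c
#include <stdio.h>

/* Hypothetical helper: formats one table row into 'out' using the
   fixed-width format from the text. Returns the length of the full row. */
int format_row(char *out, size_t size,
               const char *country, double percent, int population)
{
    return snprintf(out, size, "%25s | %5.2f%% | %15d",
                    country, percent, population);
}

/* Variant: the '-' flag left-justifies the name and pads on the right. */
int format_row_left(char *out, size_t size,
                    const char *country, double percent, int population)
{
    return snprintf(out, size, "%-25s | %5.2f%% | %15d",
                    country, percent, population);
}
```

When every value fits its field, each row is 52 characters long: 25 for the name, 5 for the number, 15 for the population, and 7 for the separators and the percent sign. That constant length is what makes the columns line up.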

The printf() set of functions is extremely flexible for a number of reasons.

The number of directives, and therefore the number of parameters that follow the format string, is variable.
The properties of the output, including field width, precision, and value type, can be specified in the format string.
The printf() function set has a number of implementations, including those that write data to files and those that format strings.

It is this last point that makes these functions useful for those writing applications for Pebble smartwatches. The printf() function set has a number of functions, including:

fprintf(), a function that works like the functions outlined in this chapter, writing data to files.
sprintf(), a function that creates a string using the specified format and variables.
snprintf(), a function that works like a safer version of sprintf(), taking a total string length specifier.

Let's conclude with an example of this last function, which is included in the I/O functions implemented by the Pebble SDK. For Project Exercise 17.1, we did some manipulating of madlibs. We inserted random words into a sentence. We did not use printf(), but we could have done something like this:

sprintf(sentence, "We %s very fast and %s our %s out", verb1, verb2, noun);

Unfortunately, we don't know if the final string with the directives filled in will be able to completely fit into the variable sentence. If the final string extends beyond the bounds of the sentence variable, we have a buffer overflow, and it could damage the values of other variables or cause a program crash. The safer way to do this is to specify the length of sentence and not allow the constructed result to grow beyond that length. So snprintf() should be used as follows (here sentence is assumed to hold at least 80 characters):

snprintf(sentence, 80, "We %s very fast and %s our %s out", verb1, verb2, noun);
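In standard C (C99 and later), snprintf() also reports whether the result was cut short: it returns the length the full string would have had, so a return value greater than or equal to the buffer size means truncation. The sketch below wraps that check; build_sentence() is a hypothetical helper, and whether the Pebble SDK's snprintf() returns the same value is an assumption that should be checked against the SDK documentation.

```c
#include <stdio.h>

/* Hypothetical helper: builds the madlib sentence into 'sentence', which
   holds 'size' bytes. Returns 1 if the whole sentence fit, 0 if it was
   truncated, and -1 if formatting failed. */
int build_sentence(char *sentence, size_t size,
                   const char *verb1, const char *verb2, const char *noun)
{
    /* snprintf() never writes more than 'size' bytes, including the
       terminating '\0', and returns the untruncated length. */
    int needed = snprintf(sentence, size,
                          "We %s very fast and %s our %s out",
                          verb1, verb2, noun);

    if (needed < 0)
        return -1;
    return (size_t)needed < size;
}
```

With the words "ran", "wore", and "shoes" the full sentence is 39 characters long, so it fits in an 80-byte buffer. In a 16-byte buffer it is cut to 15 characters plus the terminator, and the function returns 0 so the caller can react.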

For a complete specification of printf() functions, including a detailed description of format strings, see the Wikipedia entry on these functions.
