《The C Language, 1989 ANSI Edition》Summary

2023, Jun 30    

📖 The C Language, 1989 ANSI Edition

This is the only book on my bookshelf that was printed before I was born. Such a tidy yet profound piece of work! I have combined some closely related chapters, such as the introduction and the operators, into single sections to make this summary even more concise. 🦄️

1. Introduction

1.1 Syntax Basics

This chapter introduces the essential elements of the language: function declarations, the program entry point, basic I/O usage, types, and operators.

#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
    return 0;
}

Format specifiers include “%o” for octal, “%x” for hexadecimal, “%c” for a character, “%s” for a string, and “%%” for a literal percent sign.

float fahr = 8.0;
float celsius = 20.0;

// %3.0f: at least 3 characters wide, no fraction digits
// %6.1f: at least 6 characters wide, 1 fraction digit
printf("%3.0f %6.1f\n", fahr, celsius);
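To make the width and precision rules concrete, here is a small sketch that formats into a buffer with snprintf instead of printing, so the result can be inspected. The helper names format_temps and format_misc are made up for this illustration and do not come from the book.

```c
#include <stdio.h>

/* Hypothetical helper: formats two temperatures with the same
   specifiers as the printf call above, into a caller-supplied buffer. */
int format_temps(char *buf, size_t size, float fahr, float celsius)
{
    /* %3.0f: at least 3 characters wide, no fraction digits
       %6.1f: at least 6 characters wide, 1 fraction digit */
    return snprintf(buf, size, "%3.0f %6.1f", fahr, celsius);
}

/* Hypothetical helper: exercises %o, %x, %c, %s and %%. */
int format_misc(char *buf, size_t size)
{
    return snprintf(buf, size, "%o %x %c %s %%", 8u, 255u, 'A', "hi");
}
```

With fahr = 8.0 and celsius = 20.0, the first helper leaves "  8   20.0" in the buffer: both numbers are padded on the left up to their minimum widths.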

The classic for statement is supported:

int main(void)
{
    for (int fahr = 0; fahr <= 300; fahr = fahr + 1) {
        printf("%d\n", fahr);
    }
    return 0;
}

Basic I/O is simple enough: use getchar and putchar. It is the developer's responsibility to format the input and to decide where the output should be delivered:

int main(void)
{
    // getchar returns int, not char, so that EOF can be represented
    int c;

    // count the lines of input as well
    int nl = 0;

    while ((c = getchar()) != EOF) {
        putchar(c);

        if (c == '\n') {
            printf("new line: %d\n", ++nl);
        }
    }
    return 0;
}

A function definition always has the form: “return-type function-name(parameter declarations, if any)”

int base = 2;
int n = 3;

// raise base to the n-th power, for n >= 0
int power(int base, int n)
{
    int result = 1;
    for (int i = 0; i < n; i++) {
        result = result * base;
    }
    return result;
}

power(base, n);  // 8

Function parameters are passed by value, so the callee works on a copy of each argument. Arrays are the practical exception: an array argument is passed as a pointer to its first element, so the elements are not copied and the function can modify the contents of the caller's array (there is no mutability protection). This is indeed very different from modern programming languages such as Swift.

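// as long as s[] has room for lim characters, it can be passed in and
// reused multiple times; returns the length of the line read.
// (named as in the book; rename it if your <stdio.h> already declares getline)
int getline(char s[], int lim)
{
    int c = 0, i;
    for (i = 0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i) {
        s[i] = c;
    }
    if (c == '\n') {
        s[i] = c;
        ++i;
    }
    s[i] = '\0';
    return i;
}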
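The pass-by-value rule and the array exception described above can be checked with a tiny sketch. Both helper names are made up for this illustration.

```c
/* Hypothetical helper: n is a copy, so the caller's variable is untouched. */
void set_int(int n)
{
    n = 99;
    (void) n;   /* silence "set but not used" warnings */
}

/* Hypothetical helper: a is really a pointer to the caller's first
   element, so the write lands in the caller's array. */
void set_first(int a[])
{
    a[0] = 99;
}
```

After calling set_int(x) the caller's x keeps its old value, while after set_first(v) the caller's v[0] is 99.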

Variables can be declared either inside a function (internal variables) or outside any function (external variables). When functions and the external variables they use are defined in the same file, with the variables defined first, the functions do not need the extern keyword to refer to them. A function that uses an external variable defined in another file must declare it with the extern keyword.
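A minimal single-file sketch of the declaration/definition split, assuming a made-up variable counter and function next_id. In a real multi-file program the extern line would sit in the file that uses the variable, and the definition in exactly one other file.

```c
/* Declaration only: says that counter exists somewhere,
   without allocating storage for it. */
extern int counter;

/* Hypothetical function that uses the external variable. */
int next_id(void)
{
    return ++counter;
}

/* Definition: this is where the storage is actually set aside. */
int counter = 0;
```

Because the definition appears after next_id here, the extern declaration is what lets the function compile; if the definition came first, the declaration could be dropped.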

1.2 Naming Conventions

  • camel case
  • lowercase for variables
  • uppercase for constants

1.3 Primitive Types & Operators

Types & Constants

C only supports a few basic types (reference C Data-Types):

  • char: a single byte, capable of holding one character of the local character set (256 distinct values with an 8-bit byte).
  • int: an integer, typically reflecting the natural integer size of the host machine. The short and long variants cover different integer ranges (long long was added later, in C99).
  • float: single-precision floating point.
  • double: double-precision floating point.
  • void: means no type, or no thing.

Constants are declared via #define.

Operators

Further Reading: operator precedence and associativity in c

Different types of operators:

  • arithmetic: +, -, *, /
  • relational: >, >=, <, <=
  • increment/decrement: ++, --
  • bitwise: &, |, ^, <<, >>, ~

Conversions

Mainly there are three types of conversions:

  • Implicit conversions: when an operator has operands of different types, the narrower operand is converted to the wider type.
  • Explicit conversions: aka typecasting.
  • Library functions such as atoi. (itoa is a common counterpart, but it is not part of the standard library.)
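The three kinds of conversion listed above can be sketched side by side. The function names are made up for this illustration; atoi is the standard function from <stdlib.h>.

```c
#include <stdlib.h>

/* Explicit conversion: without the cast, sum / count would be
   integer division and the fraction would be lost. */
double average(int sum, int count)
{
    return (double) sum / count;
}

/* Implicit conversion: n is widened to double because the other
   operand, 2.0, is a double. */
double half(int n)
{
    return n / 2.0;
}

/* Library conversion: text to int. */
int parse_int(const char *text)
{
    return atoi(text);
}
```

For comparison, the plain expression 7 / 2 stays in int arithmetic and yields 3, while average(7, 2) and half(7) both yield 3.5.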

1.4 Control Flows

As the pioneer of modern programming languages, C provides classic control flows:

  • If/Else/Else-If
  • For/While/Do-While
  • Switch
  • Goto (labels)
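As a small sketch of a loop combined with a switch, the made-up helper below counts digits, whitespace, and everything else in a string. It is loosely modelled on the character-counting programs in the book, not copied from it.

```c
/* Hypothetical helper: classifies each character of s. */
void classify(const char *s, int *digits, int *spaces, int *others)
{
    *digits = *spaces = *others = 0;

    for (; *s != '\0'; s++) {
        switch (*s) {
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
            ++*digits;
            break;
        case ' ': case '\t': case '\n':
            ++*spaces;
            break;
        default:
            ++*others;
            break;
        }
    }
}
```

Each case ends with break; without it, execution would fall through into the next case, which is a classic source of bugs in C.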

2. Pointers, Arrays, Structures

2.1 Pointers

Basics

Pointers are the soul of C and its family of languages, and the concept itself is the reason I open this C book every time. A pointer is a group of cells that can hold an address. [Figure: a pointer and the value it points to]

The unary operator & gives the address of an object. Hence a pointer ptr that points to the variable var can be set up as:

ptr = &var;

Suppose that x and y are integers and ip is a pointer to int. This artificial sequence shows how to declare a pointer and how to use & and *:

int x = 1;
int y = 2;
int z[10];
int *ip;

ip = &x;  			// ip now points to x
y = *ip;  			// now assign y with the value of x: same as y = x
*ip = 0;  			// now dereference x and write 0 to x: same as x = 0
*ip = *ip + 10; 	// equals to x = x + 10
(*ip)++;			// equals to x++

In C (and in C-family languages such as ObjC), function arguments are passed by value, which means they are copied. Hence when the primitive types mentioned earlier in this document (int/float/char/double) are passed to a function, there is no direct way for the function body to alter the caller's variables. Passing pointers instead works around this:

void swap(int *px, int *py) {
    int temp;

    temp = *px;
    *px = *py;
    *py = temp;
}

Nature of Array

In C there is a strong relationship between pointers and arrays, strong enough that the two should be discussed simultaneously: “Any operation that can be achieved by array subscripting can also be done with pointers”. [Figure: pa pointing to the first element of array a] Assume a is an integer array and pa is a pointer to its first element. The following two expressions are identical:

&a[2] // address of third element of the array
pa+2  // address of third element of the array
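A short sketch of the equivalence: the same sum computed once with subscripts and once by walking a pointer. Both function names are made up for this illustration.

```c
/* Subscript version. */
int sum_index(const int a[], int n)
{
    int i, total = 0;
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Pointer version: pa walks from the first element to one past the last. */
int sum_pointer(const int *pa, int n)
{
    const int *end = pa + n;
    int total = 0;
    while (pa < end)
        total += *pa++;
    return total;
}
```

Both functions accept the same array argument, since the name of an array in an expression is converted to a pointer to its first element; a[i] is by definition *(a + i).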

Function as Pointer

In C, a function itself is not a variable, but it is possible to define pointers to functions. Such pointers can be assigned, placed in arrays, passed to functions, and returned by functions. The following function signature shows an example:

// the last argument, comp, is a pointer to a function that
// takes two generic (void *) pointers and returns int.
void qsort(void *lineptr[], int left, int right, int (* comp)(void *, void *));

The implementation below shows how the function passed as an argument is called:

void _qsort(void *v[], int left, int right, int(*comp)(void *, void *)) {

    int i, last;
    void swap(void *v[], int, int);

    if (left >= right)
        return;
    swap(v, left, (left + right)/2);
    last = left;
    for (i = left+1; i <= right; i++)
        if ((*comp)(v[i], v[left]) < 0)
            swap(v, ++last, i);
    swap(v, left, last);

    _qsort(v, left, last-1, comp);
    _qsort(v, last+1, right, comp);
}
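The same callback idea is used by the standard library's own qsort from <stdlib.h>, which has a different signature from the book-style version above: it takes a base pointer, an element count, an element size, and a comparison function. A minimal sketch, with cmp_int and sort_ints as made-up names:

```c
#include <stdlib.h>

/* Comparison callback in the shape the standard qsort expects:
   negative, zero, or positive for less, equal, or greater. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);
}

/* Hypothetical wrapper: sorts n ints in ascending order. */
void sort_ints(int *v, size_t n)
{
    /* the function name cmp_int is converted to a pointer to the function */
    qsort(v, n, sizeof v[0], cmp_int);
}
```

Writing the comparison as (x > y) - (x < y) avoids the overflow that the tempting x - y could cause for large values.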

Variable-Length Argument Lists

The printf function is a vivid demonstration of how a function in C, as in many modern languages, can process a variable-length argument list in a portable way. The same mechanism, from <stdarg.h>, is available to user code:

#include <stdarg.h>

int sum(int num_args, ...) {
	int val = 0;
	va_list ap;
	int i;

	// initializes ap variable to be used with the va_arg and va_end macros
	va_start(ap, num_args);
	for(i = 0; i < num_args; i++) {
		// move the ap to the next argument and return the value
		val += va_arg(ap, int);
	}
	// conducts cleaning
	va_end(ap);

	return val;
}

Complicated Declarations

C is castigated for the syntax of its declarations, particularly ones that involve pointers to functions. The syntax is an attempt to make the declaration and the use agree. The following exercises help build familiarity with type declarations:

// argv: pointer to pointer to char
char **argv;

// daytab: pointer to array[13] of int
int (*daytab)[13];

// daytab: array[13] of pointer to int
int *daytab[13];

// comp: function returning pointer to void
void *comp();

// comp: pointer to function returning void
void (*comp)();

// x: function returning pointer to array[] of
// pointer to function returning char
char (*(*x())[])();

// x: array[3] of pointer to function returning pointer to array[5] of char
char (*(*x[3])())[5];
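To see a few of these declarations in use, here is a sketch built around a days-per-month table. The table contents and all function names are assumptions made for this illustration.

```c
/* Row 0: ordinary year, row 1: leap year; index 0 is unused. */
static int daytab[2][13] = {
    {0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31},
    {0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}
};

/* row: pointer to array[13] of int */
int days_via_row(int leap, int month)
{
    int (*row)[13] = &daytab[leap];
    return (*row)[month];
}

/* rows: array[2] of pointer to int */
int days_via_ptrs(int leap, int month)
{
    int *rows[2];
    rows[0] = daytab[0];
    rows[1] = daytab[1];
    return rows[leap][month];
}

static int answer(void) { return 42; }

/* fp: pointer to function returning int */
int call_through_pointer(void)
{
    int (*fp)(void) = answer;
    return (*fp)();
}
```

In each case the declaration mirrors the use: (*row)[month] and (*fp)() are exactly the expressions that yield an int.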

2.2 Structures

Basics

A structure is a collection of one or more variables, possibly of different types, grouped together under a single name for convenient handling.

// simple structure example
struct point {
    int x;
    int y;
};

// initialize the above structure
struct point maxPoint = { 320, 200 };

// structure also support composition
struct rect {
    struct point pt1;
    struct point pt2;
};

struct point pt1 = { 1, 1 };
struct point pt2 = { 2, 2 };
struct rect screen = { pt1, pt2 };

printf("print screen pt x: %d", screen.pt1.x);

Structures & Functions

When writing functions to handle structure, there are generally three ways to do it:

  • pass components separately.
  • pass entire structure via copy.
  • pass a pointer to it.

If a large structure is to be passed to a function, it is generally more efficient to pass a pointer than to copy the structure or pass its components separately. A pointer to an instance of the point structure above is declared and used as follows:

struct point *pp;

// access the structure member
pp->x;

// or
(*pp).x;
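The difference between passing a structure by copy and passing a pointer can be sketched as follows. The function names add_points and move_point are made up for this illustration.

```c
struct point {
    int x;
    int y;
};

/* By value: a and b are copies, and a new struct is returned. */
struct point add_points(struct point a, struct point b)
{
    a.x += b.x;
    a.y += b.y;
    return a;
}

/* By pointer: nothing is copied, and the caller's struct is modified. */
void move_point(struct point *pp, int dx, int dy)
{
    pp->x += dx;
    (*pp).y += dy;   /* same kind of access, written the long way */
}
```

Calling add_points(a, b) leaves the caller's a unchanged, whereas move_point(&a, 10, 20) updates it in place.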

The structure operators . and ->, together with () for function calls and [] for subscripts, are at the top of the precedence hierarchy and thus bind very tightly.


struct {
    int len;
    char *str;
} *p;

// increment the len under p
++p->len;
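Because -> binds tighter than ++, the placement of parentheses changes what gets incremented. A small sketch, with a named struct and made-up helper functions standing in for the anonymous one above:

```c
struct counted {
    int len;
    char *str;
};

/* ++p->len means ++(p->len): the member is incremented, p is unchanged. */
int bump_len(struct counted *p)
{
    return ++p->len;
}

/* (++p)->len advances p to the next array element first, then reads len.
   The caller must make sure that a next element exists. */
int next_len(struct counted *p)
{
    return (++p)->len;
}
```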

Structures & Arrays

Here is an example of how arrays of structures are declared:


struct key {
    char *word;
    int count;
} keytab[NKEYS];

The above example declares the structure type key and, at the same time, defines an array keytab of NKEYS structures of this type, setting aside storage for them. Alternatively, it can also be written as:


struct key {
    char *word;
    int count;
};

struct key keytab[NKEYS];
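A sketch of how such an array is typically initialised and searched. The keyword list is shortened, NKEYS is derived from the initialiser with sizeof, and the function names find_key and count_key are made up for this illustration.

```c
#include <string.h>

struct key {
    const char *word;
    int count;
};

static struct key keytab[] = {
    { "break", 0 }, { "case", 0 }, { "char", 0 }, { "while", 0 }
};

/* Number of entries, computed by the compiler from the initialiser. */
#define NKEYS (sizeof keytab / sizeof keytab[0])

/* Returns the index of word in keytab, or -1 if absent (linear search). */
int find_key(const char *word)
{
    size_t i;
    for (i = 0; i < NKEYS; i++)
        if (strcmp(word, keytab[i].word) == 0)
            return (int) i;
    return -1;
}

/* Counts one occurrence of word; returns the new count, or -1 if unknown. */
int count_key(const char *word)
{
    int i = find_key(word);
    return i < 0 ? -1 : ++keytab[i].count;
}
```

Since the table is sorted alphabetically, a binary search would also work; the linear scan just keeps the sketch short.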

Pointers to Structures

Let’s now combine the concepts mentioned above and see how pointers to structures are used in daily development. It is illegal for a structure to contain an instance of itself, so a recursive structure holds pointers to its own type instead.


struct tnode {
    char *word;
    int count;
    struct tnode *left;
    struct tnode *right;
};

struct tnode *talloc(void) {
    return (struct tnode *) malloc(sizeof(struct tnode));
}

struct tnode *address(struct tnode*p, char *w) {
    int cond;
    if (p == NULL) {
        p = talloc();
        p->word = strdup(w);
        p->count = 1;
        p->left = NULL;
        p->right = NULL;
    } else if ((cond = strcmp(w, p->word)) == 0) {
        p->count++;
    } else if (cond < 0) {
        p->left = address(p->left, w);
    } else {
        p->right = address(p->right, w);
    }
    return p;
}

void treeprint(struct tnode* p) {
    if (p != NULL) {
        treeprint(p->left);
        printf("%4d %s\n", p->count, p->word);
        treeprint(p->right);
    }
}

As shown in the above example, malloc allocates memory and returns a pointer to void, which is then explicitly cast to the required pointer type. (The cast is optional in C, since void * converts implicitly.) Once the memory for a structure has been allocated, its members can be accessed. Members are laid out inside the structure's memory in declaration order, possibly with padding between them for alignment, and the first member shares the starting address of the structure instance.


/*
 address of node: 0x600002d98f20
 address of node->word: 0x600002d98f20
 address of node->count: 0x600002d98f28
 address of node->left: 0x600002d98f30
 address of node->right: 0x600002d98f38
 */

struct tnode *node = malloc(sizeof(struct tnode));

// %p expects a void *, hence the casts
printf("address of node        : %p\n", (void *) node);
printf("address of node->word  : %p\n", (void *) &(node->word));
printf("address of node->count : %p\n", (void *) &(node->count));
printf("address of node->left  : %p\n", (void *) &(node->left));
printf("address of node->right : %p\n", (void *) &(node->right));
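The same layout facts can be checked without allocating anything, using the standard offsetof macro from <stddef.h>. This is only a sketch; the two function names are made up for the illustration.

```c
#include <stddef.h>

struct tnode {
    char *word;
    int count;
    struct tnode *left;
    struct tnode *right;
};

/* 1 if the members sit at increasing offsets, starting at offset 0. */
int layout_is_ordered(void)
{
    return offsetof(struct tnode, word) == 0
        && offsetof(struct tnode, word) < offsetof(struct tnode, count)
        && offsetof(struct tnode, count) < offsetof(struct tnode, left)
        && offsetof(struct tnode, left) < offsetof(struct tnode, right);
}

/* Bytes of padding the compiler inserted between count and left. */
size_t padding_after_count(void)
{
    return offsetof(struct tnode, left)
         - offsetof(struct tnode, count) - sizeof(int);
}
```

In the address listing above, count sits at ...28 and left at ...30 (hexadecimal), a gap of 8 bytes. If int is 4 bytes on that machine, which is an assumption, 4 of those bytes are padding; the exact amount is platform dependent.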

Besides malloc, calloc can also be used. The difference is that calloc not only allocates the memory but also sets it to zero. (refer to calloc in c). For more details, feel free to read: Memory Allocation for Structure
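A minimal sketch of the zero-filling behaviour, with make_counts and all_zero as made-up helper names:

```c
#include <stdlib.h>

/* Allocates room for n ints with calloc; every element starts at zero.
   Returns NULL if the allocation fails. The caller frees the result. */
int *make_counts(size_t n)
{
    return calloc(n, sizeof(int));
}

/* Returns 1 if all n elements of v are zero, 0 otherwise. */
int all_zero(const int *v, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (v[i] != 0)
            return 0;
    return 1;
}
```

With malloc, the same check would read indeterminate values, so memory from malloc has to be initialised explicitly before use.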

Unions

A union is a variable that may hold (at different times) objects of different types and sizes, with the compiler keeping track of size and alignment requirements. Unions provide a way to manipulate different kinds of data in a single area of storage: a single variable that can legitimately hold any one of several member types. For example:

union u_tag {
    int ival;
    float fval;
    char *sval;
} u;

It usually works together with a structure, since the latter can carry extra information indicating which member of the union is currently in use:

struct {
    char *name;
    int flags;
    int utype;
    union {
        int ival;
        float fval;
        char *sval;
    } u;
} symtab[NSYM];

#define INT 0
#define FLOAT 1
#define STRING 2

if (symtab[i].utype == INT) {
    printf("%d\n", symtab[i].u.ival);
} else if (symtab[i].utype == FLOAT) {
    printf("%f\n", symtab[i].u.fval);
} else if (symtab[i].utype == STRING) {
    printf("%s\n", symtab[i].u.sval);
} else {
    printf("bad type: %d", symtab[i].utype);
}

Bit-fields

This section covers another common technique, used not just in C but in many other languages as well: packing several one-bit flags into a single integer and manipulating them with bit masks:

// definition of flags
#define KEYWORD 	01
#define EXTERNAL 	02
#define STATIC 		04

// turns on  `static` & `external` in flags
int flags = 0;
flags |= EXTERNAL | STATIC;

// turns off `static` & `external` in flags
flags &= ~(EXTERNAL | STATIC);

// true when both `static` & `external` are off
if ((flags & (EXTERNAL | STATIC)) == 0) {
    // ...
}
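The masks above do the bit manipulation by hand. C also has real bit-field syntax, where the compiler does the masking. A sketch of the same three flags as bit-fields follows; the struct and function names are made up for this illustration.

```c
/* Three one-bit fields; the compiler packs them into one unsigned int. */
struct flags {
    unsigned int is_keyword : 1;
    unsigned int is_extern  : 1;
    unsigned int is_static  : 1;
};

/* turns on the extern and static bits */
void mark_extern_static(struct flags *f)
{
    f->is_extern = f->is_static = 1;
}

/* turns off the extern and static bits */
void clear_extern_static(struct flags *f)
{
    f->is_extern = f->is_static = 0;
}

/* true when both bits are off */
int neither_extern_nor_static(const struct flags *f)
{
    return f->is_extern == 0 && f->is_static == 0;
}
```

Fields read like ordinary members, but their layout within the word is implementation-defined, so masks remain the safer choice when the exact bit positions matter, for example in file formats or hardware registers.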

3. I/O

3.1 Basics

Standard

Use getchar()/putchar(int) to read a single character from the standard input and write one to the standard output.

Formatted I/O

Since getchar/putchar only process single characters, formatted I/O is used to handle whole character arrays (strings) and other data. int printf(char *format, arg1, arg2, ...) prints its arguments according to format and returns the number of characters printed. Correspondingly, int scanf(char *format, ...) reads characters from the standard input, interprets them according to the specification in format, and returns the number of input items successfully matched and assigned.
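The return values are easiest to see with the string-based siblings sscanf and snprintf, which follow the same format rules but work on buffers instead of the standard streams. A sketch, with parse_date and format_date as made-up helper names:

```c
#include <stdio.h>

/* Parses text such as "25 Dec 1988". Returns the number of fields
   converted (3 on success). month needs room for at least 20 chars. */
int parse_date(const char *line, int *day, char *month, int *year)
{
    /* %19s limits the month to 19 characters plus the terminating '\0' */
    return sscanf(line, "%d %19s %d", day, month, year);
}

/* Like printf, snprintf returns the number of characters produced. */
int format_date(char *buf, size_t size, int day, const char *month, int year)
{
    return snprintf(buf, size, "%d %s %d", day, month, year);
}
```

Checking the count returned by the scanf family is the usual way to detect malformed input: anything less than the number of requested fields means the parse stopped early.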

3.2 File Access

FILE is a C struct whose implementation varies across platforms. C can handle files as stream-oriented data (text) files and system-oriented data (binary) files. In summary, the significant operations on files are:

  • Creating/opening a file: fopen(), with modes “r/w/a/r+/w+/a+”. “w+” and “a+” are probably the most frequently used: both open the file for reading and writing and create it if it does not exist. “a+” appends new content to the end of the file, whereas “w+” truncates the existing content.
  • Reading data from a file:
    • “getw” reads an integer from the file (a non-standard function, available on many UNIX systems).
    • “getc” returns the next character from the specified stream and advances the stream's position indicator.
    • “fgets” reads a string, at most one line per call.
    • “fscanf” extracts values into variables according to a format string.
  • Writing data to a file:
    • “putw” writes an integer to the file (non-standard, like getw).
    • “putc” writes a character to a file.
    • “fputs” writes a string to a file; it does not append a newline.
    • “fprintf” writes a formatted string to a file.
  • Closing a file: int fclose(FILE *stream) closes the file stream and returns 0 if successful, or EOF otherwise.

// str1 = "Welcome"
// str2 = "to"
// yr = 0 (no value can be read for it, so the initial value is kept)

void experimentFileIO(void) {
    char str1[10], str2[10];
    int yr = 0;
    FILE *fp;
    fp = fopen("anything.txt", "w+");
    if (fp == NULL) {
        return;
    }
    fputs("Welcome to", fp);
    rewind(fp);
    // %9s leaves room for the terminating '\0' in a char[10]
    fscanf(fp, "%9s %9s %d", str1, str2, &yr);
    printf("----------------------------------------------- \n");
    printf("1st word %s \t", str1);
    printf("2nd word  %s \t", str2);
    printf("Year-Name  %d \t", yr);
    fclose(fp);
}

4. UNIX Interface

The UNIX operating system provides its services through a set of system calls, which are in fact functions within the operating system that may be called by user programs. This chapter mainly talks about three major parts: input/output, file system, and storage allocations.

4.1 Input/Output

  • File Descriptor:
    • When a program is started from the shell, three file descriptors are open by default: 0 - standard input, 1 - standard output, 2 - standard error.
    • When starting the program from the shell, the user can also redirect I/O from and to files via < and >: prog <infile >outfile
  • File Descriptor I/O:
    • Use the following functions for reading and writing:
// read up to `n` characters from `fd` into the buffer `buf`;
// returns the number of characters read, which may be smaller than `n`.
// a return value of 0 means end of file, and -1 indicates an error.
int read(int fd, char *buf, int n);

// write `n` characters from the buffer `buf` to `fd`;
// returns the number of characters written.
// anything other than `n` indicates an error.
int write(int fd, char *buf, int n);
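Put together, read and write are enough to copy one descriptor to another. The sketch below assumes a POSIX system and uses the modern prototypes from <unistd.h>, which work with ssize_t and size_t rather than the book-era int shown above. The function name copy_fd is made up for this illustration.

```c
#include <unistd.h>

/* Copies everything from descriptor in to descriptor out.
   Returns the total number of bytes copied, or -1 on error. */
long copy_fd(int in, int out)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {               /* a write may be partial */
            ssize_t w = write(out, buf + off, (size_t) (n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;          /* n == 0 means end of file */
}
```

Calling copy_fd(0, 1) would copy standard input to standard output, which is essentially a minimal cat.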

4.2 File

Other than the default standard input, output, and error, files must be opened explicitly before a program can read or write them. The relevant system calls are open, creat, close, and unlink.

  • open: rather like fopen, except that it returns a file descriptor
// name: 	the name of file to open
// flags: 	O_RDONLY, O_WRONLY, O_RDWR
// perms: 	0 in this case
// return: 	a file descriptor, or -1 if an error occurs
int open(char *name, int flags, int perms);
  • creat: pay attention to the perms definition, which is widely used in daily unix shell commands
// name: 	the name of file to create
// perms:  	three octal digits, x|x|x, for the owner, the owner's group, and all other users
// 			each x has three bits, for read/write/execute correspondingly
// return:	a file descriptor, or -1 if an error occurs
int creat(char *name, int perms);
  • close: there is a limit on the number of file descriptors a program may have open at once, so close each fd once it is no longer needed. (how many files I can open per shell)
// the fd to close; returns 0 on success, -1 on error
int close(int fd);
  • unlink: removes the file from the file system.
// the file to delete; returns 0 on success, -1 on error
int unlink(char *name);
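The four calls can be strung together in one round trip: create a file, write to it, read it back, and delete it. This is a sketch for a POSIX system, using the modern headers <fcntl.h> and <unistd.h>. The function name file_roundtrip and its parameters are made up for this illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Creates path, writes len bytes of msg, reads them back into out
   (at most size bytes), then deletes the file.
   Returns the number of bytes read back, or -1 on error. */
long file_roundtrip(const char *path, const char *msg, size_t len,
                    char *out, size_t size)
{
    int fd;
    ssize_t n;

    fd = creat(path, 0644);     /* rw for owner, r for group and others */
    if (fd < 0)
        return -1;
    if (write(fd, msg, len) != (ssize_t) len) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        unlink(path);
        return -1;
    }
    n = read(fd, out, size);
    close(fd);
    unlink(path);               /* remove the file again */
    return n;
}
```

The permission argument 0644 is written in octal: 6 is read plus write for the owner, and 4 is read-only for the group and for everyone else.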

4.3 Storage Allocation

In C, malloc maintains a list of free memory blocks, which are not necessarily contiguous in memory. Each block contains a size and a pointer to the next block. [Figure: the free list] When memory is requested, the free list is scanned until a big-enough block is found (this is “first fit”; “best fit” would instead look for the smallest block that satisfies the request). If the block is bigger than needed, it is split: the requested portion is returned to the caller, and the remainder stays on the free list. Each free block starts with a header structure holding the information for this block:

typedef long Align; /* for alignment to long boundary*/

union header {		/* block header */
    struct {
        union header *ptr;	/* next block if on free list */
        unsigned size;		/* size of this block */
    } s;
    Align x; 		/* force alignment of blocks */
};

typedef union header Header;

A version of memory allocation function is shown below:

static Header base; 				/* empty list to get started */
static Header *freep = NULL;		/* start of free list */

/* malloc: general-purpose storage allocator */
void *malloc(unsigned nbytes)
{
    Header *p, *prevp;
    Header *morecore(unsigned);
    unsigned nunits;

    // number of header-sized units needed: the request rounded up,
    // plus one unit for the block header itself
    nunits = (nbytes+sizeof(Header)-1)/sizeof(Header) + 1;

    // no free list yet: initialise it with the
    // degenerate zero-size block `base`, linked to itself
    if ((prevp = freep) == NULL) {
        base.s.ptr = freep = prevp = &base;
        base.s.size = 0;
    }

    // iterate through the list and take the first block that fits
    for (p = prevp->s.ptr; ; prevp = p, p = p->s.ptr) {
        if (p->s.size >= nunits) {
            if (p->s.size == nunits) {
                // exact fit: unlink the block from the list
                prevp->s.ptr = p->s.ptr;
            } else {
                // too big: carve the allocation off the tail end
                p->s.size -= nunits;
                p += p->s.size;
                p->s.size = nunits;
            }
            freep = prevp;
            return (void *)(p+1);
        }

        // wrapped around the free list without finding a fit
        if (p == freep) {
            // ask the system for more memory; on success morecore
            // returns the free list pointer, so the loop executes
            // `prevp = p, p = p->s.ptr` and continues for the next round
            if ((p = morecore(nunits)) == NULL) {
                return NULL;
            }
        }
    }
}
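The unit arithmetic at the top of malloc is worth isolating. The sketch below repeats the Header definition so that it stands alone; the function name units_needed is made up for this illustration.

```c
typedef long Align;             /* for alignment to long boundary */

union header {                  /* block header */
    struct {
        union header *ptr;      /* next block if on free list */
        unsigned size;          /* size of this block, in units */
    } s;
    Align x;                    /* force alignment of blocks */
};

typedef union header Header;

/* Number of Header-sized units for a request of nbytes: the payload
   rounded up to whole units, plus one unit for the header itself. */
unsigned units_needed(unsigned nbytes)
{
    return (unsigned) ((nbytes + sizeof(Header) - 1) / sizeof(Header) + 1);
}
```

A request of 1 byte therefore costs 2 units (one for the header, one for the payload), and so does a request of exactly sizeof(Header) bytes; one byte more tips it over to 3 units.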
