
Wednesday, April 22, 2009

What is Address Resolution

Given a host name, we want to find its IP address. We do that using the function gethostbyname(), which is declared in the header file /usr/include/netdb.h (or the equivalent for your system) as follows:

struct hostent *gethostbyname(const char *hostname);

The input to the function is the name of the host whose address we want to resolve. The function returns a pointer to a struct hostent (or NULL if the lookup fails), which is defined as follows:


struct hostent {
char* h_name; /* official name of host */
char** h_aliases; /* alias list */
int h_addrtype; /* host address type */
int h_length; /* length of address */
char** h_addr_list; /* list of addresses from name server */
#define h_addr h_addr_list[0] /* address, for backward compatibility */
};

Let's see what each field in the hostent structure means:

* h_name: The official name of the host, i.e. its fully qualified domain name.
* h_aliases: A pointer to the NULL-terminated list of aliases (other names) the host might have.
* h_addrtype: The type of address this host uses (AF_INET for IPv4).
* h_length: The length of each address, in bytes. Different address types might have different lengths.
* h_addr_list: A pointer to the NULL-terminated list of addresses of the host, in network byte order. Note that a host might have more than one address, as explained earlier.
* h_addr: In older systems, there was only the h_addr field, so it is defined here so that old programs can compile without change on newer systems.
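The fields above can be seen in action with a short lookup function. This is a minimal sketch, not the tutorial's own code: the helper name print_host is hypothetical, it only prints IPv4 (AF_INET) addresses, and it converts them to text with inet_ntop(). Newer code usually prefers getaddrinfo(), which also handles IPv6.

```c
#include <stdio.h>
#include <netdb.h>        /* gethostbyname, struct hostent */
#include <sys/socket.h>   /* AF_INET */
#include <netinet/in.h>   /* INET_ADDRSTRLEN */
#include <arpa/inet.h>    /* inet_ntop */

/* Hypothetical helper: resolve `name` and print its official name, aliases
 * and IPv4 addresses. Returns the number of addresses printed, or -1 if the
 * lookup failed. */
int print_host(const char *name)
{
    struct hostent *he = gethostbyname(name);
    if (he == NULL) {
        fprintf(stderr, "lookup of %s failed\n", name);
        return -1;
    }
    printf("official name: %s\n", he->h_name);

    /* h_aliases is a NULL-terminated array of strings */
    for (char **alias = he->h_aliases; *alias != NULL; alias++)
        printf("alias: %s\n", *alias);

    int count = 0;
    if (he->h_addrtype == AF_INET) {
        /* h_addr_list is NULL-terminated; each entry points to h_length
         * bytes holding one address in network byte order */
        for (char **addr = he->h_addr_list; *addr != NULL; addr++) {
            char text[INET_ADDRSTRLEN];
            inet_ntop(AF_INET, *addr, text, sizeof text);
            printf("address: %s\n", text);
            count++;
        }
    }
    return count;
}
```

For instance, print_host("localhost") would typically print the official name and 127.0.0.1; the exact output depends on your hosts file and name servers.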

How to close a socket

When we want to abort a connection, or to close a socket that is no longer needed, we can use the close() system call. It is defined simply as:

int close(int socket);

* socket - The socket that we wish to close. If it is associated with an open connection, the connection will be closed.

How to send and receive data over a socket

After a connection is established (we will explain that when talking about writing clients and servers), there are several ways to send information over the socket. We will only describe one method for reading and one for writing. The others will be mentioned only in the "See Also" section.
The read() system call

The most common way of reading data from a socket is using the read() system call, which is defined like this:

ssize_t read(int socket, void *buffer, size_t buflen);

* socket - The socket from which we want to read.
* buffer - The buffer into which the system will write the data bytes.
* buflen - Size of the buffer, in bytes (that is, the maximum amount of data we want to read).

The read system call returns one of the following values:

* 0 - The connection was closed by the remote host.
* -1 - The read system call was interrupted, or failed for some reason.
* n - The read system call put 'n' bytes into the buffer we supplied it with.

Note that read() might read fewer bytes than we requested, for example because only that much data has arrived so far; data sent with a single write() on the other end may arrive in several pieces.
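Because of these short reads, code that expects a fixed number of bytes usually calls read() in a loop. A minimal sketch, assuming a blocking socket (or any other file descriptor); the name read_fully is hypothetical:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly `len` bytes from `fd` into `buf`.
 * Returns len on success, a smaller count if the remote host closed the
 * connection first (read() returned 0), or -1 on error. */
ssize_t read_fully(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);
        if (n == 0)
            break;               /* connection closed by the remote host */
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: just retry */
            return -1;           /* real error; errno says why */
        }
        done += (size_t)n;       /* keep what we got, ask for the rest */
    }
    return (ssize_t)done;
}
```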
The write() system call

The most common way of writing data to a socket is using the write() system call, which is defined like this:

ssize_t write(int socket, const void *buffer, size_t buflen);

* socket - The socket into which we want to write.
* buffer - The buffer from which the system will read the data bytes.
* buflen - Size of the buffer, in bytes (that is, how much data we want to write).

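The write system call returns one of the following values:

* -1 - The write system call was interrupted, or failed for some reason (errno tells why). Writing to a connection that the remote host has closed eventually fails this way, with errno set to EPIPE; by default the process also receives a SIGPIPE signal.
* n - The write system call wrote 'n' bytes into the socket.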

Note that the system keeps internal buffers, and the write system call writes data to those buffers, not necessarily directly to the network. Thus, a successful write() doesn't mean the data arrived at the other end, or was even sent onto the network. Also, it could be that only some of the bytes were written, and not the full number we requested. It is up to us to try to send the rest of the data again later on, when it's possible, and we'll show several methods for doing just that.
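One simple way to handle partial writes on a blocking socket is to loop until every byte has been accepted. This is a minimal sketch with a hypothetical helper name, write_fully; a non-blocking socket would also need to handle EAGAIN/EWOULDBLOCK, for example by waiting with select() or poll().

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all `len` bytes of `buf`
 * have been handed to the system, or an error occurs.
 * Returns len on success or -1 on error (errno is set by write()). */
ssize_t write_fully(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted before anything was written */
            return -1;
        }
        p += n;                  /* skip the part that was accepted */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```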

Associating a socket with a connection

After a socket is created, it still needs to be told between which two end points it will communicate. It needs to be bound to a connection. There are two steps to this binding. The first is binding the socket to a local address. The second is binding it to a remote (foreign) address.

Binding to a local address could be done either explicitly, using the bind() system call, or implicitly, when a connection is established. Binding to the remote address is done only when a connection is established. To bind a socket to a local address, we use the bind() system call, which is defined as follows:


int bind(int socket, const struct sockaddr *address, socklen_t addrlen);


Note the usage of a different type of structure, namely struct sockaddr, than the one we used earlier (struct sockaddr_in). Why the sudden change? This is due to the generality of the socket interface: sockets could be used as endpoints for connections using different types of address families. Each address family needs different information, so they use different structures to form their addresses. Therefore, a generic socket address type, struct sockaddr, is defined in the system, and for each address family, a different variation of this structure is used. In other words, struct sockaddr_in, for example, is an overlay of struct sockaddr (i.e. it uses the same memory space, just divides it differently into fields), which is why a pointer to a struct sockaddr_in is cast to struct sockaddr * when it is passed to bind().

There are 4 possible variations of address binding that might be used when binding a socket in the Internet address family.

The first is binding the socket to a specific address, i.e. a specific IP number and a specific port. This is done when we know exactly where we want to receive messages. Actually, this form is not used in simple servers, since these servers usually wish to accept connections to the machine, no matter which IP interface they arrive through.

The second form is binding the socket to a specific IP number, but letting the system choose an unused port number. This could be done when we don't need to use a well-known port.

The third form is binding the socket to a wild-card address called INADDR_ANY (by assigning it to the sockaddr_in variable), and to a specific port number. This is used in servers that are supposed to accept packets sent to this port on the local host, regardless of through which physical network interface the packet has arrived (remember that a host might have more than one IP address).

The last form is letting the system bind the socket to any local IP address and to pick a port number by itself. This is done by not using the bind() system call on the socket. The system will make the local bind when a connection through the socket is established, i.e. along with the remote address binding. This form of binding is usually used by clients, which care only about the remote address (where they connect to) and don't need any specific local port or local IP address. However, there are exceptions here too.
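The third form, and the way to let the system pick the port, can be sketched as follows. The helper names bind_any and local_port are hypothetical; getsockname() is used to find out which port the system actually chose when port 0 is passed.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_ANY, htons, ntohs */

/* Hypothetical helper: create a TCP socket bound to the wild-card address
 * INADDR_ANY and to `port` (host byte order). Passing port 0 asks the system
 * to pick an unused port. Returns the socket, or -1 on failure. */
int bind_any(unsigned short port)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);          /* clear unused fields */
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);            /* ports are in network byte order */

    /* bind() takes the generic struct sockaddr, hence the cast */
    if (bind(s, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Hypothetical helper: the local port a bound socket ended up with,
 * or -1 on error. */
int local_port(int s)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(s, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}
```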

How to create sockets

Creation of sockets is done using the socket() system call. This system call is defined as follows:


int socket(int address_family, int socket_type, int protocol);


address_family defines the type of addresses we want this socket to use, and therefore defines what kind of network protocol the socket will use. We will concentrate on the Internet address family (AF_INET), because we want to write Internet applications.

socket_type could be one of the socket types we mentioned earlier, or any other socket type that exists on your system. We choose the socket type according to the kind of interaction (and type of protocol) we want to use.

protocol selects which protocol we want the socket to use. We will usually leave this value as 0, and let the system choose the most suitable protocol for us. As for the protocol itself, in the Internet address family, a socket type of SOCK_STREAM will cause the protocol type to be set to TCP, and a socket type of SOCK_DGRAM (datagram socket) will cause the protocol type to be set to UDP.

The socket system call returns a file descriptor which will be used to reference the socket in later requests by the application program. If the call fails, however (for example, due to lack of resources), it returns -1 and sets errno (note that valid file descriptors are non-negative integers).

As an example, suppose that we want to write a TCP application. This application needs at least one socket in order to communicate across the Internet, so it will contain a call such as this:


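#include <stdio.h>        /* perror */
#include <stdlib.h>       /* exit, EXIT_FAILURE */
#include <sys/socket.h>   /* socket, AF_INET, SOCK_STREAM */

int main(void) {
    int s; /* descriptor of socket */
    /* Internet address family, Stream socket */
    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) {
        perror("socket: allocation failed");
        exit(EXIT_FAILURE);
    }
    return 0;
}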

Types of sockets

In general, 3 types of sockets exist on most Unix systems: Stream sockets, Datagram sockets and Raw sockets.

Stream sockets provide a reliable, ordered, two-way byte stream over a connection that is set up before data is exchanged and torn down afterwards. TCP connections use stream sockets.

Datagram sockets send independent, self-contained messages (datagrams) without first setting up a connection; each datagram may be lost, duplicated or arrive out of order. The UDP protocol uses such sockets, due to its connection-less nature.

Raw sockets are used to access low-level protocols directly, bypassing the higher protocols. They are the means for a programmer to use the IP protocol, or the physical layer of the network, directly. Raw sockets can therefore be used to implement new protocols on top of the low-level protocols. Naturally, they are out of our scope.
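To show how datagram sockets work without a connection, here is a minimal sketch that sends a single UDP datagram to itself over the loopback interface (127.0.0.1). The function name udp_loopback is hypothetical; on a real network a datagram may be lost, but on loopback it is normally delivered immediately.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical demo: send `msg` as one datagram to ourselves with UDP
 * (SOCK_DGRAM) and receive it into `out` (which holds `outlen` bytes).
 * No connection is set up; sendto() names the destination each time.
 * Returns the number of bytes received, or -1 on error. */
ssize_t udp_loopback(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    ssize_t got = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);   /* receiving socket */
    int tx = socket(AF_INET, SOCK_DGRAM, 0);   /* sending socket */

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* 127.0.0.1 */
    addr.sin_port = 0;                               /* let the system pick */

    /* Bind the receiver, learn its port, then send one datagram to it.
     * One sendto() produces one datagram; one recvfrom() reads one. */
    if (rx >= 0 && tx >= 0
        && bind(rx, (struct sockaddr *)&addr, sizeof addr) == 0
        && getsockname(rx, (struct sockaddr *)&addr, &len) == 0
        && sendto(tx, msg, strlen(msg), 0,
                  (struct sockaddr *)&addr, sizeof addr) >= 0)
        got = recvfrom(rx, out, outlen, 0, NULL, NULL);

    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return got;
}
```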

What is a socket?

A socket is formally defined as an endpoint for communication between an application program and the underlying network protocols. This odd collection of words simply means that the program reads information from a socket in order to read from the network, writes information to it in order to write to the network, and sets socket options in order to control protocol options. From the programmer's point of view, the socket is identical to the network, just as a file descriptor is the endpoint of disk operations.

What is ICMP?

ICMP is Internet Control Message Protocol, a network layer protocol of the TCP/IP suite used by hosts and gateways to send notification of datagram problems back to the sender. It uses the echo test / reply to test whether a destination is reachable and responding. It also handles both control and error messages.

What is Bandwidth?

Every line has an upper limit and a lower limit on the frequency of signals it can carry. This limited range is called the bandwidth.

What is a mesh network?

A network in which there are multiple network links between computers to provide multiple paths for data to travel.

What are the different types of networking / internetworking devices?

Repeater:
Also called a regenerator, it is an electronic device that operates only at the physical layer. It receives the signal in the network before it becomes too weak, regenerates the original bit pattern and puts the refreshed copy back into the link.
Bridges:
These operate in both the physical and data link layers of LANs of the same type. They divide a larger network into smaller segments. They contain logic that allows them to keep the traffic for each segment separate; thus they are repeaters that relay a frame only to the side of the segment containing the intended recipient, which helps control congestion.
Routers:
They relay packets among multiple interconnected networks (i.e. LANs of different types). They operate in the physical, data link and network layers. They contain software that enables them to determine which of the several possible paths is the best for a particular transmission.
Gateways:
They relay packets among networks that have different protocols (e.g. between a LAN and a WAN). They accept a packet formatted for one protocol and convert it to a packet formatted for another protocol before forwarding it. They operate in all seven layers of the OSI model.

How is a gateway different from a router?

A gateway operates at the upper levels of the OSI model and translates information between two completely different network architectures or data formats.
A router is a device that works at the third layer (the network layer) of the OSI model.
A gateway works at the upper layers of the OSI model, such as the presentation layer.

What is frame relay, and in which layer does it operate?

Frame relay is a packet-switching technology. It operates in the data link layer.

What is Beaconing?

The process that allows a network to self-repair network problems. The stations on the network notify the other stations on the ring when they are not receiving transmissions. Beaconing is used in Token Ring and FDDI networks.

What are the important topologies for networks?

BUS topology:
In this topology, each computer is connected directly to a single primary network cable (the backbone).
Advantages:
Inexpensive, easy to install, simple to understand, easy to extend.
STAR topology:
In this topology, all computers are connected through a central hub.
Advantages:
Can be inexpensive, easy to install and reconfigure, and easy to troubleshoot physical problems.
RING topology:
In this topology, all computers are connected in a loop.
Advantages:
All computers have equal access to network media, installation can be simple, and the signal does not degrade as much as in other topologies because each computer regenerates it.

What is subnetting? Why is it used?

A subnet is a portion of a network that shares a common address component. On TCP/IP networks, subnets are defined as all devices whose IP addresses have the same prefix. For example, all devices with IP addresses that start with 100.100.100. would be part of the same subnet. Dividing a network into subnets is useful for both security and performance reasons. IP networks are divided using a subnet mask, which marks the bits of an address that form the common prefix.
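The subnet test itself is a bitwise AND: two addresses belong to the same subnet when they are equal after each is ANDed with the subnet mask. A minimal sketch, with hypothetical helper names, using inet_pton() to parse the dotted-decimal text:

```c
#include <stdint.h>
#include <sys/socket.h>   /* AF_INET */
#include <netinet/in.h>   /* struct in_addr, ntohl */
#include <arpa/inet.h>    /* inet_pton */

/* Hypothetical helper: parse a dotted-decimal IPv4 address into a 32-bit
 * number in host byte order. Returns 0 on success, -1 if `text` is invalid. */
static int parse_ipv4(const char *text, uint32_t *out)
{
    struct in_addr a;
    if (inet_pton(AF_INET, text, &a) != 1)
        return -1;
    *out = ntohl(a.s_addr);
    return 0;
}

/* Hypothetical helper: two addresses are on the same subnet when they agree
 * on every bit set in the mask, i.e. (a & mask) == (b & mask).
 * Returns 1 if they are, 0 if not, -1 on a parse error. */
int same_subnet(const char *a, const char *b, const char *mask)
{
    uint32_t ia, ib, m;
    if (parse_ipv4(a, &ia) || parse_ipv4(b, &ib) || parse_ipv4(mask, &m))
        return -1;
    return (ia & m) == (ib & m);
}
```

With the mask 255.255.255.0, 100.100.100.5 and 100.100.100.200 are on the same subnet, while 100.100.101.5 is not.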

What is the Network Time Protocol?

The Network Time Protocol (NTP) is a protocol for synchronising the clocks of computer systems over packet-switched, variable-latency data networks. NTP uses UDP as its transport layer. It is designed particularly to resist the effects of variable latency.

What is RAID?

RAID (Redundant Array of Independent Disks) is a method for providing fault tolerance by using multiple hard disk drives.

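What is the default subnet mask for an IPv4 address?

The default subnet mask is 255.0.0.0 for Class A, 255.255.0.0 for Class B, and 255.255.255.0 for Class C. (IPv6 does not use address classes or dotted-decimal masks; subnets are written as a prefix length, typically /64.)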
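In classful addressing the class, and therefore the default mask, follows from the leading bits of the first octet. A minimal sketch with a hypothetical function name, returning the mask as a 32-bit number:

```c
#include <stdint.h>

/* Hypothetical helper: default (classful) IPv4 subnet mask, chosen from the
 * leading bits of the first octet. Returns the mask in host byte order, or 0
 * for class D (multicast) and class E addresses, which have no default mask. */
uint32_t default_mask(uint8_t first_octet)
{
    if (first_octet < 128)      /* 0xxxxxxx: class A */
        return 0xFF000000u;     /* 255.0.0.0 */
    if (first_octet < 192)      /* 10xxxxxx: class B */
        return 0xFFFF0000u;     /* 255.255.0.0 */
    if (first_octet < 224)      /* 110xxxxx: class C */
        return 0xFFFFFF00u;     /* 255.255.255.0 */
    return 0;                   /* 1110xxxx (D) or 1111xxxx (E) */
}
```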

What is fragmentation of a packet?

Fragmentation is dividing an IP packet into several smaller packets (fragments), each small enough to fit into a single frame of the underlying data link layer; the fragments are reassembled at the destination.

The protocol data unit (PDU) is called:

* a segment at the transport layer
* a packet at the network layer
* a frame at the data link layer
* bits at the physical layer

What is MTU of a link ?

Maximum Transmission Unit is the largest physical packet size, measured in bytes, that a network can transmit. Any messages larger than the MTU are divided into smaller packets before being sent.

Every network has a different MTU, which is set by the network administrator. On Windows 95, you can also set the MTU of your machine. This defines the maximum size of the packets sent from your computer onto the network. Ideally, you want the MTU to be the same as the smallest MTU of all the networks between your machine and a message's final destination. Otherwise, if your messages are larger than one of the intervening MTUs, they will get broken up (fragmented), which slows down transmission speeds.

Trial and error is the only sure way of finding the optimal MTU, but there are some guidelines that can help. For example, the MTU of many PPP connections is 576, so if you connect to the Internet via PPP, you might want to set your machine's MTU to 576 too. Most Ethernet networks, on the other hand, have an MTU of 1500, which is the default MTU setting for Windows 95.
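The cost of fragmentation can be estimated with a little arithmetic. This sketch is an illustration under stated assumptions: it assumes IPv4 with a 20-byte header (no options) on every fragment, and relies on the fact that fragment offsets are measured in 8-byte units, so each fragment except the last carries a multiple of 8 data bytes. The function name is hypothetical.

```c
/* Hypothetical helper: number of IPv4 fragments a datagram of `total_len`
 * bytes (header included) is split into on a link with the given MTU.
 * Assumes a 20-byte header with no options on every fragment.
 * Returns 0 if the MTU is too small to carry any data. */
unsigned fragment_count(unsigned total_len, unsigned mtu)
{
    const unsigned header = 20;
    if (total_len <= mtu)
        return 1;                          /* fits: no fragmentation */
    if (mtu < header + 8)
        return 0;                          /* cannot carry even 8 data bytes */
    unsigned payload = total_len - header; /* data to be carried */
    /* every fragment but the last carries a multiple of 8 data bytes */
    unsigned per_frag = (mtu - header) / 8 * 8;
    return (payload + per_frag - 1) / per_frag;   /* round up */
}
```

For example, a 4000-byte datagram crossing a 1500-byte Ethernet MTU becomes three fragments carrying 1480, 1480 and 1020 data bytes.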

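Name any field of the IP header that can prevent a packet from looping infinitely.

The TTL (Time To Live) field of the IP header. Every router that forwards the packet decrements the TTL by one, and when it reaches zero the packet is discarded (and an ICMP Time Exceeded message is usually sent back to the sender), so a packet cannot circulate forever.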

Under what situations can a packet go into an infinite loop in a network?

If the following two conditions are simultaneously true:
1. a routing error leads to a loop, and
2. the TTL field of the IP packet is not properly decremented at each hop.
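The role of the TTL field in a routing loop can be illustrated with a tiny simulation. It is a hypothetical model, not router code: the packet is forwarded around the loop forever, and only the TTL decrement stops it. If the decrement were skipped (the second condition above), the loop would never end.

```c
/* Hypothetical simulation of a packet caught in a routing loop.
 * Each router decrements TTL before forwarding and discards the packet
 * when TTL reaches 0. Returns how many routers handled the packet. */
int hops_until_discard(int ttl)
{
    int hops = 0;
    for (;;) {              /* the routing loop never ends on its own */
        hops++;             /* the next router in the loop receives it */
        ttl--;              /* every router must decrement TTL */
        if (ttl <= 0)
            return hops;    /* discarded; a real router also sends
                               ICMP Time Exceeded to the sender */
    }
}
```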

Describe a 3-way TCP/IP Handshake?

To establish a connection, TCP uses a 3-way handshake. Before a client attempts to connect with a server, the server must first bind to a port and listen on it to open it up for connections: this is called a passive open. Once the passive open is established, a client may initiate an active open. To establish a connection, the 3-way (or 3-step) handshake occurs:

1. The active open is performed by sending a SYN to the server.
2. In response, the server replies with a SYN-ACK.
3. Finally the client sends an ACK back to the server.

At this point, both the client and server have received an acknowledgement of the connection.
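In the sockets API the handshake is carried out by the system, not by the program: the server's bind() and listen() calls make the passive open, and the client's connect() makes the active open. A minimal sketch on the loopback interface, with a hypothetical function name handshake_demo; it can run in a single process because connect() returns as soon as the handshake completes, even before the server calls accept().

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical demo of a passive and an active open on 127.0.0.1.
 * The kernel exchanges SYN, SYN-ACK and ACK itself; accept() then hands
 * back the established connection. Returns 0 on success, -1 on failure. */
int handshake_demo(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int result = -1, conn = -1;
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    int cli = socket(AF_INET, SOCK_STREAM, 0);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                     /* any free port */

    if (srv >= 0 && cli >= 0
        /* passive open: bind + listen, then learn the chosen port */
        && bind(srv, (struct sockaddr *)&addr, sizeof addr) == 0
        && listen(srv, 1) == 0
        && getsockname(srv, (struct sockaddr *)&addr, &len) == 0
        /* active open: connect() sends SYN and waits for SYN-ACK */
        && connect(cli, (struct sockaddr *)&addr, sizeof addr) == 0
        /* accept() returns a new socket for the completed connection */
        && (conn = accept(srv, NULL, NULL)) >= 0)
        result = 0;

    if (conn >= 0) close(conn);
    if (cli >= 0) close(cli);
    if (srv >= 0) close(srv);
    return result;
}
```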

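You are given an IP address. Can you tell how many computers can be connected? What do you look at?

For example, 1.0.0.1 is a Class A address, so the network part is 8 bits and the host part is 24 bits. The number of hosts is calculated by the formula 2^n - 2, where n is the number of host bits (the all-zeros and all-ones host addresses are reserved for the network itself and for broadcast). Therefore, for 1.0.0.1 the number of computers that can be connected is 2^24 - 2 = 16,777,214.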
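The same formula works for any prefix length, not only the classful 8, 16 and 24 bits. A minimal sketch with a hypothetical function name:

```c
#include <stdint.h>

/* Hypothetical helper: usable host addresses in an IPv4 network whose
 * network part is `prefix_len` bits long. With n = 32 - prefix_len host
 * bits there are 2^n addresses; the network and broadcast addresses are
 * reserved, leaving 2^n - 2. Returns 0 for prefixes longer than 30. */
uint32_t usable_hosts(unsigned prefix_len)
{
    if (prefix_len > 30)
        return 0;
    unsigned host_bits = 32 - prefix_len;
    uint64_t addresses = (uint64_t)1 << host_bits;   /* 2^n, up to 2^32 */
    return (uint32_t)(addresses - 2);
}
```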

Which protocol is used for retrieving mail?

POP3 and IMAP4 are used to retrieve mail.
IMAP4 keeps messages on the server, whereas POP3 typically downloads them to the client and deletes them from the server.

What is piggybacking?

Piggybacking is gaining access to a restricted communication channel by using a session that another user has already established. Piggybacking can be defeated by logging off before leaving a workstation or terminal, or by initiating a protected mode, such as via a screensaver, that requires re-authentication before access can be resumed.

What is IPsec tunneling, and how does it work?

IP tunneling (IP encapsulation) is a technique to encapsulate IP datagrams within other IP datagrams, which allows datagrams destined for one IP address to be wrapped and redirected to another IP address. IP encapsulation is now commonly used in extranets, Mobile IP, IP multicast, and tunneled hosts or networks.

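What is the difference between a TCP/IP host name and a NetBIOS host name?

A TCP/IP host name identifies a system on an IP network and is resolved to an IP address through DNS or the hosts file. A NetBIOS name identifies a computer to NetBIOS services (such as Windows file and printer sharing); it is at most 15 characters long and is resolved through WINS, broadcasts or the LMHOSTS file.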

What are the minimum and maximum sizes of frames in the 802.3 Ethernet protocol?

minimum - 64 bytes

maximum - 1518 bytes (a 1500-byte payload plus the 14-byte header and 4-byte frame check sequence)

What are the differences between a meshed and a star topology? Describe some of the advantages and disadvantages of each in a WAN context.

meshed - redundant paths, links to all sites in matrix from all sites in matrix, more complex, more fault tolerant, harder to configure

star - individual link from core to each site, response time better, straightforward, no redundancy, easy to configure

Monday, April 20, 2009

Makefile-Advanced Features

Advanced features

Make has some advanced features which we will discuss very briefly in this section. This includes specialized dependency rules, as well as writing Makefiles to work with your own suffixes, and writing new "built-in" rules.

Special dependencies

Usually, make uses the same command to create or update a target, regardless of which file changes. Some targets, however, such as libraries, are built from several parts that can be replaced individually. For this kind of different behavior, make allows a special form of the dependency, where the action specified can differ, depending on which file has changed. Here is an example of this rule:

target :: source1
	command1
target :: source2
	command2
As we have described, if source1 changes, target is created or updated using command1; command2 is used if source2 is modified instead.

Custom suffixes and rules

Make uses a special target, named .SUFFIXES to allow you to define your own suffixes. For example, the dependency line:
.SUFFIXES: .foo .bar
tells make that you will be using these special suffixes to make your own rules.

Similar to how make already knows how to make a .o file from a .c file, you can define rules in the following manner:

.foo.bar:
	tr 'A-Za-z' 'N-ZA-Mn-za-m' < $< > $@
.c.o:
	$(CC) $(CFLAGS) -c $<
The first rule allows you to create a .bar file from a .foo file. (Don't worry about what it does; it applies the ROT13 cipher, which basically scrambles the letters of the file.) The second rule is the default rule used by make to create a .o file from a .c file.

Shortcuts For Make

The make program has many other features which have not been discussed in previous sections. Most important of these features is the macro feature. Macros in make work similarly to macros used in C programming. Make also has its own pre-defined rules which you can take advantage of to make your Makefile smaller.

Macros in make

The make program allows you to use macros, which are similar to variables, to store names of files. The format is as follows:
OBJECTS = data.o io.o main.o
Whenever you want make to expand this macro when it runs, write the corresponding string $(OBJECTS).

Here is our sample Makefile again, using a macro.

OBJECTS = data.o main.o io.o
project1: $(OBJECTS)
	cc $(OBJECTS) -o project1
data.o: data.c data.h
	cc -c data.c
main.o: data.h io.h main.c
	cc -c main.c
io.o: io.h io.c
	cc -c io.c

Special macros

In addition to those macros which you can create yourself, there are a few macros which are used internally by the make program. Here are some of those, listed below:

CC
Contains the current C compiler. Defaults to cc.
CFLAGS
Special options which are added to the built-in C rule. (See Predefined rules, below.)
$@
Full name of the current target.
$?
The list of the current target's source files that are newer than the target (i.e. out of date).
$<
The source file of the current (single) dependency.
You can also manipulate the way these macros are evaluated. For example, assuming that OBJS = data.o io.o main.o, writing $(OBJS:.o=.c) within the Makefile replaces the .o at the end of each word with .c, giving the following result: data.c io.c main.c

Predefined rules

By itself, make already knows that in order to create a .o file, it must use cc -c on the corresponding .c file. These rules are built into make, and you can take advantage of this to shorten your Makefile. If you list just the .h files that the current target depends on in the dependency line of the Makefile, make already knows that the corresponding .c file is required. You don't even need to include the command for the compiler.

This reduces our Makefile further, as shown:

OBJECTS = data.o main.o io.o
project1: $(OBJECTS)
	cc $(OBJECTS) -o project1
data.o: data.h
main.o: data.h io.h
io.o: io.h

Miscellaneous shortcuts

Although the examples we have shown do not explicitly say so, you can put more than one file in the target section of the dependency rules. If a file appears as a target more than once in a dependency, all of its source files are included as sources for that target.

Here is our sample Makefile again:

CFLAGS = -Aa -D_HPUX_SOURCE
OBJECTS = data.o main.o io.o
project1: $(OBJECTS)
	cc $(OBJECTS) -o project1
data.o main.o: data.h
io.o main.o: io.h

This Makefile shows main.o appearing in two places. Make knows by looking at all the dependencies that main.o depends on both data.h and io.h.

One thing to consider, however, is that when you are compiling programs on Wiliki, you may wish to add a CFLAGS macro at the top of your Makefile to enable the compiler to use ANSI standard C compilation. The macro looks like this:
CFLAGS=-Aa -D_HPUX_SOURCE
This will allow make to use ANSI C with the predefined rules.


You can also specify a macro's value when running make, as follows:
make 'OBJECTS=data.o newio.o main.o' project1
This overrides the value of OBJECTS in the Makefile.

The Makefile


The previous section described dependencies between files. This section describes the make program in more detail by describing the file it uses, called makefile or Makefile. This file determines the relationships between the source, object and executable files.

Translating the dependency graph

Each dependency shown in the graph becomes an entry in the Makefile, and each uses the following format:

target : source file(s)
command (must be preceded by a tab)
A target given in the Makefile is a file which will be created or updated when any of its source files are modified. The command(s) given in the subsequent line(s) (which must be preceded by a tab character) are executed in order to create the target file.

Listing dependencies

project1: data.o main.o io.o
	cc data.o main.o io.o -o project1
data.o: data.c data.h
	cc -c data.c
main.o: data.h io.h main.c
	cc -c main.c
io.o: io.h io.c
	cc -c io.c

Using the Makefile with make

Once you have created your Makefile and your corresponding source files, you are ready to use make. If you have named your Makefile either Makefile or makefile, make will recognize it. If you do not wish to call your Makefile one of these names, you can use make -f mymakefile. The order in which dependencies are listed is important: if you simply type make and press Return, make will attempt to create or update the first target listed.

You can also specify one of the other targets listed in the Makefile, and only that target (and its corresponding source files) would be made. For example, if we typed make, the output of make would look as follows:

% make
cc -c data.c
cc -c main.c
cc -c io.c
cc data.o main.o io.o -o project1
%

When making its targets, make first checks the source files and attempts to create or update the source files. That is why data.o, main.o and io.o were created before attempting to create the target: project1.

Note that in the Makefile shown above, the .h files are listed, but there are no references to them in the corresponding commands. This is because the .h files are referenced within the corresponding .c files through #include "file.h". If you do not explicitly list them in your Makefile, your program will not be updated when you change a header (.h) file.

Note: Comments can be placed in a Makefile by starting them with a pound sign (#); make ignores everything from the # to the end of the line.



Make File -Dependencies

Dependencies

The principle by which make operates was described to you in the last section. It creates programs according to the file dependencies. For example, we now know that in order to create an object file, program.o, we require at least the file program.c. (There may be other dependencies, such as a .h file.)

This section involves drawing what are called "dependency graphs", which are very similar to the diagrams given in the previous section. As you become proficient using make, you probably will not need to draw these diagrams, but it is important to get a feel for what you are doing.

Dependency graphs

The dependency graph for our example describes a program made up of 5 source files, called data.c, data.h, io.c, io.h, and main.c. At the top is the final result, a program called project1. The lines which radiate downwards from a file lead to the other files it depends on. For example, to create main.o, the three files data.h, io.h, and main.c are needed.

How dependency works

Suppose that you have gone through the process of compiling the program, and while you are testing the program, you realize that one function in io.c has a bug in it. You edit io.c to fix the bug.

Going up the dependency graph from io.c, you notice that io.o needs to be updated because io.c has changed. Similarly, because io.o has changed, project1 needs to be updated as well.

How does make do it?

The make program gets its dependency "graph" from a text file called makefile or Makefile which resides in the same directory as the source files. Make checks the modification times of the files, and whenever a file becomes "newer" than something that depends on it, (in other words, modified) it runs the compiler accordingly.

For example, suppose io.c was changed, as described above. Once you edit io.c, it becomes "newer" than io.o, meaning that make must run cc -c io.c to create a new io.o, then run cc data.o main.o io.o -o project1 for project1.
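The timestamp comparison at the heart of this process can be sketched with stat(). This is a simplified, hypothetical helper (needs_rebuild), not how any particular make is implemented: real make also walks the whole dependency graph, applies built-in rules, and more.

```c
#include <sys/stat.h>

/* Hypothetical sketch of make's core test: `target` must be rebuilt if it
 * does not exist or if any of its `n` sources was modified more recently.
 * Returns 1 if a rebuild is needed, 0 if the target is up to date, and -1
 * if a source file is missing. */
int needs_rebuild(const char *target, const char *sources[], int n)
{
    struct stat t, s;
    int missing_target = stat(target, &t) != 0;
    for (int i = 0; i < n; i++) {
        if (stat(sources[i], &s) != 0)
            return -1;                 /* no way to make this source */
        if (missing_target || s.st_mtime > t.st_mtime)
            return 1;                  /* source is "newer" than target */
    }
    return missing_target;
}
```

For instance, needs_rebuild("io.o", (const char *[]){"io.c", "io.h"}, 2) would return 1 right after io.c is edited.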

Make File - Description -

The Make command

The make command allows you to manage large programs or groups of programs. As you begin to write larger programs, you will notice that re-compiling larger programs takes much longer than re-compiling short programs. Moreover, you notice that you usually only work on a small section of the program (such as a single function that you are debugging), and much of the rest of the program remains unchanged.

The make program aids you in developing your large programs by keeping track of which portions of the entire program have been changed, compiling only those parts of the program which have changed since the last compile.

A simple compilation

Compiling a small C program requires at least a single .c file, with .h files as appropriate. Although the command to perform this task is simply cc file.c, there are 3 steps to obtain the final executable program, as shown:

  1. Compiler stage: All C language code in the .c file is converted into a lower-level language called Assembly language; making .s files.
  2. Assembler stage: The assembly language code made by the previous stage is then converted into object code which are fragments of code which the computer understands directly. An object code file ends with .o.
  3. Linker stage: The final stage in compiling a program involves linking the object code to code libraries which contain certain "built-in" functions, such as printf. This stage produces an executable program, which is named a.out by default.

Compiling with several files

When your program becomes very large, it makes sense to divide your source code into separate easily-manageable .c files. The figure above demonstrates the compiling of a program made up of two .c files and a single common.h file. The command is as follows:

cc green.c blue.c
where both .c files are given to the compiler. Note that the first two steps taken in compiling the files are identical to the previous procedure for a single .c file, but the last step has an interesting twist: The two .o files are linked together at the Linker stage to create one executable program, a.out.

Separate compilation

The steps taken in creating the executable program can be divided into two compiler/assembler steps, one for each .c file, and one final linker step. The two .o files may be created separately, but both are required at the last step to create the executable program.

You can use the -c option with cc to create the corresponding object (.o) file from a .c file. For example, typing the command: cc -c green.c will not produce an a.out file, but the compiler will stop after the assembler stage, leaving you with a green.o file.

Separate compilation steps

The three different tasks required to produce the executable program are as follows:

  • Compile green.o: cc -c green.c
  • Compile blue.o: cc -c blue.c
  • Link the parts together: cc green.o blue.o
It is important to note that in order to create the file green.o, the two files green.c and the header file common.h are required. Similarly, in order to create the executable program, a.out, the object files green.o and blue.o are required.

Splitting your C program

When you separate your C program into many files, keep these points in mind:

  • Be sure no two files define functions with the same name. The linker will report a multiple-definition error (functions declared static are private to their own file and are the exception).
  • Similarly, if you use global variables in your program, be sure no two files define the same global variables.
  • If you use global variables, be sure only one of the files defines them, and declare them in your .h as follows: extern int globalvar;
  • To use functions from another file, make a .h file with the function prototypes, and use #include to include those .h files within your .c files.
  • At least one of the files must have a main() function.

Note: When you define a variable, it looks like this: int globalvar;. When you declare a variable, it looks like this: extern int globalvar;. The main difference is that a variable definition creates the variable, while a declaration indicates that the variable is defined elsewhere. A definition implies a declaration.
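The sketch below shows the difference in a single file. It uses a hypothetical function, bump_global(). In a real program, the extern line and the prototype would live in the .h file, and the definition would appear in exactly one .c file.

```c
/* What the shared .h file would contain: a declaration.
   It tells the compiler the variable exists but creates no storage. */
extern int globalvar;

/* Prototype for a (hypothetical) function that uses the variable. */
int bump_global( int amount );

/* What exactly one .c file would contain: the definition, which
   creates the variable and gives it an initial value. */
int globalvar = 10;

/* Any file that sees the declaration may read or change globalvar. */
int bump_global( int amount )
{
    globalvar += amount;
    return globalvar;
}
```

Repeating the extern declaration in many files is harmless. Repeating the definition in a second file is what the list above warns against.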

How do I create a library of object files?

DJGPP FAQ -- How do I create a library of object files?

Q: I would like to distribute my package as a library that can be linked into programs, but I'm unsure how to go about it....

A: First, you need to compile all your sources into object .o files, like this:

 gcc -c -Wall -O2 file1.c
 gcc -c -Wall -O2 file2.c
 gcc -c -Wall -O2 file3.c
 ...

The only GCC switch in this example that is required is -c; the rest are just recommended for better code generation and diagnostics.

Once you have the object files ready, use the ar ("Archiver") utility to create a library, let's call it libacme.a, like this:

 ar rvs libacme.a file1.o file2.o file3.o ... 

The rvs flags tell ar to put named files into the library, replacing any previous versions of these files if necessary, print the names of object files as it puts them into the library, and add an object-file index to the library, which makes it link faster.

If you use RHIDE, you can create a library by specifying a file with a .a extension as the main target in the project (choose Project | Main Target Name and enter a file name such as libacme.a).

The library is now ready to use. The simplest way to force the compiler to use it while linking is to mention its name in the link command line, like this:

 gcc -o myprog.exe myprog.c libacme.a 

This is better than listing all of the library's object files on the command line, since that would make the linker link in every object file, even those the program does not use.

Naming the library with a lib prefix and a .a extension is a convention used for convenience. When the link command line includes an argument -lXXYYZZ, GCC (and all Unix compilers) will look for a file libXXYYZZ.a in every directory they search by default. So, if your library libacme.a is installed in the DJGPP lib subdirectory, the user can instruct GCC to look into it by appending -lacme to the link command line. Other systems might be configured to look for different names when a switch such as -lfoo is mentioned. For example, Linux might look in /usr/lib for files libfoo.so.*, while Alpha/VMS will look for SYS$GNU:[LIBRARIES]FOO.LIB;*. Windows 98, of course, will look for something monstrously long like C:\Windows\Program Files\Vendors\GNU\gcc\libraries\foo.lib. If you don't follow this convention, you will need to type the full name of the library file.

If you need to update a certain object file in a library, use the same command ar rvs library-name object-name as above, but only with the name(s) of the object file(s) you need to replace.

ar is documented in the Binutils docs. To read, type this from the DOS prompt:

 info binutils ar 

How to Create Multi-Module Programs

1. Introduction

This document describes how to construct multi-file programs in C and C++.

The first task is to decide how your program will be divided into multiple files or modules. If your program design contains ADTs, then each ADT's code should usually be in its own file. Other functions should be grouped by their purpose. For example, all procedures that produce graphics might be stored in one file.

The next task is to determine which parts of a module are private (i.e., not needed by other modules) and which parts are public. These parts include functions, types and variables. In general, there should be a good reason for every public object. (This is known as encapsulation.)

2. Header Files

The public parts of a module must be stored in a header (.h) file. This header is included both by the module itself and by every file that uses the module. This ensures that the module and its users agree on types and functions. Header files typically contain macros, typedefs and function prototypes. In C++, they also contain class definitions, inline code, template definitions and constants. Remember that even when several files are compiled together, the compiler looks at each file separately, so each file must include the headers for all the information the compiler needs to compile it.

It is also very common for headers to include other headers. This can lead to problems when headers x.h and y.h are both included in a file and both also include z.h, so z.h is included twice. Most of the time this causes a compiler error, because a type (such as a struct) ends up defined twice. There is a solution to this problem using the conditional compilation features of the preprocessor. The following illustrates the solution:

#ifndef UNIQUE_MACRO_NAME
#define UNIQUE_MACRO_NAME

/* body of header */

#endif

The compiler sees the body of the header only if UNIQUE_MACRO_NAME is not yet defined, and the #define that follows immediately defines it. Thus, the body of the header is included only once, the first time the compiler encounters the header. Afterward, the macro is defined, which keeps the body from being included again. Of course, each header file must use a different macro!
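The sketch below shows the guard at work without needing multiple files. It pastes the text of a hypothetical header, point.h, in twice, which is effectively what the preprocessor does when two other headers both include it. Without the #ifndef lines, the second struct point definition would be a compile error. With them, the second copy is skipped.

```c
/* ---- first copy of point.h (e.g. included via x.h) ---- */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif

/* ---- second copy of point.h (e.g. included via y.h) ---- */
#ifndef POINT_H                  /* already defined, so ...          */
#define POINT_H
struct point { int x; int y; };  /* ... this line is never compiled  */
#endif

/* Code after both inclusions sees exactly one struct point. */
int point_sum( struct point p )
{
    return p.x + p.y;
}
```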

3. Functions and Global Variables

Functions and global variables have external scope by default (the C standard calls this external linkage). This means that if function f is defined in file x.c, it can be used from y.c or any other file. The linker combines the object files (.o or .obj) into a single executable file, and part of its job is to match each global name that a file references with its definition in another file. However, if f is part of the implementation of module x, no other file has a reason to use f. In this case, f should instead be defined with internal scope (internal linkage), which means that only the code in x.c can access f directly. To give a function internal scope, prefix its definition with the reserved word static. This is somewhat confusing, because static has an entirely different meaning when applied to a function's local variables (there it makes the variable keep its value between calls). One way to make the intent clearer is to use macros with more meaningful names, PUBLIC and PRIVATE.

#define PRIVATE static
#define PUBLIC

PUBLIC is defined to be nothing because functions have external scope by default. Now to define a function with internal scope, use:

PRIVATE void f()

Of course, private functions are not prototyped in header files.

Global variables work much like functions. To give a global variable internal scope, precede its definition with static (or, better, PRIVATE). For a file to use a PUBLIC global variable, it must declare it as an extern variable; this is commonly done in the header file. (Global variables should be used sparingly and only for good reasons.)

Exactly one file must have a PUBLIC function named main. This function is where execution begins, just as in a single-file program.
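Putting these pieces together, a module file might look like the sketch below. The function and variable names are hypothetical. Only sum_of_squares() and times_called() would be prototyped in the module's header; square() and call_count stay invisible to other files.

```c
#define PRIVATE static
#define PUBLIC

/* Internal scope: only code in this file can see call_count. */
PRIVATE int call_count = 0;

/* Internal helper: not prototyped in any header. */
PRIVATE int square( int n )
{
    return n * n;
}

/* External scope: callable from other files via the header. */
PUBLIC int sum_of_squares( int a, int b )
{
    call_count++;
    return square(a) + square(b);
}

PUBLIC int times_called( void )
{
    return call_count;
}
```

Suppose another file declared `extern int call_count;` and tried to use it. The link would fail, because the static definition is not visible outside this file.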

The make utility is very useful for maintaining multi-module programs.

4. Do's and Don'ts

  • Never include code in header files! (Actually in C++, the rule is never include non-inline or non-template code in header files!)
  • If using Borland's Project Make to build multi-module programs, never include header files in the Project Make window!
  • If using a command-line compiler (like cc), do not compile the header files!
  • Never include source files!

5. Example

This section shows a multi-module program example:

File: header.h

#ifndef HEADER_EXAMPLE
#define HEADER_EXAMPLE

typedef long int INT32; /* Used in prototype below so must be in header */

void f( INT32 ); /* external scope function prototypes go in header */

#endif

File: sub.c

#include <stdio.h>
#include "header.h" /* " " in include causes the compiler to look first
                       for the header in the directory of this source file */

typedef double Real; /* This type only used internally to this file, so
                        it's not in header file */
static void g( INT32, Real ); /* internal scope function prototypes do not
                                 go in header */

void f( INT32 x )
{
    g(x, 2.5);
}

static void g( INT32 x, Real d )
{
    printf("%f\n", x/d);
}

File: main.c

#include <stdio.h>
#include "header.h"

int main( void )
{
    INT32 x;

    printf("Enter an integer: ");
    scanf("%ld", &x);
    f(x);
    return 0;
}

One could compile this on UNIX with the command:

cc main.c sub.c

Note that the header file is not listed!

Using the Emacs Editor

1. Introduction

1.1 What can it do?

The Emacs text editor is a popular and powerful program that is available on many platforms (e.g., UNIX, DOS, Windows 9x and NT, and OS/2). Some of its features include:

  • simpler to use than vi
  • editing and viewing multiple files at one time
  • spell checking
  • compiling programs from within Emacs
  • automatic indentation of programs
  • a powerful macro language that may be used to extend the functions of Emacs

1.2 History

Emacs was created by Richard Stallman in 1975. GNU Emacs is the most popular version of Emacs and is directly derived from Stallman's original version. The GNU (GNU's Not Unix) project consists of programmers who volunteer their time to develop free software and is associated with Stallman's Free Software Foundation. There are other versions of Emacs that are not free.

2. Basics

2.1 How to start

To run Emacs, simply type at the UNIX prompt:

emacs file_to_edit

Emacs will start and open the specified file for editing. The screen will look something like:


#include 
#include

void tty_info( const char *name )
{
    int i;
    for( i = 0; i < 3; i++ )
        fprintf(stderr,"%s -> fd %d: isatty = %d, pgrp = %d\n", name, i,
                isatty(i), tcgetpgrp(i));
}

int main( void )
{
    pid_t pid1, pid2;
    int pipe1[2];
    int term_fd;

    term_fd = open("/dev/tty", O_RDWR, 0666);
    assert(term_fd >= 0);
    tty_info("main1");
    pipe(pipe1);
-----Emacs: b.c (C)--9%---------------------------------------


The mode line is the second line from the bottom and is highlighted. It separates the editing portion of the screen from the command line (the echo area, in Emacs jargon) at the very bottom. When a file is loaded into Emacs, it is loaded into a buffer; Emacs edits buffers, not files. The mode line shows the name of the buffer. To update the actual file, the buffer must be saved. Emacs will prompt the user to save any modified buffers when it exits.

Emacs maintains a backup file when a file is saved. The backup file is a copy of the last version of the file. Its name is the original file name with a tilde (~) appended to the end.

2.2 Emacs commands

The user uses special keys to send commands to Emacs. The CONTROL and META keys distinguish keys used as commands from keys used to enter text into a buffer. The CONTROL key should be familiar; it is used like a shift key. In Emacs notation, C-x means to hold down the CONTROL key while hitting the x key. Many terminals do not have a key labeled META; on such terminals, the ESC key serves as the META key. ESC is used differently than the CONTROL key: it is hit and released before the next key is hit. For example, in Emacs notation, M-x means to hit ESC (and release it) and then hit the x key.

2.3 Moving the point

The point marks the location of the cursor on the screen. Often Emacs is configured to use the cursor keys to move the point. The following key commands move the point:

C-p
move up one line (useful if cursor keys do not work) (p for previous)
C-n
move down one line (n for next)
C-f
move right one character (f for forward)
C-b
move back one character (b for back)
C-d
delete the character at the point
DEL
delete the character before the point (i.e., backspace)
C-a
move to beginning of line
C-e
move to end of line
C-v
move down a screen
M-v
move up a screen
M-<
move to beginning of the buffer
M->
move to end of buffer

Sometimes the backspace key is mapped to send C-h (which is the help key for Emacs; see below). Most terminal/telnet programs allow this to be changed so that the backspace key sends DEL.

2.4 Quitting Emacs

Emacs can be exited in two ways. The first way is to type C-x C-c, this command terminates the Emacs process. (Note that this command is activated by two keystrokes and that the CONTROL key is held down for both!) The user will be asked to save any modified buffers.

The second way does not terminate Emacs. The C-z command suspends the Emacs process. A suspended process is not terminated, but a shell prompt appears and the user can enter commands. The user may unsuspend the process and continue editing by typing:

fg

at the shell prompt. Emacs does not ask to save any buffers when suspending.

Which way should be used? It depends. If the user is about to logout, Emacs should be terminated. If the user only wishes to execute a few UNIX commands and then return to editing, suspending Emacs is more convenient.

2.5 Cut and Pasting

To move or copy text in Emacs, the user must first mark the text. A region of text is marked by setting the mark at one end of the region and then moving the point to the other end. The C-@ or C-SPC (SPC is the spacebar) commands set the mark. If the region is to be moved, hit C-w to cut the region; it will be removed from the buffer and saved. If the region is to be copied, use M-w, which copies the region without removing it. Then move the point to the position where the text should go and hit C-y. This command yanks the text back.

2.6 Files and buffers

Emacs supports multiple buffers. The user may load files into the buffers, switch between them and copy text from one buffer to another.

C-x C-f
read a file into a buffer
C-x C-s
save current buffer back to file
C-x i
insert contents of file into current buffer
C-x C-w
write buffer to specified file
C-x b
select another buffer
C-x C-b
list all buffers

Text from one buffer can be copied or moved to another buffer. Simply cut (C-w) or copy (M-w) from one buffer, move to the other and paste (C-y).

2.7 Searching and replacing

Emacs provides several ways to search for text in a file. The two most useful commands are:

C-s
search forward
C-r
search backward

When these commands are used, a prompt will appear at the bottom. Then the user types in the text to search for. Emacs does not wait for the entire text to be entered, it starts searching immediately. For example, if one searches for the word write, as soon as the w is hit, Emacs moves to the first occurrence of the letter w. When the r is hit, Emacs moves to the first occurrence of wr and so on. To stop a search, hit ESC. To cancel, type C-g.

The M-% command is the most common replace command. It prompts the user for the string to replace and what to replace it with. It then scans through the buffer and asks the user to confirm each replacement. Possible responses are:

SPC
(space bar) replace string, go on to next match
DEL
(delete key) skip without replacing, go on to next match
!
replace all remaining matches
ESC
stop replacing

2.8 Modes of buffers

Emacs supports multiple modes for buffers. The mode of a buffer determines how some commands work in that buffer. For example, files that end in .c or .h are automatically put in C mode. In this mode, Emacs performs several extra functions:

  • When a ), ] or } is typed, Emacs will show the user the matching (, [ or { by moving the point to the matching symbol for a fraction of a second.
  • If the TAB key is hit, Emacs will indent the line to the correct depth. (However, this will only work correctly if the lines above are properly indented.) The user can use this feature to catch some syntax errors. If the line is not indented as the user expects, it may be because of a syntax error in the line (or a line above).

2.9 Miscellaneous

Here are some useful commands:

C-k
erases to end of line. If line is blank, the line itself is erased.
C-g
aborts a partially typed command.
C-x u
undo last change to buffer.
C-l
redraw screen
C-h
starts Emacs on-line help
M-$
check spelling of word at point

3. Advanced Commands

3.1 Viewing multiple buffers

It can be very useful in some circumstances to look at two different buffers when editing. The following commands are used:

C-x 2
split the screen to show two buffers
C-x 1
go back to one buffer on screen
C-x o
switch cursor to other buffer

3.2 Commands and key bindings

Emacs supports many different commands. Each command has a name that uniquely identifies it. Some of the most common commands are also bound to keys. For example, the C-x 2 key sequence is bound to the split-window-vertically Emacs command. The association of a key sequence with a command is called a key binding. Emacs allows the user to redefine (and create new) key bindings; however, this is beyond the scope of this document.

Commands with no key binding must be invoked by name using the execute-extended-command command, which is bound to M-x. (Of course, M-x can be used to execute any command.)

3.3 Compiling from within Emacs

Emacs allows programmers to compile code without leaving Emacs. In addition, Emacs can often parse any error messages generated and automatically move to the offending line in the source code. This can be very convenient!

To compile your code, use the compile command (i.e., type M-x compile). The following prompt will appear on the bottom line:

Compile command: make -k

This says that Emacs will use the UNIX command make -k to compile your code. This will only work if the programmer has written a makefile for the make command to use. If a makefile has not been written, the programmer must replace make -k with the appropriate command. Text in prompts on the bottom line can be edited (in fact, the prompt area is called the minibuffer). For example, to compile a C program named a.c, erase the make -k, replace it with cc a.c and hit Return. If there are modified buffers that have not yet been saved to their files, Emacs will ask the programmer whether the changes should be saved. Remember that the compiler compiles what is saved in the file, not what is in Emacs's buffers!

The compile command will split the screen in two buffers. The new buffer will show the result of compiling the code. If there are errors, they will be displayed in the buffer. To have Emacs look up the line that each error occurs on, use the C-x ` command. (The last character is a backquote.) The first time this command is used, it will show where the first error is. The next time it will show where the second error is, etc. For this to work, the compiler must print out its error messages in a format that Emacs understands. Almost all C and C++ compilers output error messages in the correct format.

Common C Programming Errors

1. Introduction

This document lists the common C programming errors that the author sees time and time again. Solutions to the errors are also presented.

Another great resource is the C FAQ. Gimpel Software also has a list of hard to detect C/C++ bugs that might be useful.

2. Beginner Errors

These are errors that beginning C students often make. However, the professionals still sometimes make them too!

2.1 Forgetting to put a break in a switch statement

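Remember that C does not automatically leave a switch statement when the code for the matching case ends. Execution falls through into the cases that follow until a break (or the end of the switch) is reached. For example: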

int x = 2;
switch(x) {
    case 2:
        printf("Two\n");
    case 3:
        printf("Three\n");
}

prints out:

Two
Three

Put a break to break out of the switch:

int x = 2;
switch(x) {
    case 2:
        printf("Two\n");
        break;
    case 3:
        printf("Three\n");
        break; /* not necessary, but good if additional cases are added later */
}
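The sketch below makes the difference measurable. Each function counts how many case bodies run for a given x (the function names are hypothetical). Without break, a match on case 2 also runs the body of case 3.

```c
/* No break: execution falls through from case 2 into case 3. */
int cases_run_without_break( int x )
{
    int ran = 0;
    switch ( x ) {
    case 2:
        ran++;
        /* falls through */
    case 3:
        ran++;
    }
    return ran;
}

/* With break: only the matching case body runs. */
int cases_run_with_break( int x )
{
    int ran = 0;
    switch ( x ) {
    case 2:
        ran++;
        break;
    case 3:
        ran++;
        break;
    }
    return ran;
}
```

For x equal to 2, the first function returns 2 and the second returns 1. For x equal to 3, both return 1.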

2.2 Using = instead of ==

C's = operator is used exclusively for assignment, and an assignment expression has the value that was assigned. The == operator is used exclusively for comparison and yields an int: 0 for false and 1 for true. Because of these values, the C compiler often does not flag an error when = is used where == was really wanted. For example:

int x = 5;
if ( x = 6 )
    printf("x equals 6\n");

This code prints out x equals 6! Why? The assignment inside the if sets x to 6 and returns the value 6 to the if. Since 6 is not 0, this is interpreted as true.

One way to have the compiler find this type of error is to put any constants (or any r-value expressions) on the left side. Then if an = is used, it will be an error:
if ( 6 = x)
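The two facts behind this bug can be checked directly, as in the sketch below (function names hypothetical). First, an assignment expression has the value assigned. Second, == produces 1 or 0. Many compilers also warn about the bug; GCC with -Wall, for example, typically suggests parentheses around an assignment used as a truth value.

```c
/* An assignment is an expression whose value is the value assigned. */
int value_of_assignment( void )
{
    int x = 5;
    return (x = 6);      /* stores 6 in x; the expression itself is 6 */
}

/* == compares and yields 1 (true) or 0 (false). */
int equals_six( int x )
{
    return x == 6;
}
```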

2.3 scanf() errors

There are two types of common scanf() errors:

2.3.1 Forgetting to put an ampersand (&) on arguments

scanf() must have the address of the variable to store input into. This means that often the ampersand address operator is required to compute the addresses. Here's an example:

int x;
char * st = malloc(31);

scanf("%d", &x); /* & required to pass address to scanf() */
scanf("%30s", st); /* NO & here, st itself points to variable! */

As the last line above shows, sometimes no ampersand is correct!

2.3.2 Using the wrong format for operand

C compilers do not check that the correct format is used for arguments of a scanf() call. The most common errors are using the %f format for doubles (which must use the %lf format) and mixing up %c and %s for characters and strings.
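The sketch below uses sscanf(), which takes the same formats as scanf() but reads from a string, so the results are easy to check. It shows %lf for a double, %c for a single char and %s for a word. It also shows where the & is and is not needed. The name parse_sample is hypothetical.

```c
#include <stdio.h>

/* Parses text such as "3.5 x word".  Returns the number of items
   converted, which should be 3 on success. */
int parse_sample( const char * text, double * d, char * ch, char word[16] )
{
    /* d and ch are already addresses, and word is an array (which
       passes its address), so no & is used here.  %lf matches a
       double * (plain %f would expect a float *), and %15s leaves
       room in word for the null terminator. */
    return sscanf(text, "%lf %c %15s", d, ch, word);
}
```

A caller would write parse_sample(line, &value, &letter, name). Here value and letter are plain variables and need the &, while name is an array and does not.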

2.4 Size of arrays

Arrays in C always start at index 0. This means that an array of 10 integers defined as:

int a[10];

has valid indices from 0 to 9, not 10! It is very common for students to go one element too far in an array. The result is undefined behavior, so the program may act unpredictably.
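The sketch below shows the usual loop, assuming a hypothetical ARRAY_LEN of 10. The condition i < ARRAY_LEN stops at index 9, whereas i <= ARRAY_LEN would read the nonexistent a[10].

```c
#define ARRAY_LEN 10

/* Sums the ARRAY_LEN elements of a; valid indices are 0 .. ARRAY_LEN-1. */
int sum_array( const int a[ARRAY_LEN] )
{
    int i, total = 0;

    for ( i = 0; i < ARRAY_LEN; i++ )   /* not i <= ARRAY_LEN */
        total += a[i];
    return total;
}
```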

2.5 Integer division

Unlike Pascal, C uses the / operator for both real and integer division. It is important to understand how C determines which it will do. If both operands are of an integral type, integer division is used; otherwise, real division is used. For example:

double half = 1/2;

This code sets half to 0 not 0.5! Why? Because 1 and 2 are integer constants. To fix this, change at least one of them to a real constant.

double half = 1.0/2;

If both operands are integer variables and real division is desired, cast one of the variables to double (or float).

int x = 5, y = 2;
double d = ((double) x)/y;
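The sketch below puts the three cases side by side (function names hypothetical). It also shows one detail the text does not cover: since C99, integer division truncates toward zero, so -5/2 is -2.

```c
/* Both operands are int: integer division, then the result is converted. */
double int_divide( int x, int y )
{
    return x / y;
}

/* One operand cast to double: real division. */
double real_divide( int x, int y )
{
    return ((double) x) / y;
}

/* A real constant also forces real division. */
double half( void )
{
    return 1.0 / 2;
}
```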

2.6 Loop errors

In C, a loop repeats the very next statement after the loop statement. The code:

int x = 5;
while( x > 0 );
    x--;

is an infinite loop. Why? The semicolon after the while defines the statement to repeat as the null statement (which does nothing). Remove the semicolon and the loop works as expected.

Another common loop error is to iterate one too many times or one too few. Check loop conditions carefully!
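The sketch below is a corrected version of the loop, written as a function that reports how many times its body runs (count_down is a hypothetical name). Many programmers also always put the body in braces so the extent of the loop is explicit.

```c
/* Counts x down from start to 0 and returns the number of iterations.
   There is no semicolon after the while condition. */
int count_down( int start )
{
    int x = start;
    int iterations = 0;

    while ( x > 0 ) {
        x--;
        iterations++;
    }
    return iterations;
}
```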

2.7 Not using prototypes

Prototypes tell the compiler important features of a function: its return type and its parameters. If no prototype is given, a C89 compiler assumes that the function returns an int and can take any number of parameters of any type. (C99 and later no longer allow calling an undeclared function, although some compilers still accept it with only a warning.)

One important reason to use prototypes is to let the compiler check for errors in the argument lists of function calls. However, a prototype must be used if the function does not return an int. For example, the sqrt() function returns a double, not an int. The following code:

double x = sqrt(2);

will not work correctly if a prototype:

double sqrt(double);

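does not appear above it. Why? Without a prototype, the C compiler assumes that sqrt() returns an int. It therefore treats whatever the function hands back as an int and inserts code to convert that int to a double before storing it in x. But sqrt() actually returns a double, so the value being converted is garbage and x gets the wrong value. (The argument is a problem too: without a prototype, 2 is passed as an int instead of being converted to the double that sqrt() expects.)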

The solution to this problem is to include the correct C header file that contains the sqrt() prototype, math.h. For functions you write, you must either place the prototype at the top of the source file or create a header file and include it.
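The sketch below applies the prototype-at-the-top approach to a function you write yourself (half_of and use_half are hypothetical names). Because the prototype is in scope, the int argument 3 is converted to 3.0, and the double result is received as a double.

```c
/* Prototype: tells the compiler half_of() takes and returns a double,
   even though its definition appears later in the file. */
double half_of( double value );

double use_half( void )
{
    return half_of(3);     /* 3 is converted to 3.0 because of the prototype */
}

double half_of( double value )
{
    return value / 2;
}
```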

2.8 Not initializing pointers

Anytime you use a pointer, you should be able to answer the question: What variable does this point to? If you can not answer this question, it is likely it doesn't point to any variable. This type of error will often result in a Segmentation fault/coredump error on UNIX/Linux or a general protection fault under Windows. (Under good old DOS (ugh!), anything could happen!)

Here's an example of this type of error.

#include <string.h>
int main()
{
    char * st; /* defines a pointer to a char or char array */

    strcpy(st, "abc"); /* what char array does st point to?? */
    return 0;
}

How to do this correctly? Either use an array or dynamically allocate an array.

#include <string.h>
int main()
{
    char st[20]; /* defines a char array */

    strcpy(st, "abc"); /* st points to char array */
    return 0;
}

or

#include <string.h>
#include <stdlib.h>
int main()
{
    char *st = malloc(20); /* st points to allocated array */

    if ( st == NULL )      /* malloc() returns NULL if it fails */
        return 1;
    strcpy(st, "abc"); /* st points to char array */
    free(st); /* don't forget to deallocate when done! */
    return 0;
}

Actually, the first solution is much preferred for what this code does. Why? Dynamic allocation should only be used when it is required. It is slower and more error prone than just defining a normal array.

3. String Errors

3.1 Confusing character and string constants

C considers character and string constants to be very different things. Character constants are enclosed in single quotes and string constants are enclosed in double quotes. String constants act as a pointer to the actual string. Consider the following code:

char ch = 'A';     /* correct */
char ch = "A"; /* error */

The second line assigns the address of a string constant to the character variable ch. This should generate a compiler error. The same should happen if a character constant is assigned to a string pointer:

const char * st = "A";     /* correct */
const char * st = 'A'; /* error */

3.2 Comparing strings with ==

Never use the == operator to compare the value of strings! Strings are char arrays. The name of a char array acts like a pointer to the string (just like other types of arrays in C). So what? Consider the following code:

char st1[] = "abc";
char st2[] = "abc";
if ( st1 == st2 )
    printf("Yes");
else
    printf("No");

This code prints out No. Why? Because the == operator is comparing the pointer values of st1 and st2, not the data pointed to by them. The correct way to compare string values is to use the strcmp() library function. (Be sure to include string.h) If the if statement above is replaced with the following:

if ( strcmp(st1,st2) == 0 )
    printf("Yes");
else
    printf("No");

the code will print out Yes. For similar reasons, don't use the other relational operators (<,>, etc.) with strings either. Use strcmp() here too.
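The sketch below wraps the strcmp() test in a small helper (strings_equal is a hypothetical name). It compares the characters, not the addresses of the two arrays.

```c
#include <string.h>

/* Returns 1 if a and b contain the same characters, 0 otherwise.
   Writing a == b would compare the addresses instead. */
int strings_equal( const char * a, const char * b )
{
    return strcmp(a, b) == 0;
}
```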

3.3 Not null terminating strings

C assumes that a string is a character array with a terminating null character. This null character has ASCII value 0 and can be represented as just 0 or '\0'. This value is used to mark the end of meaningful data in the string. If this value is missing, many C string functions will keep processing data past the end of the meaningful data and often past the end of the character array itself until it happens to find a zero byte in memory!

Most C library functions that create strings always null terminate them properly. Some do not. For example, strncpy() adds no null terminator when the source string has at least as many characters as the count it is given. Be sure to read their descriptions carefully.
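The sketch below shows one common way to use strncpy() safely: always write the terminator yourself. safe_copy is a hypothetical helper, and it assumes size is at least 1.

```c
#include <string.h>

/* Copies src into dst, which has room for size chars (including the
   terminator), and always leaves dst null terminated.  strncpy() copies
   at most size - 1 chars here; the last byte is then set explicitly. */
void safe_copy( char * dst, const char * src, size_t size )
{
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}
```

Long sources are truncated to size - 1 characters rather than overflowing dst.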

3.4 Not leaving room for the null terminator

A C string must have a null terminator at the end of the meaningful data in the string. A common mistake is to not allocate room for this extra character. For example, the string defined below

char str[30];

has room for only 29 (not 30) data characters, since a null must appear after the last data character.

This can also be a problem with dynamic allocation. Below is the correct way to allocate a string to the exact size needed to hold a copy of another.

char * copy_str = malloc( strlen(orig_str) + 1);
strcpy(copy_str, orig_str);

The common mistake is to forget to add one to the return value of strlen(). The strlen() function returns a count of the data characters which does not include the null terminator.

This type of error can be very hard to detect. It might not cause any problems or only problems in extreme cases. In the case of dynamic allocation, it might corrupt the heap (the area of the program's memory used for dynamic allocation) and cause the next heap operation (malloc(), free(), etc.) to fail.
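The sketch below packages the correct allocation as a function. copy_string is a hypothetical name; POSIX systems (and C23) provide a similar strdup(). It also checks the result of malloc() before copying.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of orig, or NULL if malloc() fails.
   strlen() does not count the null terminator, hence the + 1.
   The caller must free() the result. */
char * copy_string( const char * orig )
{
    char * copy = malloc(strlen(orig) + 1);

    if ( copy != NULL )
        strcpy(copy, orig);
    return copy;
}
```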

4. Input/Output Errors

4.1 Using fgetc(), etc. incorrectly

The fgetc(), getc() and getchar() functions all return back an integer value. For example, the prototype of fgetc() is:

int fgetc( FILE * );

Sometimes this integer value is really a simple character, but there is one very important case where the return value is not a character!

What is this value? EOF. A common misconception among students is that files have a special EOF character at the end. There is no special character stored at the end of a file. EOF is a negative integer code that a function returns to signal end of file (or an error). Here is the wrong way to use fgetc():

int count_line_size( FILE * fp )
{
    char ch;
    int cnt = 0;

    while( (ch = fgetc(fp)) != EOF && ch != '\n')
        cnt++;
    return cnt;
}

What is wrong with this? The problem occurs in the condition of the while loop. To illustrate, here is the loop rewritten to show what C will do behind the scenes.

while( (int) ( ch = (char) fgetc(fp) ) != EOF && ch != '\n')
    cnt++;

The return value of fgetc(fp) is converted to char to store the result in ch. Then the value of ch must be converted back to an int to compare it with EOF. So what? Converting an int value to a char and then back to an int may not give back the original int value. If char is unsigned on your system, EOF (a negative value) turns into a positive value such as 255, so the comparison with EOF is never true and this loop may never end. If char is signed, a legitimate byte with value 255 turns into -1 and, with the usual EOF value of -1, is mistaken for end of file.

What is the solution? Make the ch variable an int as below:

int count_line_size( FILE * fp )
{
    int ch;
    int cnt = 0;

    while( (ch = fgetc(fp)) != EOF && ch != '\n')
        cnt++;
    return cnt;
}

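Now the only hidden cast is in the second comparison (and in C, where a character constant such as '\n' already has type int, even that cast changes nothing):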

while( (ch = fgetc(fp)) != EOF && ch != ((int) '\n') )
    cnt++;

This cast has no harmful effects at all! So, the moral of all this is: always use an int variable to store the result of the fgetc(), getc() and getchar().

4.2 Using feof() incorrectly

There is a widespread misunderstanding of how C's feof() function works. Many programmers use it like Pascal's eof() function. However, C's function works differently!

What's the difference? Pascal's function returns true if the next read will fail because of end of file. C's feof() returns true only after a previous read operation has already run into the end of the file. Here's an example of a misuse of feof():

#include <stdio.h>
int main()
{
    FILE * fp = fopen("test.txt", "r");
    char line[100];

    if ( fp == NULL )          /* could not open the file */
        return 1;
    while( ! feof(fp) ) {
        fgets(line, sizeof(line), fp);
        fputs(line, stdout);
    }
    fclose(fp);
    return 0;
}

This program will print the last line of the input file twice (assuming the file ends with a newline). Why? After the last line is read and printed, feof() still returns 0 (false), so the loop continues. The next fgets() fails, which leaves the line variable unchanged, so it still holds the last line and prints it again. After this, feof() returns true (since fgets() hit the end of the file) and the loop ends.

How should this be fixed? One way is the following:

#include <stdio.h>
int main()
{
    FILE * fp = fopen("test.txt", "r");
    char line[100];

    if ( fp == NULL )          /* could not open the file */
        return 1;
    while( 1 ) {
        fgets(line, sizeof(line), fp);
        if ( feof(fp) ) /* check for EOF right after fgets() */
            break;

        fputs(line, stdout);
    }
    fclose(fp);
    return 0;
}

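However, this is not the best way. It also drops a final line that has no trailing newline, because fgets() sets the end-of-file indicator while still returning that line. In fact, there is no reason to use feof() at all. C input functions return values that can be used to check for EOF. For example, fgets() returns NULL when it reaches end of file before reading any characters (or when an error occurs). Here's a better version of the program: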

#include <stdio.h>
int main()
{
    FILE * fp = fopen("test.txt", "r");
    char line[100];

    if ( fp == NULL )          /* could not open the file */
        return 1;
    while( fgets(line, sizeof(line), fp) != NULL )
        fputs(line, stdout);
    fclose(fp);
    return 0;
}

The author has yet to see any student use the feof() function correctly!

Incidentally, this discussion also applies to C++ and Java. The eof() method of an istream works just like C's feof().

4.3 Leaving characters in the input buffer

C input (and output) functions buffer data. Buffering stores data in memory and only reads (or writes) the data from (or to) I/O devices when needed. Reading and writing data in big chunks is much more efficient than a byte (or character) at a time. Often the buffering has no effect on programming.

One place where buffering is visible is input using scanf(). The keyboard is usually line buffered. This means that each line input is stored in a buffer. Problems can arise when a program does not process all the data in a line, before it wants to process the next line of input. For example, consider the following code:

int x;
char st[31];

printf("Enter an integer: ");
scanf("%d", &x);
printf("Enter a line of text: ");
fgets(st, 31, stdin);

The fgets() will not read the line of text that is typed in. Instead, it will probably just read an empty line. In fact, the program will not even wait for input for the fgets() call. Why? The scanf() call reads the characters that make up the integer, but it leaves the '\n' that follows them in the input buffer. The fgets() then starts reading data from the input buffer. It finds a '\n' and stops without needing any additional keyboard input.

What's the solution? One simple method is to read and dump all the characters from the input buffer until a '\n' after the scanf() call. Since this is something that might be used in lots of places, it makes sense to make this a function. Here is a function that does just this:

/* function dump_line
* This function reads and dumps any remaining characters on the current input
* line of a file.
* Parameter:
* fp - pointer to a FILE to read characters from
* Precondition:
* fp points to an open file
* Postcondition:
* the file referenced by fp is positioned just past the next '\n'
* (the start of the following line) or at the end of the file.
*/
void dump_line( FILE * fp )
{
int ch;

while( (ch = fgetc(fp)) != EOF && ch != '\n' )
/* null body */;
}

Here is the code above fixed by using the above function:

int x;
char st[31];

printf("Enter an integer: ");
scanf("%d", &x);
dump_line(stdin);
printf("Enter a line of text: ");
fgets(st, 31, stdin);

One incorrect solution is to use the following:

fflush(stdin);

This will compile, but its behavior is undefined by the ANSI C standard. The fflush() function is only meant to be used on streams open for output, not input. This method does seem to work with some C compilers, but it is completely unportable! Thus, it should not be used.
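A different approach, sketched below under the assumption that each input line fits in the buffer, avoids leftover characters altogether: read every line in full with fgets() and convert the number from the string with sscanf(). The helper name read_int_line and the 100-character buffer are illustrative choices, not part of the original code.

```c
#include <stdio.h>

/* Hypothetical helper: read one whole line from fp and convert it to an int.
 * Returns 1 on success, 0 if the line does not begin with a number
 * (after optional whitespace), and EOF at end of input.
 * Because fgets() consumes the entire line (for lines shorter than the
 * buffer), no '\n' is left behind for the next input call. */
int read_int_line(FILE *fp, int *out)
{
    char buf[100];

    if (fgets(buf, sizeof buf, fp) == NULL)
        return EOF;
    return sscanf(buf, "%d", out) == 1;
}
```

With this helper, the earlier example would call read_int_line(stdin, &x) instead of scanf("%d", &x), and the following fgets(st, 31, stdin) would then wait for a new line of text.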

4.4 Using the gets() function

Do not use this function! It does not know how many characters can be safely stored in the string passed to it. Thus, if too many are read, memory will be corrupted. Many security bugs that have been exploited on the Internet use this fact! (The function was eventually removed from the language in the C11 standard.) Use the fgets() function instead (and read from stdin). But remember that, unlike gets(), fgets() does not discard the terminating \n from the input; it stores it in the string.
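A minimal sketch of an fgets()-based replacement for gets(), assuming you want gets()'s old behavior of returning the line without its '\n'. The name read_line is hypothetical. strcspn() returns the length of the prefix that contains no '\n', so the assignment overwrites the newline if there is one and otherwise just rewrites the existing terminating '\0'.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: a gets() replacement that never writes past buf.
 * Reads at most size-1 characters with fgets(), then removes the '\n'
 * that fgets() keeps (gets() used to discard it).
 * Returns buf, or NULL at end of file or on a read error. */
char *read_line(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* strip the newline, if present */
    return buf;
}
```

If a line is longer than size - 1 characters, the rest stays in the stream and is returned by the next call; the dump_line() function above can be used to discard it instead.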

The scanf() functions can also be used dangerously. The %s format can overflow the destination string. However, it can be used safely by specifying a width. For example, the format %20s will not read more than 20 characters; the destination array must still have room for 21, since a terminating '\0' is stored after them.
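For instance, here is a small sketch (read_word and WORD_MAX are hypothetical names) that reads one whitespace-delimited word safely. The width in the format is one less than the array size, leaving room for the '\0'. A longer word is split: the first 20 characters are returned now and the rest by the next call.

```c
#include <stdio.h>

#define WORD_MAX 20   /* must match the width written in the format below */

/* Hypothetical helper: read one whitespace-delimited word of at most
 * WORD_MAX characters into word, which must hold WORD_MAX + 1 chars.
 * Returns 1 on success, 0 at end of input or on a read error. */
int read_word(FILE *fp, char word[WORD_MAX + 1])
{
    return fscanf(fp, "%20s", word) == 1;
}
```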

How To Debug Programs

This document explains how to write computer programs that work and that are understandable to other intelligent beings! These two attributes are not independent! In general, programs that other programmers can not understand do not work very well. (Not to mention the fact that they are maintenance nightmares!)

Writing structured programs (structured code and data!) helps greatly in debugging the code. Here is a quick review of some of the features of a structured program.

  1. Lots of well-defined functions!
  2. Using structured loop constructs (i.e., while and for) instead of goto.
  3. Using variables that have one purpose and meaningful names.
  4. Using structured data types to represent complex data.
  5. Using the ADT (Abstract Data Type) or OOP (Object-Oriented Programming) paradigm of programming

1. How to Start

The most common types of mistakes when programming are:

  1. Programming without thinking
  2. Writing code in an unstructured manner

Let's take these in order.

1.1 Thinking about Programming

When a real programmer (or programming team) is given a problem to solve, they do not immediately sit down at a terminal and start typing in code! They first design the program by thinking about the numerous ways the problem's solution may be found.

One of the biggest myths of programming is that the sooner I start coding, the sooner a working program will be produced. This is NOT true!! A program that is planned before coding will become a working program before an unplanned program. Yes, an unplanned program will be typed in and maybe compiled faster, but these are just the first steps of creating a working program! Next comes the debugging stage. This is where the benefits of a planned program will appear. In the vast majority of cases, a planned program will have fewer bugs than an unplanned program. In addition, planned programs are generally more structured than unplanned programs. Thus, finding the bugs will be easier in the planned program.

So how does one design a program before coding? There are several different techniques. One of the most common is called top-down design. Here an outline of the program is first created. (Essentially, one first looks at the general form of the main() function and then recursively works down to the lowest level functions.) There are many references on how to write programs in this manner.

Top-down design divides the program into sub-tasks. Each sub-task is a smaller problem that must be solved. Problems are solved by using an algorithm. Functions (and data) in the program implement the algorithm. Before writing the actual code, take a very small problem and trace by hand how your chosen algorithm would solve it. This serves several purposes:

  1. It checks the algorithm to see if it will actually work on the given problem. (If it does not work, you can immediately start looking for another algorithm. Note that if you had immediately started coding, you would probably not discover that the algorithm does not work until many lines of code had been entered!)
  2. It makes sure that you understand how the algorithm actually works! (If you can not trace the algorithm by hand, you will not be able to write a program to do it!)
  3. It gives you the detailed workings of a short, simple run of the algorithm that can be used later when debugging the code.

Only when you are confident that you understand how the entire program will look should you start typing in code.

1.2 Structured Programming

When a program is structured, it is divided into sub-units that may be tested separately. This is a great advantage when debugging! Code in a sub-unit may be debugged separately from the rest of the program code. I find it useful to debug each sub-unit as it is written. If no debugging is performed until the entire program is written, debugging is much harder. The entire source of the program must be searched for bugs. If sub-units are debugged separately, the source code that must be searched for bugs is much smaller! Storing the sub-units of a program into separate source files can make it easier to debug them separately.

The ADT and OOP paradigms also divide programs into sub-units. Often with these methods, the sub-units are even more independent than with normal structured code. This has two main advantages:

  1. Sub-units are even easier to debug separately.
  2. Sub-units can often be reused in other programs. (Thus, a new program can use a previously debugged sub-unit from an earlier program!)

Sub-units are generally debugged separately by writing a small driver program. A driver program sets up data for the sub-task that the sub-unit is supposed to solve, calls the sub-unit to perform its sub-task, and then displays the results so that they can be checked.

Of course, all the following debugging methods can be used to debug a sub-unit, just as they can be used to debug the entire program. Again, the advantage of sub-units is that they are only part of the program and so are easier to debug than the entire program at once!
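Here is a small sketch of such a driver. The sub-unit, max_of(), and the driver function drive_max_of() are hypothetical examples, not from the original text. The driver sets up test data (including a trivial one-element case), calls the sub-unit, and prints each result next to the answer worked out by hand.

```c
#include <stdio.h>

/* Hypothetical sub-unit under test: largest of the n values in a (n >= 1). */
int max_of(const int *a, int n)
{
    int best = a[0];
    int i;

    for (i = 1; i < n; i++)
        if (a[i] > best)
            best = a[i];
    return best;
}

/* Hypothetical driver: sets up data, calls the sub-unit, and displays each
 * result next to the hand-computed answer. Returns the number of failures. */
int drive_max_of(void)
{
    static const int one[] = { 7 };                 /* trivial case */
    static const int mixed[] = { 3, -1, 9, 4 };
    static const int negatives[] = { -5, -2, -8 };
    struct { const int *data; int n; int expected; } cases[] = {
        { one, 1, 7 },
        { mixed, 4, 9 },
        { negatives, 3, -2 },
    };
    int failures = 0;
    size_t i;

    for (i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        int got = max_of(cases[i].data, cases[i].n);

        printf("case %u: got %d, expected %d%s\n", (unsigned)i, got,
               cases[i].expected,
               got == cases[i].expected ? "" : "  <-- FAIL");
        if (got != cases[i].expected)
            failures++;
    }
    return failures;
}
```

In a standalone driver program, main() would just call drive_max_of() and return a nonzero status if it reports any failures.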

2. Compiling Programs

The first step after typing in a program (or just a sub-unit of a program) is to compile it. The compiler converts the source code file you typed in into machine language and stores the result in an object file. This is known as compiling. With most systems, the object file is then automatically linked with the system libraries. These libraries contain the code for the functions that are part of the language's library. (For example, the C libraries contain the code for the printf() function.)

2.1 Compiler Errors

Every language has syntax rules. These rules determine which statements are legal in the language and which are not. Compilers are designed to enforce these rules. When a rule is broken, the compiler prints an error message and an object file is not created. Most compilers will continue scanning the source file after an error and report other errors they find. However, once an error has been found, the compiler may be confused by later, perfectly legal statements and report them as errors. This brings us to the first rule of compiler errors:

First Rule of Compiler Errors
The first listed compiler error is always a true error; however, later errors may not be true errors.

Succeeding errors may disappear when the first error is removed. So, if later error messages are puzzling, ignore them. Fix the errors that you are sure are errors and re-compile. The puzzling errors may magically disappear when the other true errors are removed.

If you have many errors, the first error may scroll off the screen. One solution to this problem is to save the errors into a file using redirection. However, compilers write errors to stderr, not to stdout, and the > redirection operator only redirects stdout. To redirect stderr, use the 2> operator. Here's an example:

$ cc x.c 2> errors
$ more errors

The compiler is just a program that other humans created. Often the error messages it displays are confusing (or just plain wrong!). Do not assume that the line an error message refers to is always the line where the true error resides. The compiler scans source files sequentially from top to bottom. Sometimes an error is not detected until many lines below where the actual error is. The compiler usually cannot tell this, so it refers to the line where it realized something is wrong; the true error is earlier in the code. This brings us to the second rule of compiler errors:

Second Rule of Compiler Errors
A compiler error may be caused by any source code line above the line referred to by the compiler; however, it can not be caused by a line below.

In C (and C++), do not forget that the #include preprocessor statement inserts the code of a header file into the source file. An error in a header file may cause a compiler error referencing a line in the main source file. Most systems allow the preprocessed code (which is what the C compiler actually compiles!) to be saved; for example, most UNIX C compilers write it to standard output when given the -E option. This lets you see exactly what is being compiled. The output also shows how each C macro was expanded, which can be very helpful in discovering the cause of otherwise very hard to find errors.

A useful technique for finding the cause of puzzling compiler errors is to delete (or comment out) preceding sections of code until the error disappears. When the error disappears, the last section removed must have caused the error.

The compiler can also display warnings. A warning is not a syntax error; however, it may point to a logical error in the program. It marks a statement in your program that is legal but suspicious. You should treat warnings as errors unless you understand why the warning was generated. Often compilers can be set to different warning levels (for example, gcc's -Wall option enables many common warnings). It is to your advantage to set this level as high as possible, so that the compiler gives as many warnings as possible. Look at these warnings very carefully!

Remember that just because a program compiles with no errors or warnings does not mean that the program is correct! It only means that every line of the program is syntactically correct. That is, the compiler understands what each statement says to do. The program may still have many logical errors! An English paper may be grammatically correct (i.e., have nouns, verbs, etc. in the correct places), but be gibberish.

2.2 Linker Errors

The linker is a program that links object files (which contain the compiled machine code from a single source file) and libraries (which are files that are collections of object files) together to create an executable program. The linker matches up functions and global variables used in object files to their definitions in other object files. The linker uses the name (often the term symbol is used) of the function or global variable to perform the match.

The most common type of linker error is an unresolved symbol or name. This error occurs when a function or global variable is used, but the linker can not find a match for the name. For example, on an IBM AIX system, the error message looks like this:

0706-317 ERROR: Unresolved or undefined symbols detected:
Symbols in error (followed by references) are
dumped to the load map.
The -bloadmap: option will create a load map.
.fun

This message means that a function (or global variable) named fun (ignore the period) was referenced in the program, but never defined. There are two common causes of these errors:

Misspelling the name of the function
In the example above, the function was actually named func, but the call misspelled it as fun. This is not a compiler error. Code in one source file can use functions defined in another. A traditional (pre-C99) C compiler assumes that any function referenced, but not defined in the file that references it, will be defined in another file and linked. It is only at the link stage that this assumption can be checked. (Note that C++ compilers will usually generate compiler errors for this, since C++ requires prototypes for all referenced functions! C99 and later C standards also dropped implicit function declarations, so modern C compilers at least issue a diagnostic.)
The correct libraries or object files were not linked
The linker must know what libraries and object files are needed to form the executable program. The standard C libraries are automatically linked. However, many UNIX systems, like the AIX system, do not automatically link in the standard C math library! To link in the math library on the AIX system, use the -lm flag on the compile command. For example, to compile a C program that uses sqrt, type:
cc prog.c -lm
Remember that the #include statement only inserts text into source files. It is a common myth that it also links in the appropriate library! The linker never sees this statement!

There are also bugs related to the linker. One difficult bug to uncover occurs when there are two definitions of a function or global variable. Depending on the linker (and on whether the definitions come from object files or from libraries), it may report a multiple-definition error, or it may simply pick the first definition it finds and ignore the other. Some linkers will display a warning message when this occurs (the AIX linker does not!).

Another bug related to linking occurs when a function is called with the wrong arguments. The linker only looks at the name of the function when matching. It does no argument checking. Here's an example:

File: x.c

int f( int x, int y)
{
return x + y;
}

File: y.c

int main()
{
int s = f(3);
return 0;
}

These types of bugs can be prevented by using prototypes. For example, if the prototype:

int f( int, int);

is added to y.c, the compiler will catch this error. Actually, the best idea is to put the prototype in a header file and include it in both x.c and y.c. Why use a header file? So that there is only one instance of the prototype that all files use. If a separate instance is typed into each source file, there is no guarantee that each instance is the same. If there is only one instance, it can not be inconsistent with itself! Why include it in x.c (the file the function is defined in)? So that the compiler can check the prototype and ensure that it is consistent with the function's definition. (Additional note: C++ uses a technique called name mangling to catch this type of error.)
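The following single-file sketch shows the idea; the header name f.h and the helper call_f() are hypothetical, and the comments mark which part would go in which file. Because the prototype is in scope, writing f(3) in place of f(3, 4) would now be rejected by the compiler, instead of compiling into a call with an undefined value for y.

```c
/* ---- f.h (hypothetical header): the single shared prototype ---- */
int f(int x, int y);

/* ---- x.c: would #include "f.h", so the compiler checks that this
 *      definition matches the prototype ---- */
int f(int x, int y)
{
    return x + y;
}

/* ---- y.c: would also #include "f.h"; with the prototype in scope,
 *      a call such as f(3) is a compile-time error (too few arguments) ---- */
int call_f(void)
{
    return f(3, 4);
}
```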

3. Runtime Errors

A runtime error occurs while the program is running and usually results in the program aborting. On a UNIX/Linux system, an aborting program may create a coredump (if core files are enabled on that system). A coredump is a binary file, traditionally named core, that contains information about the state of the program when it aborted. Debuggers like gdb and dbx can read this file and tell you useful information about what the program was doing when it aborted. There are several types of runtime errors:

Illegal memory access
This is probably the most common type of error. Under UNIX/Linux, the program will coredump with the message Segmentation fault(coredump).
Using Win95 or NT, programs will also abort. However, traditional DOS does not check for illegal memory accesses; the program continues running, but the results are unpredictable. The DOS Borland/Turbo C/C++ compilers will check for data written to the NULL address. However, the error message
NULL pointer assignment
is not displayed until the program terminates.
Division by zero
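Integer division by zero is undefined behavior in C; on most systems it is detected and the program aborts (under UNIX/Linux, typically with the message Floating point exception, even though the division was of integers). Floating-point division by zero, on systems using IEEE arithmetic, usually produces infinity or NaN instead of aborting.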

4. Debugging Tools

Many methods of debugging a program compare the program's behavior with the correct behavior in great detail. Usually the normal output of the program does not show the detail needed. Debugging tools allow you to examine the behavior of the program in more detail.

4.1 The assert Macro

The assert macro is a quick and easy way to put debugging tests into a C/C++ program. To use this macro, you must include the assert.h header file near the top of your source file. Using assert is simple. The format is:

assert(boolean (or int) expression);

If the boolean expression evaluates to true (i.e., not zero), the assert does nothing. However, if it evaluates to false (zero), assert prints an error message and aborts the program. As an example, consider the following assert:

assert( x != 0 );

If x is zero, a message like the following will be displayed (the exact wording depends on the compiler):

Assertion failed: x != 0, file err.c, line 6
Abnormal program termination

and the program will abort. Notice that the actual assertion, the name of the file and the line number in the file are displayed.

The assert macro is very useful for putting sanity checks into programs. These are conditions that should always be true if the program is working correctly. It should not be used for user error checking (such as when the file a user requested to read does not exist). Normal if statements should be used for these runtime errors.

Of course, in a commercial program, an assertion failure is not particularly helpful to an end user. Also, checking assertions will make the program run at least a little slower than without them. Fortunately, it is easy to disable the assert macro without even removing it. If the macro NDEBUG is defined (above the statement that includes assert.h!), the assert macro does absolutely nothing. If the assertions need to be enabled later, just remove the line that defines NDEBUG. (If this technique is used, be sure that the assert statements do not execute code needed for the program to run correctly. If NDEBUG is defined, that code would not be run!)
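A minimal sketch of assert used for sanity checks, in a hypothetical helper mean(). The two assertions state conditions that any correct caller guarantees, so a failure means a programming error rather than bad user input. Compiling with NDEBUG defined (for example with the -DNDEBUG option that most UNIX compilers accept) removes both checks; the real work stays outside the assert calls, so the function still computes the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: average of the n ints in a. */
double mean(const int *a, size_t n)
{
    long sum = 0;
    size_t i;

    /* Sanity checks: true whenever the caller is correct. */
    assert(a != NULL);
    assert(n > 0);   /* also guards the division below */

    /* The actual computation is never placed inside assert(),
       so it still runs when NDEBUG is defined. */
    for (i = 0; i < n; i++)
        sum += a[i];
    return (double)sum / n;
}
```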

4.2 Print Statements

This time-honored method of debugging involves inserting debugging print statements liberally in your program. The print statements should be designed to show both what code the program is executing and what values critical variables have.
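One way to follow that advice in C is a small macro that reports where the program is and what a variable holds. The sketch below is only one possible design; DBG_INT and NO_DEBUG_PRINTS are hypothetical names, not standard facilities. It writes to stderr, which is not fully buffered, so the messages are less likely to be lost if the program later crashes.

```c
#include <stdio.h>

/* Hypothetical debugging macro: prints the file, line, expression text
 * and its int value to stderr; the output looks like
 *     prog.c:12: count = 3
 * Define NO_DEBUG_PRINTS (hypothetical switch) to turn every call off
 * without deleting it, much like NDEBUG does for assert. */
#ifdef NO_DEBUG_PRINTS
#define DBG_INT(expr) 0
#else
#define DBG_INT(expr) \
    fprintf(stderr, "%s:%d: %s = %d\n", __FILE__, __LINE__, #expr, (int)(expr))
#endif
```

As with assert, the expression is not evaluated when the prints are disabled, so it should have no side effects.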

4.3 Debuggers

The previous method of debugging by adding print statements has two disadvantages:

  1. When new print statements are added, the program must be recompiled.
  2. The information output is fixed and can not be changed while the program is running.

Source-level debuggers provide a much easier way to trace the execution of programs. They allow one to:

  1. Look at the value of any variable as the program is running.
  2. Pause execution when program reaches any desired statement. (This position in the program is called a breakpoint).
  3. Single step statement by statement through a program.

I strongly recommend that you learn to use the debugger for whatever system you program on. A debugger can save lots of time when debugging your program!

4.4 Lint

The lint program checks C programs for a common list of bugs. It scans your C source code and prints out a list of possible problems. Be warned that lint is very picky! For example, the line:

printf("Hello, World\n");

will produce a warning message because printf returns an integer value that is not stored. The return value of printf is often ignored, but lint still produces a warning. There are several ways to make lint happy with this statement; one is:

(void) printf("Hello, World\n");

This says to ignore the return value.

4.5 Walk through

A walk through is a process of hand checking the logic of a program. The programmer sits down with someone else (best if another programmer, but anybody will do) and walks through the program for an example case. Often it is the programmer himself who finds the bug in the process of explaining how the program is supposed to work and carefully looking at his code. However, it is easy for the programmer to "know" what the program should be doing and remain blind to what the program is actually doing.

Students need to be very careful using this approach with other students. Two students in the same class should not walk through a program together.

5. General Tips

Here are some general tips for debugging programs.

5.1 Finding Bugs

Before bugs are removed they must be discovered!

  • Aggressively test programs!
  • Start with small problems that can be easily checked by hand. (You should already have one of these worked out from the planning stage!)
  • Test every feature of the program at least once! And is once really enough? Test features in different ways if possible.
  • Do not forget to test trivial problems.
  • Do not make invalid assumptions about input data.

5.2 Determining the Causes of Bugs

A bug can only be caused by the code in the program that has already executed. Be sure you do not waste time searching through code that has not run yet. A debugger or print statements can be used to determine which code has executed and which has not.

Do not fix bugs by mindlessly changing code until it seems to work. You need to figure out why one statement does work and another does not. You should have a good reason for every line of code. "It doesn't work without this line" is not a good reason!

 