What is the use of the poll(file, polltable) API ?
Applications that use nonblocking I/O often use the poll and epoll system calls as well. poll, and epoll have essentially the same functionality: each allow a process to determine whether it can read from or write to one or more open files without blocking. These calls can also block a process until any of a given set of file descriptors becomes available for reading or writing. Therefore, they are often used in applications that must use multiple input or output streams without getting stuck on any one of them. The same functionality is offered by multiple functions, because two were implemented in Unix almost at the same time by two different groups: whereas poll was the System V solution. The epoll call was added in 2.5.45 as a way of making the polling function scale to thousands of file descriptors.
Actually, epoll is a set of three calls that together can be used to achieve the polling functionality. For our purposes, though, we can think of it as a single call.
Support for any of these calls requires support from the device driver. This support (for all calls) is provided through the driver's poll method. This method has the following prototype:
unsigned int (*poll) (struct file *filp, poll_table *wait);
The driver method is called whenever the user-space program performs a poll or epoll system call involving a file descriptor associated with the driver. The device method is in charge of these two steps:
1. Call poll_wait on one or more wait queues that could indicate a change in the poll status. If no file descriptors are currently available for I/O, the kernel causes the process to wait on the wait queues for all file descriptors passed to the system call.
2.Return a bit mask describing the operations (if any) that could be immediately performed without blocking.
Both of these operations are usually straightforward and tend to look very similar from one driver to the next. They rely, however, on information that only the driver can provide and, therefore, must be implemented individually by each driver.
The poll_table structure, the second argument to the poll method, is used within the kernel to implement the poll, and epoll calls; it is declared in
void poll_wait (struct file *, wait_queue_head_t *, poll_table *);
The second task performed by the poll method is returning the bit mask describing which operations could be completed immediately; this is also straightforward. For example, if the device has data available, a read would complete without sleeping; the poll method should indicate this state of affairs. Several flags (defined via
This bit must be set if the device can be read without blocking.
This bit must be set if "normal" data is available for reading. A readable device returns (POLLIN | POLLRDNORM).
This bit indicates that out-of-band data is available for reading from the device. It is currently used only in one place in the Linux kernel (the DECnet code) and is not generally applicable to device drivers.
High-priority data (out-of-band) can be read without blocking. This bit causes select to report that an exception condition occurred on the file, because select reports out-of-band data as an exception condition.
When a process reading this device sees end-of-file, the driver must set POLLHUP (hang-up). A process calling select is told that the device is readable, as dictated by the select functionality.
An error condition has occurred on the device. When poll is invoked, the device is reported as both readable and writable, since both read and write return an error code without blocking.
This bit is set in the return value if the device can be written to without blocking.
This bit has the same meaning as POLLOUT, and sometimes it actually is the same number. A writable device returns (POLLOUT | POLLWRNORM).
Like POLLRDBAND, this bit means that data with nonzero priority can be written to the device. Only the datagram implementation of poll uses this bit, since a datagram can transmit out-of-band data.
It's worth repeating that POLLRDBAND and POLLWRBAND are meaningful only with file descriptors associated with sockets: device drivers won't normally use these flags.
Interaction with read and write :-
The purpose of the poll and select calls is to determine in advance if an I/O operation will block. In that respect, they complement read and write. More important, poll and select are useful, because they let the application wait simultaneously for several data streams.
A correct implementation of the two calls is essential to make applications work correctly: although the following rules have more or less already been stated, we summarize them here.
Reading data from the device
1.If there is data in the input buffer, the read call should return immediately, with no noticeable delay, even if less data is available than the application requested, and the driver is sure the remaining data will arrive soon. You can always return less data than you're asked for if this is convenient for any reason, provided you return at least one byte. In this case, poll should return POLLIN|POLLRDNORM.
2.If there is no data in the input buffer, by default read must block until at least one byte is there. If O_NONBLOCK is set, on the other hand, read returns immediately with a return value of -EAGAIN (although some old versions of System V return 0 in this case). In these cases, poll must report that the device is unreadable until at least one byte arrives. As soon as there is some data in the buffer, we fall back to the previous case.
3. If we are at end-of-file, read should return immediately with a return value of 0, independent of O_NONBLOCK. poll should report POLLHUP in this case.
Writing to the device :-
1. If there is space in the output buffer, write should return without delay. It can accept less data than the call requested, but it must accept at least one byte. In this case, poll reports that the device is writable by returning POLLOUT|POLLWRNORM.
2. If the output buffer is full, by default write blocks until some space is freed. If O_NONBLOCK is set, write returns immediately with a return value of -EAGAIN (older System V Unices returned 0). In these cases, poll should report that the file is not writable. If, on the other hand, the device is not able to accept any more data, write returns -ENOSPC ("No space left on device"), independently of the setting of O_NONBLOCK.
3. Never make a write call wait for data transmission before returning, even if O_NONBLOCK is clear. This is because many applications use select to find out whether a write will block. If the device is reported as writable, the call must not block. If the program using the device wants to ensure that the data it enqueues in the output buffer is actually transmitted, the driver must provide an fsync method. For instance, a removable device should have an fsync entry point.
Although this is a good set of general rules, one should also recognize that each device is unique and that sometimes the rules must be bent slightly. For example, record-oriented devices (such as tape drives) cannot execute partial writes.