Monday, 18 February 2013

File Offsets in C and Java

1. What is an offset? [1]
    In computer science, an offset within an array or other data structure object is an integer indicating the distance (displacement) from the beginning of the object up to a given element or point, presumably within the same object. The concept of a distance is valid only if all elements of the object are the same size (typically given in bytes or words). For example, given an array of characters A containing "abcdef", one can say that the element containing the letter 'c' has an offset of 2 from the start of A.

2. What is the offset of a file? [2]
    Every open file has an associated "current file offset", normally a non-negative integer that measures the number of bytes from the beginning of the file. It is possible, however, that certain devices could allow negative offsets, but for regular files the offset must be non-negative.
    Read and write operations normally start at the current file offset and cause the offset to be incremented by the number of bytes read or written. By default, this offset is initialized to 0 when a file is opened. (In C, if a file is opened with the O_APPEND flag, the offset is moved to the current end of the file before each write.)

3. What is a hole in a file? [2]
    The file's offset can be greater than the file's current size, in which case the next write to the file will extend the file. This is referred to as creating a hole in a file and is allowed. Any bytes in a file that have not been written are read back as 0. The figure below shows a hole in a file. The current file size is 4 bytes: the file contains the characters 'a', 'b', 'c', 'd' at the beginning. The current offset is 10 bytes. This means that if you write another 4 characters, say 'A', 'B', 'C', 'D', to the file, the write starts at address 0x0A (because the current offset is 10), and after the write the offset is 14 bytes. The 6 bytes at offsets 4 through 9, which were never written, form the hole.
Figure 1: A hole in a file

4. The offset and the lseek function in C [2]
    In C, an open file's offset can be set explicitly by calling lseek.
Figure 2: lseek function

The interpretation of the offset depends on the value of whence:
  • If whence is SEEK_SET, the file's offset is set to offset bytes from the beginning of the file.
  • If whence is SEEK_CUR, the file's offset is set to its current value plus offset. The offset can be positive or negative.
  • If whence is SEEK_END, the file's offset is set to the size of the file plus offset. The offset can be positive or negative.
Q: How do you get the current offset of a file?

From these rules, when whence is SEEK_CUR the new offset is the current offset plus offset. Seeking 0 bytes from the current position therefore leaves the offset unchanged, and lseek returns it:


off_t currpos = lseek(fd, 0, SEEK_CUR);

NOTE: Not all files are capable of seeking. If the file descriptor refers to a pipe, FIFO, or socket, lseek sets errno to ESPIPE and returns -1. Also remember: because negative offsets are possible, we should compare the return value of lseek as equal or not equal to -1, rather than testing whether it is less than 0.

Example 1: Test whether standard input is seekable
#include <unistd.h>
#include <stdio.h>
int main(void)
{
 if (lseek(STDIN_FILENO, 0, SEEK_CUR) == -1) {    
  printf("cannot seek\n");
 }else {
  printf("seek ok\n");
 }

 return 0;
}
The result is shown below:
Figure 3: The result of Example 1

5. Create a file with a hole in it using C and Java
    In the next two examples, we use C and Java to create a file with a hole.
C program:
#include "apue.h"
#include <fcntl.h>

int main(void)
{
 char buf1[] = "abcdefghij";
 char buf2[] = "ABCDEFGHIJ";

 int fd;
 if((fd = creat("file.hole", FILE_MODE)) == -1) {
  err_sys("creat error");
 }

 if(write(fd, buf1, 10) != 10) {
  err_sys("buf1 write error!"); 
 }
 //offset now is 10

 if(lseek(fd, 16384, SEEK_SET) == -1) {
  err_sys("lseek error");
 }
 //offset now is 16384

 if(write(fd, buf2, 10) != 10) {
  err_sys("buf2 write error!");
 }

 exit(0);
}

Java program:
import java.io.IOException;
import java.io.RandomAccessFile;

public class RandomRead {
    final static String FILENAME = "file.hole";

    // RandomAccessFile is the class we use in Java to set and get the file offset
    protected RandomAccessFile seeker;

    public static void main(String[] argv) throws IOException {
        String lowercaseLetters = "abcdefghij";
        String uppercaseLetters = "ABCDEFGHIJ";
        RandomRead r = new RandomRead(FILENAME);
        try {
            System.out.println("Before writing!");
            r.printMessage();
            r.seeker.writeBytes(lowercaseLetters);     // offset now is 10
            System.out.println("After writing 10 bytes");
            r.printMessage();
            System.out.println("Set current offset to 16384");
            r.seeker.seek(16384);                      // offset now is 16384
            r.printMessage();
            r.seeker.writeBytes(uppercaseLetters);     // offset now is 16394
            System.out.println("After writing another 10 bytes");
            r.printMessage();
        } finally {
            r.seeker.close();                          // release the underlying file
        }
    }

    /** Constructor: open (or create) fname for reading and writing */
    public RandomRead(String fname) throws IOException {
        seeker = new RandomAccessFile(fname, "rw");
    }

    /** Read the current offset */
    public long readOffset() throws IOException {
        return seeker.getFilePointer();                // get the current offset
    }

    /** Print the file size and the current offset */
    private void printMessage() throws IOException {
        System.out.println("file size:" + seeker.length());   // file length
        System.out.println("Offset is " + readOffset());      // file offset
    }
}
The result:
Figure 4: Java program result

Both programs create a file (file.hole) with a hole in your working directory. Recall that any bytes in a file that have not been written are read back as 0. We use the od(1) command to look at the contents of the file (shown here for the file created by the Java program; 0040000 is 16384 in octal).
Figure 5: file.hole content

A hole in a file isn't required to have storage backing it on disk. Depending on the file system implementation, when you write after seeking past the end of the file, new disk blocks might be allocated to store the data, but there is no need to allocate disk blocks for the data between the old end of file and the location where you start writing. To illustrate, I also created a file (file.nohole) of the same size, 16394 bytes, but without a hole. Figure 6 compares the two files: they have the same file size, 16394 bytes, but file.hole occupies 8 disk blocks while file.nohole occupies 20.
Figure 6: file.hole and file.nohole


