Wednesday, September 29, 2010

How to resolve issues with local JFS filesystems that have run out of available space.

Resolving full filesystems
To resolve an out-of-filesystem-space issue, complete these steps:

1. Determine what filesystems are full.
2. Determine where space is allocated within the source filesystem.
3. Take the required steps to resolve the out-of-space condition.
Once the above steps have been completed, the situation should be resolved,
or the reason for the problem should be understood.

If these steps do not resolve the issue, filesystem corruption may be involved.
Unmount the filesystem and run a full fsck against it to verify that no
corruption problems exist.


--------------------------------------------------------------------------------

Determining what filesystems are full
The df command is used to get filesystem status information. The relevant
field to consider is %Used.

%Used = percentage of total filesystem space currently allocated

Example:

Filesystem   1024-blocks    Free  %Used  Iused  %Iused  Mounted on
/dev/hd4           12288      68    99%   1823     23%  /
/dev/hd2          409600   20436    96%  16181     16%  /usr
/dev/hd9var         8192    6088    26%    163      8%  /var
/dev/hd3           12288   11340     8%     87      3%  /tmp
/dev/hd1           57344   13872    76%   1459     11%  /home

In this example, nearly all of the space in the root filesystem / is
allocated (99% used, with only 68 KB free).
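
On a system with many filesystems, this check can be scripted. The following
is a minimal sketch, not part of the original procedure: full_filesystems is a
hypothetical helper name, the 90% threshold is an arbitrary assumption, and the
parsing assumes df output laid out as in the example above, with a %Used column
in the header and the mount point as the last field.

```shell
# Print the mount point and %Used of every filesystem at or above a threshold.
# Reads df-style output on standard input and finds the %Used column by name.
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "%Used") col = i
            next
        }
        col {
            pct = $col
            sub(/%/, "", pct)               # strip the trailing percent sign
            if (pct + 0 >= limit + 0) print $NF, $col
        }
    '
}

# Sample data copied from the df example above.
full_filesystems 90 <<'EOF'
Filesystem 1024-blocks Free %Used Iused %Iused Mounted on
/dev/hd4 12288 68 99% 1823 23% /
/dev/hd2 409600 20436 96% 16181 16% /usr
/dev/hd9var 8192 6088 26% 163 8% /var
/dev/hd3 12288 11340 8% 87 3% /tmp
/dev/hd1 57344 13872 76% 1459 11% /home
EOF
# prints:
# / 99%
# /usr 96%
```

On a live system the same function could be fed directly from df, for example
df -k | full_filesystems 95, provided the header contains a %Used column.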

At this point, we have identified a free-space problem on the / filesystem.
The next step is to determine what kind of space problem exists.

Keep in mind that mounts are hierarchical and are applied in order. A
filesystem mounted over a directory hides anything that was already mounted
beneath that directory. For example, if you have a mount entry called
/myfilesystem/mydata immediately followed by a mount entry called /myfilesystem,
then the /myfilesystem/mydata filesystem, and any data that resides there, can
no longer be reached through the /myfilesystem mount point.


--------------------------------------------------------------------------------

Determining where space is allocated within each filesystem
Two commands are generally used to determine how much space is allocated in a
filesystem, and where: df and du.

df derives the space used in a filesystem from the space that is currently
unallocated. For instance, if you have a filesystem that consists of 8192
512-byte blocks, and 4096 of those blocks are currently not allocated to
anything, then the space in use in that filesystem is 4096 512-byte blocks.

Allocated Storage = Total Storage - Unallocated Storage

df is inherently the most reliable command to report filesystem usage, because
df reports information based on the filesystem as a whole.
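
The same subtraction, worked through in shell arithmetic with the figures from
the example above. This is only an illustration of the formula; the variable
names are ours, and the percentage uses integer division, so df may round a
real value slightly differently.

```shell
total_blocks=8192       # filesystem size in 512-byte blocks
free_blocks=4096        # blocks not currently allocated to anything

allocated_blocks=$((total_blocks - free_blocks))
allocated_kb=$((allocated_blocks / 2))                # two 512-byte blocks per KB
pct_used=$((allocated_blocks * 100 / total_blocks))

echo "allocated: $allocated_blocks blocks ($allocated_kb KB), ${pct_used}% used"
# prints: allocated: 4096 blocks (2048 KB), 50% used
```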

du is a file-oriented command. It reports the space allocated to a specified
file or directory (the current directory if none is given), and it is not
confined to a single filesystem. For instance, running du / would give
allocation information for all files in /. This would include all files in the
/ filesystem and any other filesystem mounted under /, such as /tmp, /var, and
/usr. You could use the -x option of du to keep the operation within one
filesystem, but there are cases where the results of using this option may be
incomplete.

du will only report space taken by files. It will not report space taken by
filesystem metadata, such as inodes, inode maps, or disk maps. Inode/disk maps
and other areas reserved for filesystem use take up a negligible portion of
the filesystem space, but the areas reserved for inodes can be substantial.
Their size is ultimately based on the NBPI (Number of Bytes Per Inode) chosen
when the filesystem was created. Each inode uses 128 bytes of filesystem space,
so the space taken for inodes, as a percentage of the filesystem, is:

(128 / NBPI) * 100

By default, a filesystem will use an NBPI of 4096, so the general overhead for
a filesystem will be about 3%.
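
To see how the overhead changes with NBPI, the formula can be evaluated for a
given filesystem size. The sketch below is illustrative only: inode_overhead is
a hypothetical helper name, and it assumes the 128-byte inode size stated above.

```shell
# Usage: inode_overhead NBPI FILESYSTEM_SIZE_KB
# Prints the inode overhead as a percentage and in kilobytes.
inode_overhead() {
    awk -v nbpi="$1" -v fs_kb="$2" 'BEGIN {
        printf "%.3f%% %d KB\n", 128 / nbpi * 100, fs_kb * 128 / nbpi
    }'
}

inode_overhead 4096 12288    # prints: 3.125% 384 KB
inode_overhead 512 12288     # prints: 25.000% 3072 KB
```

A small NBPI allows more files to be created, but as the second call shows, it
sets aside a much larger share of the filesystem for inodes.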

To determine what the NBPI is for the filesystem in question, issue the lsfs
command with the -q option on the mount point of the filesystem.

Example:
# lsfs -q /
Name       Nodename   Mount Pt   VFS   Size    Options   Auto   Accounting
/dev/hd4   --         /          jfs   81920   --        yes    no
(lv size: 24576, fs size: 24576, frag size: 4096, nbpi: 4096, compress: no, bf: false, ag: 8)

du will only show allocated information about files it can reference. There are
two cases where du may not show information about allocated storage.

1. The file is hidden because a filesystem or file has been mounted on top of
   this entry. If you had a file that was stored in /bobby, and then mounted a
   filesystem on top of /bobby, then du would no longer see what was in the
   directory /bobby. It would only see the information in the filesystem that
   was mounted over /bobby.

2. The file has been removed while it is still open by one or more
   applications. In this case, the storage for that file will remain allocated
   until all references to that file have been closed. Without a directory
   entry, du will not show allocated space for that file, though df will show
   this space taken from the filesystem as a whole.

--------------------------------------------------------------------------------

Determining files that are using space in a filesystem
To address the situation presented in case 1 in the previous section, mount
the desired filesystem a second time on a secondary mount point. Viewed
through the secondary mount point, the filesystem shows only its own contents,
without any of the filesystems mounted under its primary mount point.

Example:

mount / /mnt

In the above example, we mounted the / filesystem over /mnt. The effect is
that if we go into /mnt, we see all the information about the / filesystem
and no other filesystem mounted under /. If we run cd /mnt/tmp, then we are
actually in the directory /tmp in the / filesystem, and not in the /tmp
filesystem. Also, if we run du -sk /mnt, the result should closely match the
space used on / as reported by the df command, once filesystem overhead is
taken into account. If it does not, then this indicates that case 2 may be
occurring (see the next section). For now, we will proceed with case 1.

We can now investigate disk usage accurately. First, go into the root directory
of this filesystem.

cd /mnt

Run the du command to get an accurate accounting of the space that can be seen
for all accessible files in this filesystem.

# du -sk /mnt
11778 /mnt

This will report the accounted space taken for files in kilobytes. If you add
the overhead of the filesystem described above, this figure should closely
match that given by the df command.

Example:
# df -vk /
Filesystem   1024-blocks   Used   Free  %Used  Iused  Ifree  %Iused  Mounted on
/dev/hd4           12288  12220     68    99%   1823   1249     23%  /

The overhead in this case will be (128/4096) * 12288K = 384K

The total space that can be accounted for is 11778K + 384K = 12162K versus the
reported space taken from df of 12220K. The two figures agree closely, which
indicates that the space in use is accounted for by files somewhere in the
filesystem. If the difference between the two is large, then case 2 is more
likely to be occurring; the next section, "Resolving space taken by open files
that have been deleted", addresses that situation. Otherwise, continue with the
following steps.
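
The comparison can be reduced to a single number: the space df reports as used
that neither du nor the inode overhead explains. The sketch below is one way to
compute it, under the assumption that overhead is exactly 128/NBPI of the
filesystem size; unaccounted_kb is a hypothetical helper name, and how large a
gap counts as "large" is left to the administrator's judgment.

```shell
# Usage: unaccounted_kb DU_KB DF_USED_KB FILESYSTEM_SIZE_KB NBPI
# Prints the kilobytes that df counts as used but du plus overhead does not.
unaccounted_kb() {
    du_kb=$1 df_used_kb=$2 fs_kb=$3 nbpi=$4
    overhead_kb=$((fs_kb * 128 / nbpi))     # space reserved for inodes
    echo $((df_used_kb - du_kb - overhead_kb))
}

# Figures from the du and df examples above.
gap=$(unaccounted_kb 11778 12220 12288 4096)
echo "unaccounted: ${gap} KB"
# prints: unaccounted: 58 KB
```

A gap of a few dozen kilobytes on a 12 MB filesystem is consistent with case 1.
A gap of many megabytes would point toward deleted files that are still open.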

Run the following command on the new mount point of this filesystem to get
a sorted disk usage of the filesystem's root directory.

ls -A . | while read -r name; do du -sk "$name"; done | sort -nr

Example:

# ls -A . | while read -r name; do du -sk "$name"; done | sort -nr
2168 etc
192 lpp
168 sbin
40 dev
28 export
12 smit.log
4 var
4 usr
4 tmp
4 tftpboot
4 src
4 smit.script
4 mnt
4 .sh_history
4 .profile
0 unix
0 u
0 lib
0 bootrec
0 bin

This command sorts disk usage for all files in the current directory by size,
in decreasing order. If the file we suspect happens to be a directory, we can
then change into that directory, and re-run the preceding command to determine
what is taking up space within that directory. Continue these steps until you
find the desired file or files, at which point you can take appropriate actions.


--------------------------------------------------------------------------------

Resolving space taken by open files that have been deleted
In case 2, there are files within the filesystem that are held open by
applications but have been removed from the filesystem tree. This behavior
is described in the documentation for the unlink() system call as follows.

When all links to a file are removed and no process has the file open, all
resources associated with the file are reclaimed, and the file is no longer
accessible. If one or more processes have the file open when the last link
is removed, the directory entry disappears. However, the removal of the file
contents is postponed until all references to the file are closed.
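
This behavior is easy to reproduce. The following sketch is a demonstration
only, using a hypothetical scratch file in a temporary directory: the shell
holds the file open on descriptor 3, removes it, and then shows that the
directory entry is gone while the contents are still readable.

```shell
# Hypothetical scratch directory and file, used only for this demonstration.
demo_dir=$(mktemp -d)
printf 'still allocated\n' > "$demo_dir/scratch.log"

exec 3< "$demo_dir/scratch.log"   # hold the file open on descriptor 3
rm "$demo_dir/scratch.log"        # remove the last link to the file

# The directory entry is gone, so ls (and du) find nothing here.
visible=$(ls -A "$demo_dir" | wc -l | tr -d ' ')

# The contents are still reachable through the open descriptor.
read -r held_line <&3

exec 3<&-                         # closing the descriptor releases the space
rmdir "$demo_dir"

echo "directory entries: $visible, data still readable: $held_line"
# prints: directory entries: 0, data still readable: still allocated
```

Until the descriptor is closed, the blocks belonging to the removed file count
against the filesystem in df but appear nowhere in du.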

You can use the fuser command with the -dV flag on the full path to the
device on which the filesystem resides. This will display files that have
been removed but are still open. It will also report the inode number and
size of such files. Using the process ID returned for these files, you can
instruct the source application to close these files, or you can exit the
application. Once this has occurred, and fuser no longer shows this deleted
file, the space will be returned to the filesystem for general use.

NOTE: Using the flags given for fuser requires enhancements for fuser to
be installed for the appropriate release of AIX, as listed in the following
table. You may need other fixes in addition to these to reliably perform the
operations of this document. Please refer to the list at the end of this document.

APAR      Description             AIX Level
-------   ---------------------   ---------
IX78943   ENHANCEMENTS TO FUSER   4.1
IX78941   ENHANCEMENTS TO FUSER   4.2
IX78523   ENHANCEMENTS TO FUSER   4.3

If the filesystem had a shared library that was deleted and the process that
used the library is no longer active, the library will still be open on the
loader list. fuser will not detect these situations, but they can be remedied
by running the slibclean command. This will flush any shared libraries from
the loader list that are no longer active, and if they were deleted, the
space will then be reclaimed.


--------------------------------------------------------------------------------

Recommended fixes
APAR      Description                                             AIX Level
-------   -----------------------------------------------------   ---------
IX78066   FSCK SHOULD PATCH UP ALLOCATIONS IN WMAP                4.3
IX78873   MALLOC FAILED ERRORS FROM FUSER                         4.3
IX76061   DEFRAGFS AND FSCK INCORRECTLY REPORT BAD BIT MAP        4.2
IX77541   FSCK SHOULD PATCH UP ALLOCATIONS IN WMAP                4.2
IX86678   FUSER NOT FINDING PIDS OF MPX BASE DEVICES              4.3
IY04972   Reduce severity of disk inode corruption                4.3
IY09173   FSCK DOES NOT CORRECT FILE CORRUPTION WHICH IT FINDS.   4.3
