The Linux Filesystem

Explanation: Linux

Background

Your hard drive is divided into sections called partitions. These are not physical divisions, but defined by software. At the beginning of the hard drive is a special area called the partition table, which lists the beginning and end of every partition on the hard drive. In Windows, different partitions are commonly given different drive letters, for example C: for the system partition and D: for the user and documents partition. Despite how Windows presents them, they are both commonly on the same physical hard drive. In Linux the physical hard drive is commonly identified as /dev/sda and partitions on that hard drive as /dev/sda1, /dev/sda2, etc.
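To make the naming convention concrete, here is a minimal Python sketch that splits a device name into its disk and partition number. The function name split_device_name is my own invention for illustration, and the pattern only covers the sd-style names mentioned above; other device types (NVMe drives, for example) use different naming schemes and are deliberately not handled.

```python
import re

# Matches the sd-style names from the text, e.g. /dev/sda or /dev/sda1.
# Other device types follow different naming schemes and are rejected here.
_SD_NAME = re.compile(r"^(/dev/sd[a-z]+)(\d+)?$")


def split_device_name(path):
    """Return (disk, partition_number); partition_number is None for a whole disk."""
    match = _SD_NAME.match(path)
    if match is None:
        raise ValueError(f"not an sd-style device name: {path}")
    disk, number = match.groups()
    return disk, (int(number) if number is not None else None)


print(split_device_name("/dev/sda"))
print(split_device_name("/dev/sda2"))
```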

Most partitions contain a filesystem to make file and directory management robust and user-friendly. At the beginning of the partition is usually some metadata needed to navigate the rest of the filesystem. Common filesystems include FAT32, NTFS (the default in Windows), and EXT4 (the default in many Linux distros). Note that Windows cannot easily read EXT4 filesystems, so in a typical dual-boot setup your Windows partition is NTFS, your Linux partition is EXT4 (other Linux filesystems work too), and any partition you want to access from both operating systems should be NTFS. Really the only common partition that does not have a filesystem is a Linux swap partition, which is basically overflow space for RAM located on your hard drive (Windows has an equivalent called the pagefile).

Directory Structure

The Linux filesystem starts at /, which is called the "root" of the filesystem. Every file, directory, and external storage device is somewhere after /. That is to say, you can get to anywhere on the filesystem by starting at / and going through sub-directories. Once you get used to this concept, I think it makes more sense than Windows and its drive letters. Any path that begins with / is called an absolute path because it does not matter what directory you are currently in. Any path that does not start with / is called a relative path because it is interpreted relative to your current directory.
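The difference between absolute and relative paths can be sketched in a few lines of Python. The resolve helper below is a hypothetical name of my own, and the example directories are made up; it uses the standard posixpath module so it only manipulates text and never touches the real filesystem.

```python
import posixpath


def resolve(path, current_dir):
    """Turn a path into an absolute one, given the current directory."""
    if posixpath.isabs(path):
        # Absolute paths ignore the current directory entirely.
        return posixpath.normpath(path)
    # Relative paths are interpreted starting from the current directory.
    return posixpath.normpath(posixpath.join(current_dir, path))


print(resolve("/etc/fstab", "/home/username"))
print(resolve("documents/paper.txt", "/home/username"))
```

The first call returns the same result no matter which current directory you pass in, while the second changes whenever the current directory does.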

There are two special entries that exist in every directory. The entry . points to the directory it is in, and the entry .. points to the parent directory of the directory it is in. You will see these two entries in every directory no matter what filesystem you are on. They only show up with ls -a since they count as "hidden". Additionally, the shell provides one more shortcut: ~ expands to the current user's home directory, which is usually /home/currentuser/. Finally, you should note that any path ending in the path delimiter (i.e. /) is specifically a directory rather than a file. The above shortcuts are very convenient when working in the terminal. To help you understand them, note that all of the following paths refer to the same file:

/home/username/.ssh/config
~/.ssh/config
~/.ssh/../.ssh/config
~/.ssh/./config
./config (if your pwd is ~/.ssh/)
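
If you want to convince yourself that the paths listed above are equivalent, the following Python sketch reduces each one to a single canonical form. The canonical helper and the HOME constant are assumptions made for illustration; the tilde is expanded by hand because it is normally the shell that does this, and the . and .. entries are collapsed purely as text, which matches the real filesystem only when no symbolic links are involved.

```python
import posixpath

HOME = "/home/username"  # hypothetical home directory, matching the list above


def canonical(path, cwd=HOME, home=HOME):
    """Expand a leading ~, apply the current directory, and collapse . and .. entries."""
    if path == "~" or path.startswith("~/"):
        path = home + path[1:]
    return posixpath.normpath(posixpath.join(cwd, path))


paths = [
    "/home/username/.ssh/config",
    "~/.ssh/config",
    "~/.ssh/../.ssh/config",
    "~/.ssh/./config",
]
results = {canonical(p) for p in paths}
# The relative form only matches when the current directory is ~/.ssh/.
results.add(canonical("./config", cwd="/home/username/.ssh/"))
print(results)
```

All five spellings collapse into one entry in the set.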

There are quite a few directories in / by default, and it's nice to know what their general purposes are. Although the organization is just convention and not enforced, following it makes everything easier.

/bin/ — programs needed before /usr/ is mounted
/boot/ — boot loader files
/dev/ — byte-level interface to physical devices
/etc/ — mostly config files
/home/ — user directories
/lib/ — shared libraries and kernel modules
/media/ — mount point for removable media (managed by distro)
/mnt/ — like /media/ but user-managed
/opt/ — manually installed software (not via package manager)
/proc/ — provides info about kernel and system
/root/ — home directory for the root user (it's not in /home/root/)
/run/ — files describing the state of running processes (also in /var/run/)
/sbin/ — same as /bin/ but these need root privileges
/srv/ — files made available to remote clients through services
/tmp/ — temporary files (erased on reboot)
/usr/ — multi-user programs
/usr/bin/ — general system-wide programs
/usr/sbin/ — same as /usr/bin/ but these need root privileges
/usr/local/bin/ — user-created system-wide programs
/usr/local/sbin/ — same as /usr/local/bin/ but these need root privileges
/var/ — temporary or state files (not erased on reboot)

For anyone used to Windows, I'd like to note that file extensions carry significantly less meaning than you've been led to believe. Most programs in Linux, including the operating system, do not care about file extensions. There are exceptions to this, but in general file extensions are purely intended as a convenient organization tool for the user. Changing an extension does nothing to change the actual data contained in the file.
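
As a small illustration of this point, the Python sketch below identifies a file by its contents instead of its name. It relies on the fact that every PNG file begins with the same 8-byte signature; the looks_like_png helper and the file names are hypothetical, and the file written here contains only the signature plus placeholder bytes, not a real image.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed 8 bytes that begin every PNG file


def looks_like_png(path):
    """Decide by content: check the first 8 bytes, ignoring the file name."""
    with open(path, "rb") as handle:
        return handle.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE


with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "picture.png")
    with open(original, "wb") as handle:
        handle.write(PNG_SIGNATURE + b"placeholder, not real image data")

    # Renaming changes the extension but leaves the bytes untouched.
    renamed = os.path.join(workdir, "picture.txt")
    os.rename(original, renamed)
    still_png = looks_like_png(renamed)

print(still_png)
```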

Ownership and Permissions

Every file and directory has an owner and related permissions. The owner of a file is denoted by user:group, which means every file (or directory) effectively has two owners. The first is a user, and the second is a group of users. In some Linux distros like Ubuntu, for every user there is a group with the same name whose only member is that user. Thus files and directories can be owned by laptopdude:laptopdude for example. On the other hand, if I wanted to let only certain people access one of my directories, I could create a group (which requires root privileges), let's call it friends, and change the ownership of my directory to laptopdude:friends. Together with permissions, this gives a Linux user fine-grained control over file access. You can change the group of files that you own to any group you belong to using the chown command; changing the user owner generally requires root privileges.
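
To see the user:group pair for a file from code, here is a minimal Python sketch using the standard os, pwd, and grp modules (available on Linux and other Unix-like systems). The owner_of helper is a name I made up; it falls back to the numeric id when no name is registered, which can happen inside containers.

```python
import grp
import os
import pwd
import tempfile


def owner_of(path):
    """Return the owner of a path in user:group form."""
    info = os.stat(path)
    try:
        user = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        user = str(info.st_uid)  # no name registered for this id; use the number
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        group = str(info.st_gid)
    return f"{user}:{group}"


# A freshly created temporary file is owned by whoever runs the script.
with tempfile.NamedTemporaryFile() as handle:
    owner = owner_of(handle.name)

print(owner)
```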

There are three permissions in Linux: read, write, and execute, commonly referred to as rwx respectively. Anyone with read permission can open and view the contents of a file. Anyone with write permission can modify and save a file. Anyone with execute permission can run the file as a program. Directories interpret these slightly differently: read permission allows listing the directory's contents, write permission allows creating and deleting entries in it, and execute permission allows entering the directory and accessing the files inside it. These permissions can be specified for three different categories of users. First is the user owner of the file, second is the group owner of the file, and third is other users, commonly referred to as ugo respectively. Therefore every file has 9 permissions that can allow or deny users access. They are commonly shown using a series of 9 characters where each group of 3 corresponds (in order) to one of ugo. For example, rwxrwxrwx allows anyone to do anything to that file, while --------- allows no one to do anything. A file marked rwxr--r-- allows you (the owner) to read, write, and execute it, but only allows other users to read it. You can change the permissions of files that you own using the chmod command.
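
The 9-character notation maps directly onto 9 bits, usually written as three octal digits (one each for ugo). The Python sketch below shows that mapping; mode_to_string is a hypothetical helper written for this explanation, while os.chmod and stat.filemode are standard library functions.

```python
import os
import stat
import tempfile


def mode_to_string(mode):
    """Render the nine permission bits as text, e.g. 0o744 -> 'rwxr--r--'."""
    flags = "rwx" * 3  # user, group, other
    # Walk from the highest of the nine bits (user read) down to the lowest (other execute).
    return "".join(
        flag if mode & (1 << (8 - position)) else "-"
        for position, flag in enumerate(flags)
    )


print(mode_to_string(0o744))
print(mode_to_string(0o777))
print(mode_to_string(0o000))

with tempfile.NamedTemporaryFile() as handle:
    os.chmod(handle.name, 0o744)  # comparable to running: chmod 744 <file>
    # stat.filemode adds a leading file-type character ('-' for a regular file).
    listing = stat.filemode(os.stat(handle.name).st_mode)

print(listing)
```

The string stored in listing has the same layout as the first column of ls -l output.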

Finally, Linux has "hidden" files. However, unlike in Windows, there is no "Hidden" flag. Instead, any file or directory that starts with . in its name is classified as "hidden" meaning certain programs will not display it by default. You can view hidden files with ls -a. One example is the ~/.ssh directory.
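
Since hiding is purely a naming convention, a program decides for itself whether to honor it. Here is a minimal Python sketch of that filter; the two helper names and the file names are made up for illustration.

```python
import os
import tempfile


def visible_entries(directory):
    """Mimic plain ls: skip every name that starts with a dot."""
    return sorted(name for name in os.listdir(directory) if not name.startswith("."))


def all_entries(directory):
    """Mimic ls -a, minus the . and .. entries, which os.listdir never returns."""
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as workdir:
    for name in ("notes.txt", ".secret"):
        open(os.path.join(workdir, name), "w").close()
    os.mkdir(os.path.join(workdir, ".ssh"))
    shown = visible_entries(workdir)
    everything = all_entries(workdir)

print(shown)
print(everything)
```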

Modifying the Filesystem

In Windows when you plug in a USB drive, it receives a new drive letter and appears in My Computer. In Linux, when you mount a USB drive, you attach its filesystem to a directory somewhere on your existing filesystem, called the mountpoint. For example, in Ubuntu your drive would likely end up mounted at /media/username/drivename. You can then read and write any file or folder on the drive by browsing to that mountpoint. In this way even external media (including CDs and network drives) ends up as a sub-path of your root filesystem. You can mount and unmount drives manually using the mount and umount commands, or set up mounts that happen automatically by editing /etc/fstab.
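
On Linux, the kernel exposes the list of current mounts as text in /proc/mounts, one mount per line with the fields device, mountpoint, filesystem type, and options. The Python sketch below parses that layout. The parse_mounts helper is my own, and the sample text is made-up data for illustration, not output from a real machine.

```python
def parse_mounts(text):
    """Parse lines in the /proc/mounts layout: device, mountpoint, type, options, ..."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        mounts.append(
            {
                "device": device,
                "mount_point": mount_point,
                "type": fs_type,
                "options": options.split(","),
            }
        )
    return mounts


# Made-up sample data in the same layout; on a real system you could instead
# pass in the contents of /proc/mounts.
SAMPLE = """\
/dev/sda2 / ext4 rw,relatime 0 0
/dev/sdb1 /media/username/drivename vfat rw,nosuid,nodev 0 0
"""

for entry in parse_mounts(SAMPLE):
    print(entry["device"], "is mounted at", entry["mount_point"])
```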

There is a special class of files called pseudofiles that look like regular files but behave differently. For example, you will likely find an entry at /dev/sda, which is a byte-level interface to your main hard drive. You could "edit" it in order to write bytes directly to your hard drive (warning: DO NOT EDIT ANYTHING IN /dev). Likewise, reading or printing the file would give you the raw data on the hard drive. Other pseudofiles can do other interesting things. For example, you likely have one at /sys/class/backlight/vendor/brightness that controls your screen brightness. Writing a number to that pseudofile will actually change your screen brightness.
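
A safe way to experiment with pseudofiles is to only read from harmless ones. The Python sketch below reads from /dev/zero and /dev/urandom, two standard Linux pseudofiles that were not mentioned above; it opens them exactly like regular files and never writes anything. The read_bytes helper is a hypothetical name.

```python
def read_bytes(path, count):
    """Read a fixed number of bytes from a file or pseudofile."""
    with open(path, "rb") as handle:
        return handle.read(count)


# /dev/zero produces an endless stream of zero bytes.
zeros = read_bytes("/dev/zero", 8)
# /dev/urandom produces random bytes generated by the kernel.
noise = read_bytes("/dev/urandom", 8)

print(zeros)
print(noise.hex())
```

Neither "file" stores any data on disk; the kernel generates the bytes on demand each time you read.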

The last cool thing you can do with the Linux filesystem is symbolic linking. This is significantly different from a Windows shortcut. When you create a symbolic link, the link is treated as if it were the target of the link. For example, I could create a symbolic link at ~/link that points to ~/documents/school/senior/paper.txt. Any operation you perform on the path ~/link such as opening it in a text editor will be carried out on the target instead. Since this is silently carried out by the operating system, it can be very handy for scripting and development. You can view symbolic links and their targets with ls -l and create them with ln -s.
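
Here is a minimal Python sketch of that behavior, run inside a temporary directory so nothing in your home directory is touched. The file names are made up; os.symlink, os.path.islink, and os.readlink are standard library functions.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "paper.txt")
    link = os.path.join(workdir, "link")

    with open(target, "w") as handle:
        handle.write("first draft\n")

    os.symlink(target, link)  # comparable to: ln -s <target> <link>

    # Writing through the link modifies the target file.
    with open(link, "a") as handle:
        handle.write("second draft\n")

    is_link = os.path.islink(link)
    points_to = os.readlink(link)
    with open(target) as handle:
        target_text = handle.read()

print(is_link, points_to == target)
print(target_text, end="")
```

The text appended through the link shows up when the target is read directly, because both paths lead to the same file.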

Live Media

If your computer has no operating system and you would like to install one, you have to boot your computer from Live Media. This could be a CD, DVD, or USB drive that contains an operating system. You insert the media, start the computer, press a function key to get to the Boot Menu, and then tell the computer to boot from the Live Media. The computer then reads the data on the Live Media, and boots the operating system it contains. This allows you to debug problems on a computer that crashes when trying to boot from an internal hard drive. When running an OS from Live Media, your computer's internal hard drives appear as mountable external storage. When booting from Linux Live Media, the root of your filesystem is located on the Live Media itself. From Live Media you can read/write data from your computer's hard drives, modify partitions, debug fatal problems, or install a new operating system. A Linux Live USB is a great thing to have in case your computer crashes and refuses to boot.

Conclusion

Although it can take some getting used to, I really think the Linux filesystem makes more sense than the Windows implementation. It is much more flexible and easier to work with when coding. Additionally, Linux users should be happy to know that EXT4 practically never requires defragmenting. Now you're ready to start learning about and working on the command line.