Your Search Ends Here: How to fix the most common Linux problems

We'll come right out and say this - Linux breaks. There, we've got that off our chests. No matter how much we might like our chosen distro, there is no denying that things can go wrong, or that it might not even be right in the first place.

Of course, Linux distros are not alone in this - a computer system is a huge, complex collection of interacting software and hardware, even more so when the basic install includes several gibibytes of extra software over and above the OS.

We can't show you solutions for every problem that might arise, but we can show some of the common issues people face and, more importantly, show you how to go about identifying a problem. One more thing to bear in mind as you're reading is that even if you can't work out the solution yourself, an accurate description of the problem will be of great help when asking others for advice.

The typical distro has more components than a car engine, yet is open for, and even encourages, user fiddling, which leads the curious user to indulge in some provocative maintenance. To make it worse, a computer is often built from bits made by different manufacturers - motherboard from one, graphics card from another, soundcard from elsewhere - and an operating system that many hardware manufacturers pay no more than lip service to, if that.

So here's our guide to dealing with some of the most common problems, and some advice on how to deal with new disasters. The types of difficulties most often seen can be split into a number of broad categories: booting, hardware and drivers, misbehaving software and networking are among the most popular topics for discussion.

Distro fixes

Distro installers are pretty good at identifying an existing Windows installation and setting up dual booting, but should you have to reinstall a spyware-riddled Windows install you'll find that your machine boots straight into Windows and that your Linux installation is gone!
Don't panic: all Windows has done is overwrite the Grub bootloader with its own equivalent, removing your boot menu. All your data is still there - you just need to write the Grub bootloader back into the disk's master boot record (MBR). You'll need to boot from a Live CD to do this, then open a terminal and run

Code:

sudo grub-install /dev/sda

This assumes you have everything installed on the first (or only) hard drive. Grub-install will usually make a good job of detecting a Grub installation and set things to rights. If it doesn't, you'll have to do it manually, which is a lot easier than it sounds. Run sudo grub to enter the Grub shell, then run

Code:

find /boot/grub/stage1

...to determine which partition holds the Grub files. If Windows is on the first partition Grub is likely to be on the second, in which case this command will return something like (hd0,1). Now set Grub up with

Code:

root (hd0,1)
setup (hd0)
quit

The first command identifies the boot partition, the second writes the bootloader to the MBR and then you leave the Grub shell. Grub is only concerned with the location of /boot, so if you have a separate /boot partition, omit the /boot part from the find command.
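
The Grub shell counts both disks and partitions from zero, which is easy to trip over when you go back to ordinary device names. Here is a minimal sketch of the translation, assuming the BIOS disk order matches the kernel's (so hd0 is /dev/sda and hd1 is /dev/sdb). The name grub_to_dev is our own invention, not part of Grub.

```shell
#!/bin/sh
# grub_to_dev: hypothetical helper that translates Grub legacy notation
# such as (hd0,1) into a Linux device name such as /dev/sda2.
# Assumes hd0 maps to /dev/sda, hd1 to /dev/sdb, and so on.
grub_to_dev() {
    spec=$(printf '%s' "$1" | tr -d '()')   # "(hd0,1)" becomes "hd0,1"
    disk=${spec%%,*}                         # "hd0"
    disk=${disk#hd}                          # "0"
    # Pick the matching drive letter: 0 is a, 1 is b, ...
    letter=$(printf '%s' abcdefghijklmnopqrstuvwxyz | cut -c "$((disk + 1))")
    case $spec in
        *,*) part=${spec#*,}                 # Grub partition number, counted from 0
             printf '/dev/sd%s%s\n' "$letter" "$((part + 1))" ;;
        *)   printf '/dev/sd%s\n' "$letter" ;;   # whole disk, such as (hd0)
    esac
}
```

So grub_to_dev '(hd0,1)' gives the second partition of the first disk, which is where Grub usually ends up when Windows has the first partition.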

Live CDs

If an errant Windows reinstall has zapped your Grub boot sector and you can't load Linux, you might want to try using a Live CD distro. These run directly from a CD (or DVD) and don't need to install anything on your computer to make it work. One of the pioneering Live CDs, and still one of the best, is Knoppix, which we just happen to have on our LXFDVD. Knoppix, especially the DVD version, is a full Debian-based distro that happens to run from a CD/DVD, so anything mentioned here can be done with it.

For a more compact alternative, you could try System Rescue CD, also on this month's DVD. There are no prizes for guessing what this is designed for, but it has the advantage of being compact (less than 250MB) and it has an installer to copy it to a USB pen drive. It comes with a lightweight graphical desktop and plenty of tools for fixing up your computer. Live CDs usually have excellent hardware detection and configuration. If you have a problem with a piece of hardware, boot one of these discs and see how they configure it.

Knoppix is indispensable for system rescues.

When booting stalls

In times of yore, the Linux boot sequence scrolled pages of text up the screen. Most of it was undecipherable to mere mortals, but if it stopped you could see exactly where it stopped, with the last line or two of text containing a clue to the problem.
Nowadays, distros show a splash screen while they're booting, which is all very nice until things go wrong, then the boot stops and the splash screen hides all the clues.
If the failure is early in the boot sequence, you may find that adding noapic to the kernel boot line helps. Do this in the same way you remove the splash references (see box below). If this does fix it, edit the Grub configuration file at /boot/grub/menu.lst or /boot/grub/grub.conf and add the noapic option, or others your searches revealed as cures.
You can use the same technique if your system is slow in shutting down, watching the output to see where it stalls or pauses for too long. As with so many problems that can arise, it's easier to find an answer once you know the problem.

Step by step: identify boot errors

Remove the splash screen: To disable the splash screen and show the boot messages, highlight the first item on the menu and press E (for edit), move the highlight on to the line starting with 'kernel' and press E again to edit the kernel line. Remove any references to quiet or splash, press Enter then B (for boot).

It's different for SUSE...: SUSE works differently in that the splash settings are built in. Boot options are typed directly at the menu screen - add splash=0 to disable the splash screen entirely. Press F1 to bring up a list of options and use Tab and Enter to get further information on any of them.

Find the problem: Now that you're able to see the messages, you can see where the boot process stops. Google for the line containing the error (or the last line) to see what you can do. It is possible that some piece of hardware is causing the problem, so unplug all unnecessary devices and try again.
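
If the edited boot line fixes things and you want to make the change permanent, the same edit can be sketched as a filter over a kernel line. This is only an illustration: strip_splash is a hypothetical name, the sample line in the usage note is made up, and the filter collapses runs of spaces as a side effect.

```shell
#!/bin/sh
# strip_splash: hypothetical helper that removes the quiet and splash
# options (including forms such as splash=silent) from a kernel boot
# line read on standard input. Whitespace between options is collapsed.
strip_splash() {
    awk '{
        out = ""
        for (i = 1; i <= NF; i++) {
            # Skip the options that hide the boot messages
            if ($i == "quiet" || $i == "splash" || $i ~ /^splash=/) continue
            out = out (out == "" ? "" : " ") $i
        }
        print out
    }'
}
```

For instance, piping "kernel /boot/vmlinuz root=/dev/sda2 ro quiet splash" through strip_splash leaves just the kernel path and the root and ro options.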

Hardware fixes

Don't expect to find Linux drivers on the CD that comes with your shiny new gizmo. That's not because the manufacturers don't care about Linux but because drivers for most devices are already installed on your system as kernel modules. Kernel modules can be loaded from the command line or a startup file, but the HAL/D-BUS system usually recognises hardware and loads the modules automatically. What do you do if it does not? How do you know which module to load?

Identifying the hardware

The first step is to get the details of the hardware with lspci for internal devices or lsusb for USB devices (some laptop hardware is also connected via USB) with these commands

Code:

sudo lspci
sudo lsusb

...which produce output like this:

Code:

00:1f.2 SATA controller: Intel Corporation 82801HR/HO/HH (ICH8R/DO/DH) 6 port SATA AHCI Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation GeForce 7100 GS (rev a1)
02:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 03)
03:00.0 Ethernet controller: Attansic Technology Corp. L1 Gigabit Ethernet Adapter (rev b0)

...or this:

Code:

Bus 001 Device 004: ID 03f0:2c17 Hewlett-Packard
Bus 004 Device 002: ID 051d:0002 American Power Conversion Uninterruptible Power Supply
Bus 002 Device 002: ID 067b:2303 Prolific Technology, Inc. PL2303 Serial Port

Once you've identified which device is which, you can get further information by using the -s option to query a specific device and -v for more information, like:

Code:

sudo lspci -s 03:00.0 -v
sudo lsusb -s 001:004 -v

This is particularly useful with lspci, as the extra information shows the kernel module in use for the device (if there is one). The -k option also shows this, without the other extra information. You may be wondering what use that could be if you're trying to find out which module to load to enable the device.

If you find lspci a little cryptic, there are graphical alternatives. We particularly like Hardinfo.

The answer lies with that firm favourite of troubleshooters, the Live CD. If the device is recognised when booting from the Live CD, run lspci -k to see which module it uses, then you can go back to your installed system and try to load it with

Code:

sudo modprobe -v modulename

If you see no output, the module is already loaded, and should show up in the output from lsmod. If you see something like this:

Code:

insmod /lib/modules/.../modulename.ko

...it means the module is now loaded and your drivers should work, or at least be available for configuration. The other possible responses are errors. A 'device not present' message means the hardware for that module was not found, which usually means you have picked the wrong module.
Finally, a 'module not found' message means the module is not present on your system. Most distros come with most kernel modules installed, so either your hardware is incredibly arcane and you'll need to compile a new kernel to enable it, or it is only supported in a more recent kernel than the one you have.
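
Picking the module name out of the Live CD's lspci -k output by eye is fine for one device, but it can also be sketched as a small filter. The function name driver_in_use is hypothetical; the only thing it relies on is the "Kernel driver in use:" line that lspci -k prints beneath each device.

```shell
#!/bin/sh
# driver_in_use: hypothetical helper that reads `lspci -k` output on
# standard input and prints the kernel driver bound to the given slot.
# Usage: lspci -k | driver_in_use 03:00.0
driver_in_use() {
    awk -v slot="$1" '
        # Device header lines start in the first column with the slot address
        /^[0-9a-f]/ { current = $1 }
        # Detail lines are indented; the last field is the driver name
        current == slot && /Kernel driver in use:/ { print $NF }
    '
}
```

The name it prints is the one to hand to modprobe on your installed system. If it prints nothing, no driver claimed the device on the Live CD either.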

Auto-loading modules

There are often times you need to have a module loaded when you boot. All distros have a method for this, but they generally differ.
In Ubuntu you simply add the module name to the bottom of the file /etc/modules. SUSE users have to edit /etc/sysconfig/kernel and change the setting for MODULES_LOADED_ON_BOOT to something like

Code:

MODULES_LOADED_ON_BOOT="module1 module2"

In Fedora you add a file (a script really), to /etc/sysconfig/modules/ with a .modules extension. For example, to load NdisWrapper you would create /etc/sysconfig/modules/ndiswrapper.modules containing

Code:

#!/bin/sh
/sbin/modprobe ndiswrapper

As it is a script, you also need to make it executable with

Code:

sudo chmod +x /etc/sysconfig/modules/ndiswrapper.modules
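
For the Ubuntu approach, it is worth checking that the module isn't already listed before appending it, so that repeated attempts don't fill the file with duplicates. A minimal sketch, with add_boot_module as a hypothetical helper name and the file path taken as an optional second argument:

```shell
#!/bin/sh
# add_boot_module: hypothetical helper that appends a module name to a
# modules list file (such as Ubuntu's /etc/modules) unless it is
# already listed there.
# Usage: add_boot_module MODULE [FILE]
add_boot_module() {
    module=$1
    file=${2:-/etc/modules}
    # -x matches whole lines only, -F treats the name literally, -q is quiet
    if grep -qxF -- "$module" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$module" >> "$file"
}
```

On a real system /etc/modules belongs to root, so the function would need to be run from a root shell.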

You can check the kernel version of the Live CD and your system with this:

Code:

uname -r

If the Live CD's kernel is newer, look for an update for your distro. Another option is that this hardware is not supported in the kernel but uses a third-party driver. The most common occurrence of this is with wireless cards that use a driver like MadWifi or NdisWrapper. If you need to install a separate driver, it is probably available in one of your distro's repositories. Once that is installed, your hardware should be ready to go.
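
Kernel version strings don't compare well as plain text (2.6.9 sorts after 2.6.27 alphabetically), so here is a small sketch that uses the version sort in GNU sort instead. The name newest_version is hypothetical, and distro-specific suffixes may not always compare meaningfully.

```shell
#!/bin/sh
# newest_version: hypothetical helper that prints whichever of two
# version strings sorts later. Relies on the -V (version sort) option
# of GNU sort, so it will not work with sort implementations lacking it.
newest_version() {
    printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1
}
```

Note the version reported by uname -r on the Live CD, then run newest_version with that and the output of uname -r on your installed system to see which kernel is the more recent.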

Graphics hardware

None of the above applies to graphics cards. Their drivers are part of the X.org software, unless you use an ATI or Nvidia card. These do have drivers included with X.org, but the separate driver packages from the manufacturers give better performance. If you want to do anything that needs 3D acceleration, whether it's playing games or enabling desktop effects, you should try the manufacturer's drivers.
While they can be downloaded and installed from the respective websites, it is best to use your package manager to install them, because they also require changes to your xorg.conf file, the file that controls your graphical display, and the distro packages will make the changes for you. If you do decide to go the independent route, make sure you download the correct package for your card and read the instructions carefully before you do anything.

It's not all down to the software

The hardest problems to diagnose are those that occur apparently randomly, especially if they lock up or crash the computer without warning. When the crash occurs at the same time, or using the same software, you have an idea who to blame, but if it is truly random it may well be hardware. The most common hardware causes of such problems are overheating, faulty memory or poor power.
It's no use thinking "this doesn't happen in my other OS, so it must be Linux's fault" because different systems work the hardware differently. For instance, Linux uses memory more aggressively and will experience instability due to faulty memory before Windows starts to show symptoms. Fans and heatsinks become gradually blocked with dust and other crud during a computer's lifetime. Try blowing it clear with a can of compressed air. Installing lm_sensors (your distro should have it) will let you monitor CPU and case temperatures, and a system monitor like GKrellM will display the temperatures on your desktop.
Laptops don't lend themselves to being opened up for a good blow, but you should check the various vents for any blockage. One area where laptops are fairly safe is power, since the battery ensures a clean steady supply. Desktop power supplies are another matter, especially the cheap, unnamed ones that are included with lower-priced cases.
Built down to a price, some barely meet their specs when new, so try a different PSU in your computer – you may be surprised by the difference it makes. Dirty power can damage your hardware and data, so saving money here can be a false economy whereas good-quality PSUs can go on for years. If you live in an area with unreliable or dirty power, a UPS (Uninterruptible Power Supply) may be a worthwhile investment. Surge protectors don't protect against power reductions, only surges.
Testing memory is easy, if time consuming. Most Live CDs include Memtest86, which does exactly what it says. You need to boot into Memtest86, because it can only test memory that isn't in use, so you don't want a full OS running. Let it run through its full set of tests at least twice, preferably overnight. The longer you can leave it running, the more certain you can be that your memory is OK. If you see any errors, at least one of your memory sticks will need to be replaced.

Where's my desktop?

So you've installed the latest distro, rebooted your computer and instead of the glorious 3D enhanced desktop you expected to see, all you get is a black screen with a login prompt and a blinking cursor. What went wrong? The usual cause of this is that the installer was unable to auto-detect the properties of either your graphics card or display.

Using the shell

Much of the advice we give here is in the form of terminal commands. Most distros have their own configuration programs, which vary considerably, while the underlying commands they call remain constant across all distros. By cutting out the middle man and running those commands directly, the solutions we give are portable across all distros.
Some commands need to be run as the root user, which is done either by prefixing the command with sudo (when you will be prompted to give your password) or by running su in the terminal first, which will ask you for the root password. We use the sudo method throughout this feature, as it is the only option with some distros. If you have full root access through su, simply run the command without the sudo.

Sometimes the system will drop right down to a text console; other systems may boot into a limited display, such as 800x600 with no acceleration. You need to run your distro's configuration tool to generate a working display configuration, but the first step is to log in as root if possible, otherwise as your normal user, using the password you gave during installation. The program to run depends on your distro, but the most popular options are:

openSUSE - yast2
Debian - dpkg-reconfigure xserver-xorg
Ubuntu - sudo dpkg-reconfigure xserver-xorg
Mandriva - XFdrake

These usually open a textual version of the graphical configuration tool, from where you can select the correct graphics card and monitor. If your distro doesn't have such a tool, you can create a basic X.org configuration file with

Code:

X -configure

If you still get a text display when you boot up, log in at the console and run

Code:

startx

...which should load up a really basic desktop. Press Ctrl+Alt+Backspace to exit it; you now have a working X display. If startx fails, look at the log file at

Code:

/var/log/Xorg.0.log

...in particular any lines containing (EE), as these are errors. The file is quite long, but you can find them with

Code:

grep '(EE)' /var/log/Xorg.0.log
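
A simple grep also catches the legend near the top of the log, which lists (EE) among the message markers without being an error itself. A slightly stricter sketch only accepts lines that start with the marker; xorg_errors is a hypothetical name, and the optional timestamp prefix is an assumption about how some X.org versions format their logs.

```shell
#!/bin/sh
# xorg_errors: hypothetical helper that prints only genuine error lines
# from an X.org log, skipping the legend line that merely mentions (EE).
# Usage: xorg_errors [LOGFILE]
xorg_errors() {
    log=${1:-/var/log/Xorg.0.log}
    # Error lines start with "(EE)", optionally after a "[ timestamp]" prefix
    grep -E '^(\[[^]]*\][[:space:]]*)?\(EE\)' "$log"
}
```

Each line it prints is worth pasting into a search engine, exactly as with a stalled boot message.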

If you get a desktop, but in a limited resolution, the approach is the same, except you can use the graphical versions of the configuration tools.

This is not as pretty as the usual face of YaST, but most distros' configuration tools have a text version for when the graphics fail to be graphical.

Network fixes

If there is one topic that causes more tearing out of hair than any other, it's wireless networking, what with in-kernel and third-party modules, not to mention the use of Windows drivers as a last resort. Then you have the various encryption methods and a variety of network management systems to contend with.
As with all such things, when you break it down into simple steps, one complex task becomes a series of much simpler ones. The first step is to make sure your hardware drivers are loaded, so check the output of

Code:

sudo ifconfig -a

This should show your wired network interface as eth0 and your wireless as one of wlan0, ath0 or even eth1. If none of these show up, try repeating the test from a Live CD and, if it does show up, run

Code:

sudo lspci -k

...to see which module it uses. If you're still stuck, the details from lspci -v should give enough information on the card to search the web for the correct driver.
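
The output of ifconfig -a can run to several screens, when all you want to know at this stage is which interfaces exist. Here is a minimal sketch that keeps only the names; list_interfaces is a hypothetical helper, written to cope with both the older output layout and the newer one that adds a colon after the name.

```shell
#!/bin/sh
# list_interfaces: hypothetical helper that reads `ifconfig -a` output
# on standard input and prints one interface name per line.
# Usage: /sbin/ifconfig -a | list_interfaces
list_interfaces() {
    # Interface names start in the first column; detail lines are indented.
    # Newer net-tools versions add a trailing colon, which is stripped here.
    awk '/^[^ \t]/ { sub(/:$/, "", $1); print $1 }'
}
```

If the list on your installed system is shorter than the one from the Live CD, the missing name tells you which device still lacks a driver.
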
Once you have the correct driver you can get on with configuring it, right? Well, maybe. Some wireless cards need a firmware file that is loaded on to the card when it is initialised. The driver will take care of this, but it needs the file to be in /lib/firmware. The methods for getting this file depend on the hardware in use, but usually involve extracting the firmware from the Windows drivers (or downloading a file that someone has already extracted).

So now you are ready to proceed with configuring the connection, so you can skip over the next bit.

A last resort?

What happens if you can't find the driver for your wireless card? In that case you will have to use NdisWrapper. This is a kernel module that uses the NDIS (Network Driver Interface Specification) drivers supplied for Windows in Linux.
The first step is to install NdisWrapper from your distro's package manager. Then you need files from the CD that came with your card. It is important to use the correct CD, because manufacturers have a habit of changing the chipsets used on a card, and hence the drivers needed, without changing the model number. You can also find information on which cards are supported by which drivers at http://burnthesorbonne.com/?page_id=32.
Once you have installed NdisWrapper, find the driver file, which will be an INF file on the CD. Load it with

Code:

sudo ndiswrapper -i /path/to/driver.inf

You can then check that it is working with

Code:

sudo ndiswrapper -l

...which will list the drivers now available to NdisWrapper. Finally, you can load the module with

Code:

sudo modprobe -v ndiswrapper

...and your wireless card should appear as wlan0. If there is no INF file on the CD, the drivers are probably packed into an EXE file, which is usually a self-extracting zip file in a Windows executable. You can unpack this using the unzip program on Linux with something like

Code:

unzip /mnt/cdrom/install.exe

You will probably want the NdisWrapper module loaded automatically – see the Auto-loading modules box for details on this.

Getting connected

The first rule of wireless networking is to always use an encrypted connection, but in this case it is easier if you turn off encryption for a couple of minutes, as it removes one potential source of problems.
Also make sure that your router is not set to filter out all but specified MAC addresses (we've all been caught out by that one when using a new laptop or wireless card).

Most Linux distros use Network Manager to handle wireless (and wired) connections, and the name of your wireless access point should appear when you click on the Network Manager icon in the task bar. If it doesn't, the first thing to check is that your access point is set to broadcast its SSID (Service Set Identifier – the name of your wireless network).
Some people disable this in their access points in the belief that it increases security (it doesn't, because every time you connect to the network, you broadcast the SSID in plain text).

If it still fails to show up, try moving closer to the access point. You can also check for the presence of available networks with these terminal commands

Code:

sudo ifconfig wlan0 up
sudo iwlist wlan0 scan

The first line ensures that the wireless card is active, and the second should produce a list of all wireless networks in range. If you see a message like "Interface Doesn't Support Scanning" you're either using the wrong interface (wired instead of wireless), or you're not using the correct driver or firmware for your wireless card, and you'll have to go back to the top of the page and try again.
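
The scan output is verbose, with a dozen or more lines per network, so a small filter helps when all you want is the list of names. This sketch assumes the ESSID:"name" line that iwlist prints for each cell; list_essids is a hypothetical helper name.

```shell
#!/bin/sh
# list_essids: hypothetical helper that reads `iwlist scan` output on
# standard input and prints the name of each network found.
# Usage: sudo iwlist wlan0 scan | list_essids
list_essids() {
    # Keep only the text between the quotes on each ESSID line
    sed -n 's/^[[:space:]]*ESSID:"\(.*\)"[[:space:]]*$/\1/p'
}
```

An access point that does not broadcast its SSID shows up as an empty line, which is a handy hint that it is in range but hiding its name.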

Once you can connect, immediately disconnect, enable encryption in your access point/router and try again. The best encryption to use is WPA2 or, if your wireless card/driver does not support it, use WPA (Wi-Fi Protected Access). You should not use WEP unless you absolutely cannot avoid it. It provides only minimal security and is easily cracked by a determined neighbour.

Wired networking

Networking that doesn't work is a problem that affects all operating systems from time to time. It can be frustrating to deal with, since things often seem to just not work, without giving any clue as to where the chain is broken. The first test is usually that old favourite, ping:

Code:

ping www.linuxformat.co.uk

This should show packets being sent to and received by the Linux Format server. If it doesn't, try

Code:

ping 80.244.178.151

That's the IP address of the Linux Format website, so if that works when the previous command didn't, you know that you're unable to resolve domain names into IP addresses. Check that /etc/resolv.conf contains the addresses of your ISP's DNS servers. If you are using a router with a DHCP server, you may find that it contains the router address, in which case you should check that the router has the correct IP addresses.
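
To see at a glance which DNS servers your system is actually using, you can pull just the nameserver entries out of the file. A minimal sketch, with list_nameservers as a hypothetical helper name:

```shell
#!/bin/sh
# list_nameservers: hypothetical helper that prints the DNS server
# addresses configured in a resolv.conf-style file.
# Usage: list_nameservers [FILE]
list_nameservers() {
    # Only lines whose first word is "nameserver" carry a server address
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

If it prints nothing at all, name lookups have nowhere to go, which would explain why the numeric ping works and the named one doesn't.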

If pinging an IP address doesn't work, try pinging one of the ISP's servers, such as the DNS server (ISPs usually give DNS addresses on their websites, and it doesn't hurt to make a note of these). If that works, the problem is probably with your ISP's connection to the rest of the internet. Another possibility is that your system is trying to use IPv6, the newer IP protocol, but your router does not understand it, which causes long delays, long enough to look like it isn't working.

The next step is to check whether you can connect to your router's web interface (if it has one) or ping your modem. If this works, the link between your modem and the ISP is down, which could be a line fault (check that your cat/significant other hasn't unplugged the ADSL phone line), a problem at the ISP or an unpaid bill.

At this point a phone call to your ISP's support desk is in order. If you can't get through, it's most likely a problem at their end and the only solution is patience. Finally, check everything local: are the cables connected? Does ifconfig -a even show your network interface? If not, have you changed anything since your last reboot? A kernel update will stop third-party modules working until they are reinstalled, and some network adaptors, particularly wireless ones, use third-party kernel modules.
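
The whole chain of tests, from the website down to your own router, can be sketched as a short script. Treat it as an illustration of the reasoning rather than a finished tool: the function names are our own, the caller has to supply all four addresses, and the -W timeout option assumes the ping found on most Linux distros.

```shell
#!/bin/sh
# can_ping: true if a single ping to the given host gets a reply.
# -c 1 sends one packet; -W 2 waits up to two seconds for the answer.
can_ping() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# diagnose: hypothetical helper that turns four yes/no results into a
# diagnosis. Arguments, in order: name_ok ip_ok isp_ok router_ok
diagnose() {
    if   [ "$1" = yes ]; then echo "connection looks fine"
    elif [ "$2" = yes ]; then echo "DNS problem: check /etc/resolv.conf"
    elif [ "$3" = yes ]; then echo "problem between your ISP and the rest of the internet"
    elif [ "$4" = yes ]; then echo "link between your modem and the ISP is down"
    else                      echo "local problem: check cables and drivers"
    fi
}

# yesno: run a command and print yes or no depending on whether it succeeded.
yesno() { if "$@"; then echo yes; else echo no; fi; }

# check_network: run the four tests in turn and print the diagnosis.
# Usage: check_network HOSTNAME HOST_IP ISP_SERVER_IP ROUTER_IP
check_network() {
    diagnose "$(yesno can_ping "$1")" "$(yesno can_ping "$2")" \
             "$(yesno can_ping "$3")" "$(yesno can_ping "$4")"
}
```

Keeping the decision logic in diagnose, separate from the pings themselves, means you can also feed it the results of tests you ran by hand.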

Sometimes you find a particular site doesn't work and need to know whether the site is down for everyone or it's just a connection problem from your ISP. The site that answers this question is http://downforeveryoneorjustme.com

Step by step: IPv6 troubleshooting

Update your router: The cleanest way to fix a problem here is to update your router to handle IPv6. Check the manufacturer's website for a firmware upgrade and follow the instructions for updating your router. It usually involves downloading a file and then uploading it to the router's web interface.

Test with Firefox: You can disable IPv6 in Firefox - type about:config in the URL box, then type IPv6 in the Filter field. This will narrow it down to one entry, network.dns.disableIPv6. Right-click on this and select Toggle, which will change the Value field from false to true. Try to access a website, and if it works, IPv6 was your problem.

Disable IPv6: You can disable IPv6 system-wide by editing your module configuration file. This is usually one of /etc/modprobe.conf or /etc/modprobe.d/aliases, depending on your distro. Remove any references to IPv6 and add these two lines:

Code:

alias net-pf-10 off
alias ipv6 off
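
If you would rather not edit the file by hand, the same change can be sketched as a filter that prints a cleaned-up copy for you to inspect first. The name without_ipv6 is hypothetical, and the sketch deliberately writes to standard output rather than touching the original file.

```shell
#!/bin/sh
# without_ipv6: hypothetical helper that prints a copy of a modprobe
# configuration file with existing IPv6 lines dropped and the two
# disabling aliases appended. The original file is left untouched.
# Usage: without_ipv6 FILE > FILE.new
without_ipv6() {
    # -i ignores case; -v keeps only the lines that do not match
    grep -iv -e 'ipv6' -e 'net-pf-10' "$1"
    printf '%s\n' 'alias net-pf-10 off' 'alias ipv6 off'
}
```

Compare the new file with the old one using diff, and only then copy it into place as root.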

Software fixes

Have you ever noticed how sometimes things just go slower and slower? There's nothing specific but everything seems to take longer than it should. I find caffeine helps here, or sleep in extreme cases, but what about when it happens to the computer? There are three main resources in your computer: CPU cycles, memory and hard disk space, and it is possible that a runaway program, or even general usage, can be using up too much of one of these.

CPU usage is the easiest to check, using the top program (that's its name, not an opinion of its usefulness). Run this from a terminal and you'll see rows and columns of data in the terminal window. The CPU line shows how much of the processor is being used by various types of program: sy is system, us is user and ni refers to programs that are running with a positive nice value. Nice is a way of scheduling programs to use more or less CPU; the higher the niceness, the nicer a program is to other processes, letting them have first pick at the available CPU cycles.
It's a little more complicated than that, as nice is only a recommendation to the kernel's process scheduler, but that's too involved to go into here.

Double top

If you have more than one CPU core, press 1 to have top show them all. The figures to look at first are id and wa, for idle and wait. Unless you are compiling software or playing video, the idle figure should be quite high, usually over 90%. If it is down to single figures, or even zero, something is sucking up all your CPU cycles. That's fine if it's something you intend to do, like transcoding a video, but it could also be a runaway process.

Filelight shows where all your disk space is going, or you could use "du" from a terminal.

The list of processes shows the amount of CPU and memory that each program is using, and by default this is sorted by CPU usage. If something is hogging the processor, you can use top to either renice it, if it is something you want to keep running, or kill it.

The first column shows the PID, which is the program's unique process ID. Press R to renice or K to kill, and type in the PID of the process. Renicing asks for a number to nice the process by, which is added to the existing value (higher numbers are 'nicer'). Nice values can run from -20 to 19, but only the root user can set a negative value.

Five is a good starting point, and 19 means the process only gets CPU time when nothing else wants it, which is useful for an intensive background task like video transcoding. Killing a task sends signal 15 (the TERM signal), which is much like pressing Ctrl+C in a terminal (strictly speaking, Ctrl+C sends the closely related INT signal). It asks the program to stop, so the program has the opportunity to shut down cleanly. If the program is really out of control, it may not respond to this, so you should send signal 9 (KILL), which will stop the program without giving it that chance to elegantly shut down.
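
The polite-then-forceful approach also works outside top, using the kill command. Here is a minimal sketch: stop_gently is a hypothetical helper, and the five-second default grace period is an arbitrary choice of ours.

```shell
#!/bin/sh
# stop_gently: hypothetical helper that asks a process to stop with
# TERM, waits a few seconds, and only then resorts to KILL.
# Usage: stop_gently PID [SECONDS_TO_WAIT]
stop_gently() {
    pid=$1
    timeout=${2:-5}
    # Signal 15 (TERM): a polite request; give up if the PID is already gone
    kill -TERM "$pid" 2>/dev/null || return 0
    i=0
    while [ "$i" -lt "$timeout" ]; do
        # kill -0 sends no signal, it only tests whether the PID still exists
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done
    # Signal 9 (KILL): cannot be caught or ignored by the program
    kill -KILL "$pid" 2>/dev/null
}
```

To be nicer to a process you want to keep, renice -n 5 -p followed by the PID does from the command line what the R key does in top.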

Step by step: dealing with bugs

Check it's new: If you think you have found a bug, first check with your distro's package manager that you have the latest version. Then check the program's website to see if there exists a later version of the software that fixes it. If so, bug your distro's developers to fix it.

Do your homework: Get as much information as you can. Consult the man page, or run the program in a terminal with the --help option, to see if there is a --verbose or --debug option to increase the output. Record this to a file by running the program with

Code:

progname --verbose > program.log 2>&1

Report the bug: Report a bug using your distro's bug tracker. Many projects have their own bug tracker, but if you are using your distro's packages, report it to them first. Most open source developers welcome bug reports, especially if they contain enough information to find and fix the problem.

Friday, August 21, 2009
