Of course, Linux distros are not alone in this - a computer system is a huge, complex collection of interacting software and hardware, even more so when the basic install includes several gibibytes of extra software over and above the OS.
We can't show you solutions for every problem that might arise, but we can show some of the common issues people face and, more importantly, show you how to go about identifying a problem. One more thing to bear in mind as you're reading is that even if you can't work out the solution yourself, an accurate description of the problem will be of great help when asking others for advice.
The typical distro has more components than a car engine, yet is open for, and even encourages, user fiddling, which leads the curious user to indulge in some provocative maintenance. To make it worse, a computer is often built from bits made by different manufacturers - motherboard from one, graphics card from another, soundcard from elsewhere - and an operating system that many hardware manufacturers pay no more than lip service to, if that.
So here's our guide to dealing with some of the most common problems, and some advice on how to deal with new disasters. The types of difficulties most often seen can be split into a number of broad categories: booting, hardware and drivers, misbehaving software and networking are among the most popular topics for discussion.
Distro fixes
Distro installers are pretty good at identifying an existing Windows installation and setting up dual booting, but should you have to reinstall a spyware-riddled Windows install you'll find that your machine boots straight into Windows and that your Linux installation is gone!
Don't panic: all Windows has done is overwrite the Grub bootloader with its own equivalent, removing your boot menu. All your data is still there - you just need to reinstall the bootloader into the disk's master boot record (MBR). You'll need to boot from a Live CD to do this, then open a terminal and run:
Code:
sudo grub-install /dev/sda
Code:
find /boot/grub/stage1
Code:
root (hd0,1)
setup (hd0)
quit
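The names used in the Grub commands above follow Grub's own numbering, in which both disks and partitions are counted from zero, so (hd0,1) is the second partition on the first disk. As a rough sketch, the helper below converts a Linux device name to that notation. grub_legacy_name is our own hypothetical name, not part of Grub, and it assumes that sda is the first BIOS disk, which is usual but not guaranteed.

```shell
# Convert a Linux disk or partition name to Grub (legacy) notation.
# Assumes /dev/sda is the first BIOS disk, /dev/sdb the second, and so on.
grub_legacy_name() {
    dev=${1#/dev/}              # strip the /dev/ prefix:    sda2
    disk=${dev%%[0-9]*}         # drop the partition number: sda
    part=${dev#"$disk"}         # keep the partition number: 2 (may be empty)
    letter=${disk#sd}           # drive letter:              a
    case $letter in
        [a-z]) ;;
        *) echo "unsupported device: $1" >&2; return 1 ;;
    esac
    # 'a' is character code 97, so a -> 0, b -> 1 and so on
    index=$(( $(printf '%d' "'$letter") - 97 ))
    if [ -n "$part" ]; then
        echo "(hd$index,$((part - 1)))"
    else
        echo "(hd$index)"
    fi
}

grub_legacy_name /dev/sda2    # prints (hd0,1)
grub_legacy_name /dev/sda     # prints (hd0)
```

If the result does not match what the find command reports inside the Grub shell, trust Grub: it knows how the BIOS has ordered your disks.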
Live CDs
If an errant Windows reinstall has zapped your Grub boot sector and you can't load Linux, you might want to try using a Live CD distro. These run directly from a CD (or DVD) and don't need to install anything on your computer to make it work. One of the pioneering Live CDs, and still one of the best, is Knoppix, which we just happen to have on our LXFDVD. Knoppix, especially the DVD version, is a full Debian-based distro that happens to run from a CD/DVD, so anything mentioned here can be done with it.
For a more compact alternative, you could try System Rescue CD, also on this month's DVD. There are no prizes for guessing what this is designed for, but it has the advantage of being compact (less than 250MB) and it has an installer to copy it to a USB pen drive. It comes with a lightweight graphical desktop and plenty of tools for fixing up your computer. Live CDs usually have excellent hardware detection and configuration. If you have a problem with a piece of hardware, boot one of these discs and see how they configure it.

When booting stalls
In times of yore, the Linux boot sequence scrolled pages of text up the screen. Most of it was undecipherable to mere mortals, but if it stopped you could see exactly where it stopped, with the last line or two of text containing a clue to the problem.
Nowadays, distros show a splash screen while they're booting, which is all very nice until things go wrong, then the boot stops and the splash screen hides all the clues.
If the failure is early in the boot sequence, you may find that adding noapic to the kernel boot line helps. Do this in the same way you remove the splash references (see box below). If this does fix it, edit the Grub configuration file at /boot/grub/menu.lst or /boot/grub/grub.conf and add the noapic option, or others your searches revealed as cures.
You can use the same technique if your system is slow in shutting down, watching the output to see where it stalls or pauses for too long. As with so many problems that can arise, it's easier to find an answer once you know the problem.
Step by step: identify boot errors

Remove the splash screen: To disable the splash screen and show the boot messages, highlight the first item on the menu and press E (for edit), move the highlight on to the line starting with 'kernel' and press E again to edit the kernel line. Remove any references to quiet or splash, press Enter then B (for boot).
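If you would rather see what the edited line should look like before typing it at the Grub menu, the same edit can be sketched as a filter. strip_splash is a hypothetical helper of ours, and the kernel line in the example is illustrative rather than taken from any particular distro.

```shell
# Remove the quiet and splash options (including splash=... forms)
# from kernel lines read on standard input.
strip_splash() {
    awk '{
        out = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "quiet" || $i == "splash" || $i ~ /^splash=/) continue
            out = (out == "" ? $i : out " " $i)
        }
        print out
    }'
}

echo "kernel /vmlinuz root=/dev/sda2 ro quiet splash" | strip_splash
# prints: kernel /vmlinuz root=/dev/sda2 ro
```

The filter only prints the result, so it is safe to feed it lines from your own Grub configuration file; nothing on disk is changed.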


Hardware fixes
Don't expect to find Linux drivers on the CD that comes with your shiny new gizmo. That's not because the manufacturers don't care about Linux but because drivers for most devices are already installed on your system as kernel modules. Kernel modules can be loaded from the command line or a startup file, but the HAL/D-BUS system usually recognises hardware and loads the modules automatically. What do you do if it does not? How do you know which module to load?
Identifying the hardware
The first step is to get the details of the hardware, using lspci for internal devices or lsusb for USB devices (some laptop hardware is also connected via USB):
Code:
sudo lspci
sudo lsusb
00:1f.2 SATA controller: Intel Corporation 82801HR/HO/HH (ICH8R/DO/DH) 6 port SATA AHCI Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation GeForce 7100 GS (rev a1)
02:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 03)
03:00.0 Ethernet controller: Attansic Technology Corp. L1 Gigabit Ethernet Adapter (rev b0)
...or this:
Code:
Bus 001 Device 004: ID 03f0:2c17 Hewlett-Packard
Bus 004 Device 002: ID 051d:0002 American Power Conversion Uninterruptible Power Supply
Bus 002 Device 002: ID 067b:2303 Prolific Technology, Inc. PL2303 Serial Port
Code:
sudo lspci -s 03:00.0 -v
sudo lsusb -s 001:004 -v
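The number at the start of each lspci line is the address that the -s option expects. A small filter can pull it out for you; pci_address is a hypothetical name, and the sample lines are copied from the lspci output shown above.

```shell
# Print the bus address (first field) of every lspci line that
# matches a case-insensitive keyword.
pci_address() {
    grep -i -- "$1" | cut -d ' ' -f 1
}

sample='01:00.0 VGA compatible controller: nVidia Corporation GeForce 7100 GS (rev a1)
03:00.0 Ethernet controller: Attansic Technology Corp. L1 Gigabit Ethernet Adapter (rev b0)'

printf '%s\n' "$sample" | pci_address ethernet    # prints 03:00.0
```

On a real system you would pipe the output of sudo lspci into pci_address instead of the sample text, then pass the address it prints to lspci -s. If more than one device matches the keyword you will get one address per line, so choose a more specific keyword.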

The answer lies with that firm favourite of troubleshooters, the Live CD. If the device is recognised when booting from the Live CD, run lspci -k to see which module it uses, then you can go back to your installed system and try to load it with:
Code:
modprobe -v modulename
Code:
insmod /lib/modules/.../modulename.ko
Finally, a 'module not found' message means the module is not present on your system. Most distros come with most kernel modules installed, so either your hardware is incredibly arcane and you'll need to compile a new kernel to enable it, or the hardware is only supported in a more recent kernel than you have.
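Before rebuilding a kernel, it is worth confirming that the module file really is missing. The sketch below searches a module tree for a file with the right name. have_module is a hypothetical helper, and it assumes module files are called modulename.ko, possibly with a compression suffix. Bear in mind that some module files use a hyphen where modprobe accepts an underscore.

```shell
# have_module NAME [DIR]: succeed if DIR (default: the running kernel's
# module tree) contains a file called NAME.ko or NAME.ko.<suffix>.
have_module() {
    dir=${2:-/lib/modules/$(uname -r)}
    [ -n "$(find "$dir" -name "$1.ko*" 2>/dev/null)" ]
}

if have_module ndiswrapper; then
    echo "ndiswrapper is installed for this kernel"
else
    echo "ndiswrapper is not installed for this kernel"
fi
```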
Auto-loading modules
There are often times you need to have a module loaded when you boot. All distros have a method for this, but they generally differ.
In Ubuntu you simply add the module name to the bottom of the file /etc/modules. SUSE users have to edit /etc/sysconfig/kernel and change the setting for MODULES_LOADED_ON_BOOT to something like
Code:
MODULES_LOADED_ON_BOOT="module1 module2"
Code:
#!/bin/sh
/sbin/modprobe ndiswrapper
Code:
sudo chmod +x /etc/sysconfig/modules/ndiswrapper.modules
Code:
uname -r
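For the Ubuntu method described above, a short function can add the module name only if it is not already listed, so running it twice does no harm. add_boot_module is a hypothetical helper. The file defaults to /etc/modules, and you would need to be root to change that file.

```shell
# add_boot_module NAME [FILE]: append NAME to FILE (default /etc/modules)
# unless a line consisting of exactly NAME is already there.
# Example, as root: add_boot_module ndiswrapper
add_boot_module() {
    file=${2:-/etc/modules}
    if grep -qxF -- "$1" "$file" 2>/dev/null; then
        echo "$1 is already listed in $file"
    else
        echo "$1" >> "$file" && echo "added $1 to $file"
    fi
}
```

Passing a second argument lets you try it on a scratch copy of the file first.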
Graphics hardware
None of the above applies to graphics cards. Their drivers are part of the X.org software, unless you use an ATI or Nvidia card. These do have drivers included with X.org, but the separate driver packages from the manufacturers give better performance. If you want to do anything that needs 3D acceleration, whether it's playing games or enabling desktop effects, you should try the manufacturer's drivers.
While they can be downloaded and installed from the respective websites, it is best to use your package manager to install them, because they also require changes to your xorg.conf file, the file that controls your graphical display, and the distro packages will make the changes for you. If you do decide to go the independent route, make sure you download the correct package for your card and read the instructions carefully before you do anything.
It's not all down to the software
The hardest problems to diagnose are those that occur apparently randomly, especially if they lock up or crash the computer without warning. When the crash occurs at the same time, or using the same software, you have an idea who to blame, but if it is truly random it may well be hardware. The most common hardware causes of such problems are overheating, faulty memory or poor power.
It's no use thinking "this doesn't happen in my other OS, so it must be Linux's fault" because different systems work the hardware differently. For instance, Linux uses memory more aggressively and will experience instability due to faulty memory before Windows starts to show symptoms. Fans and heatsinks become gradually blocked with dust and other crud during a computer's lifetime. Try blowing them clear with a can of compressed air. Installing lm_sensors (your distro should have it) will let you monitor CPU and case temperatures, and a system monitor like GKrellM will display the temperatures on your desktop.
Laptops don't lend themselves to being opened up for a good blow, but you should check the various vents for any blockage. One area where laptops are fairly safe is power, since the battery ensures a clean steady supply. Desktop power supplies are another matter, especially the cheap, unnamed ones that are included with lower-priced cases.
Built down to a price, some barely meet their specs when new, so try a different PSU in your computer – you may be surprised by the difference it makes. Dirty power can damage your hardware and data, so saving money here can be a false economy, whereas good-quality PSUs can go on for years. If you live in an area with unreliable or dirty power, a UPS (Uninterruptible Power Supply) may be a worthwhile investment. Surge protectors don't protect against power reductions, only surges.
Testing memory is easy, if time consuming. Most Live CDs include Memtest86, which does exactly what it says. You need to boot into Memtest86, because it can only test memory that isn't in use, so you don't want a full OS running. Let it run through its full set of tests at least twice, preferably overnight. The longer you can leave it running, the more certain you can be that your memory is OK. If you see any errors, at least one of your memory sticks will need to be replaced.
Where's my desktop?
So you've installed the latest distro, rebooted your computer and instead of the glorious 3D enhanced desktop you expected to see, all you get is a black screen with a login prompt and a blinking cursor. What went wrong? The usual cause of this is that the installer was unable to auto-detect the properties of either your graphics card or display.
Using the shell
Much of the advice we give here is in the form of terminal commands. Most distros have their own configuration programs, which vary considerably, while the underlying commands they call remain constant across all distros. By cutting out the middle man and running those commands directly, the solutions we give are portable across all distros.
Some commands need to be run as the root user, which is done either by prefixing the command with sudo (when you will be prompted to give your password) or by running su in the terminal first, which will ask you for the root password. We use the sudo method throughout this feature, as it is the only option with some distros. If you have full root access through su, simply run the command without the sudo.
Sometimes it will drop right down to a text console, other systems may boot into a limited display, like 800x600 with no acceleration. You need to run your distro's configuration tool to generate a working display configuration, but the first step is to log in as root if possible, otherwise as your normal user, using the password you gave during installation. The program to run depends on your distro, but the most popular options are:
- openSUSE - yast2
- Debian - dpkg-reconfigure xserver-xorg
- Ubuntu - sudo dpkg-reconfigure xserver-xorg
- Mandriva - XFdrake
Code:
X -configure
Code:
startx
Code:
/var/log/Xorg.0.log
Code:
grep EE /var/log/Xorg.0.log
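A plain search for EE can also pick up the legend near the top of the log, which lists every marker. The sketch below looks for the literal (EE) marker and drops that legend line. xorg_errors is a hypothetical helper, and the exact wording of the legend is an assumption based on typical X.org logs.

```shell
# xorg_errors [LOGFILE]: print lines carrying the (EE) error marker,
# skipping the legend line that explains what the markers mean.
# Example: xorg_errors /var/log/Xorg.0.log
xorg_errors() {
    grep -F '(EE)' "${1:-/var/log/Xorg.0.log}" |
        grep -vF '(WW) warning, (EE) error'
}
```

Run with no argument, it reads /var/log/Xorg.0.log. If it prints nothing, the log contains no error lines and the warnings, marked (WW), are the next place to look.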

Network fixes
If there is one topic that causes more tearing out of hair than any other, it's wireless networking, what with in-kernel and third-party modules, not to mention the use of Windows drivers as a last resort. Then you have the various encryption methods and a variety of network management systems to contend with.
As with all such things, when you break it down into simple steps, one complex task becomes a series of much simpler ones. The first step is to make sure your hardware drivers are loaded, so check the output of
Code:
sudo ifconfig -a
Code:
sudo lspci -k
Once you have the correct driver you can get on with configuring it, right? Well, maybe. Some wireless cards need a firmware file that is loaded on to the card when it is initialised. The driver will take care of this, but it needs the file to be in /lib/firmware. The methods for getting this file depend on the hardware in use, but usually involve extracting the firmware from the Windows drivers (or downloading a file that someone has already extracted).
You are now ready to proceed with configuring the connection, so you can skip over the next bit.
A last resort?
What happens if you can't find the driver for your wireless card? In that case you will have to use NdisWrapper. This is a kernel module that uses the NDIS (Network Driver Interface Specification) drivers supplied for Windows in Linux.
The first step is to install NdisWrapper from your distro's package manager. Then you need files from the CD that came with your card. It is important to use the correct CD, because manufacturers have a habit of changing the chipsets used on a card, and hence the drivers needed, without changing the model number. You can also find information on which cards are supported by which drivers at http://burnthesorbonne.com/?page_id=32.
Once you have installed NdisWrapper, find the driver file, which will be an INF file on the CD. Load it with:
Code:
sudo ndiswrapper -i /path/to/driver.inf
Code:
sudo ndiswrapper -l
Code:
sudo modprobe -v ndiswrapper
Code:
unzip /mnt/cdrom/install.exe
Also make sure that your router is not set to filter out all but specified MAC addresses (we've all been caught out by that one when using a new laptop or wireless card).
Most Linux distros use Network Manager to handle wireless (and wired) connections, and the name of your wireless access point should appear when you click on the Network Manager icon in the task bar. If it doesn't, the first thing to check is that your access point is set to broadcast its SSID (Service Set Identifier – the name of your wireless network).
Some people disable this in their access points in the belief that it increases security (it doesn't, because every time you connect to the network, you broadcast the SSID in plain text).
If it still fails to show up, try moving closer to the access point. You can also check for the presence of available networks with these terminal commands:
Code:
sudo ifconfig wlan0 up
sudo iwlist wlan0 scan
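The scan output is long, and the network names are buried in it on lines beginning ESSID. A filter along these lines reduces it to just the names. list_essids is a hypothetical helper, and the sample text is an illustrative, heavily trimmed imitation of scan output, not a capture from a real card.

```shell
# Print the network names found in "iwlist scan" output read from stdin.
list_essids() {
    sed -n 's/.*ESSID:"\(.*\)".*/\1/p'
}

# Illustrative scan output, trimmed to the relevant lines.
sample='          Cell 01 - Address: 00:11:22:33:44:55
                    ESSID:"HomeNet"
          Cell 02 - Address: 66:77:88:99:AA:BB
                    ESSID:"CoffeeShop"'

printf '%s\n' "$sample" | list_essids
```

For a real scan, pipe the output of sudo iwlist wlan0 scan into list_essids instead of the sample text.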
Once you can connect, immediately disconnect, enable encryption in your access point/router and try again. The best encryption to use is WPA2 or, if your wireless card/driver does not support it, use WPA (Wi-Fi Protected Access). You should not use WEP unless you absolutely cannot avoid it. It provides only minimal security and is easily cracked by a determined neighbour.
Wired networking
Networking that doesn't work is a problem that affects all operating systems from time to time. It can be frustrating to deal with, since things often seem to just not work, without giving any clue as to where the chain is broken. The first test is usually that old favourite, ping:
Code:
ping www.linuxformat.co.uk
Code:
ping 80.244.178.151
If pinging an IP address doesn't work, try pinging one of the ISP's servers, such as the DNS server (ISPs usually give DNS addresses on their websites, and it doesn't hurt to make a note of these). If that works, the problem is probably with your ISP's connection to the rest of the internet. Another possibility is that your system is trying to use IPv6, the newer IP protocol, but your router does not understand it, which causes long delays, long enough to look like it isn't working.
The next step is to check whether you can connect to your router's web interface (if it has one) or ping your modem. If this works, the link between your modem and the ISP is down, which could be a line fault (check that your cat/significant other hasn't unplugged the ADSL phone line), a problem at the ISP or you haven't paid your bill.
At this point a phone call to your ISP's support desk is in order. If you can't get through, it's most likely a problem at their end and the only solution is patience. Finally, check everything local: are the cables connected? Does ifconfig -a even show your network interface? If not, have you changed anything since your last reboot? A kernel update will stop third-party modules working until they are reinstalled, and some network adaptors, particularly wireless ones, use third-party kernel modules.
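The checks in this section work from the outside in, and the order can be written down as a small decision function. This is only a sketch: diagnose and reachable are hypothetical names, the addresses of your ISP's DNS server and your router are ones you must supply yourself, and it assumes, as the two ping commands above suggest, that a name which fails while its IP address answers points to a DNS problem.

```shell
# reachable HOST: print yes if one ping gets a reply within 2 seconds.
# The -c and -W options are those of the Linux iputils ping.
reachable() {
    if ping -c 1 -W 2 "$1" >/dev/null 2>&1; then echo yes; else echo no; fi
}

# diagnose NAME_OK IP_OK ISP_OK ROUTER_OK (each yes or no)
diagnose() {
    if   [ "$1" = yes ]; then echo "the connection is working"
    elif [ "$2" = yes ]; then echo "name lookups (DNS) are failing"
    elif [ "$3" = yes ]; then echo "your ISP has lost its link to the rest of the internet"
    elif [ "$4" = yes ]; then echo "the link between your modem and the ISP is down"
    else echo "check cables, ifconfig -a and any third-party kernel modules"
    fi
}

# Example with the results typed in by hand:
diagnose no no no yes
# prints: the link between your modem and the ISP is down
```

To run the real tests, replace each hand-typed answer with a call such as "$(reachable www.linuxformat.co.uk)", using the site name, its IP address, your ISP's DNS server and your router in that order. The sketch does not cover the IPv6 delay mentioned above.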

Step by step: IPv6 troubleshooting



Code:
alias net-pf-10 off
alias ipv6 off
Software fixes
Have you ever noticed how sometimes things just go slower and slower? There's nothing specific but everything seems to take longer than it should. I find caffeine helps here, or sleep in extreme cases, but what about when it happens to the computer? There are three main resources in your computer: CPU cycles, memory and hard disk space, and it is possible that a runaway program, or even general usage, can be using up too much of one of these.
CPU usage is the easiest to check, using the top program (that's its name, not an opinion of its usefulness). Run this from a terminal and you'll see rows and columns of data in the terminal window. The CPU line shows how much of the processor is being used by various types of program: sy is system, us is user and ni refers to programs that are running with a positive nice value. Nice is a way of scheduling programs to use more or less CPU; the higher the niceness, the nicer a program is to other processes, letting them have first pick at the available CPU cycles.
It's a little more complicated than that, as nice is only a recommendation to the kernel's process scheduler, but that's too involved to go into here.
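Niceness is easy to experiment with from the shell. With GNU coreutils, running nice with no arguments prints the niceness it was started with, so starting it through nice -n shows the adjusted value. niceness_after is a hypothetical helper name.

```shell
# niceness_after N: print the niceness a command would run at if it
# were started with "nice -n N" from the current shell.
niceness_after() {
    nice -n "$1" nice
}

echo "current niceness: $(nice)"
echo "after nice -n 5:  $(niceness_after 5)"
```

In practice you would replace the inner nice with the program you want to run at low priority.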
Double top
If you have more than one CPU core, press 1 to have top show them all. The figures to look at first are id and wa, for idle and wait. Unless you are compiling software or playing video, the idle figure should be quite high, usually over 90%. If it is down to single figures, or even zero, something is sucking up all your CPU cycles. That's fine if it's something you intend to do, like transcoding a video, but it could also be a runaway process.

The list of processes shows the amount of CPU and memory that each program is using, and by default this is sorted by CPU usage. If something is hogging the processor, you can use top to either renice it, if it is something you want to keep running, or kill it.
The first column shows the PID, which is the program's unique process ID. Press R to renice or K to kill, and type in the PID of the process. Renicing asks for a number to nice the process by, which is added to the existing value (higher numbers are 'nicer'). Nice values can run from -20 to 19, but only the root user can set a negative value.
Five is a good starting point, and 19 means the process only gets CPU time when nothing else wants it, which is useful for an intensive background task like video transcoding. Killing a task sends signal 15 (the TERM signal), which is similar in effect to pressing Ctrl+C in a terminal (strictly, Ctrl+C sends signal 2, INT). It asks the program to stop, so the program has the opportunity to shut down cleanly. If the program is really out of control, it may not respond to this, so you should send signal 9 (KILL), which will stop the program without giving it that chance to shut down elegantly.
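The same escalation can be scripted outside top. The sketch below sends TERM, gives the process a few seconds to exit and only then sends KILL. stop_process is a hypothetical helper and the five-second default is an arbitrary choice.

```shell
# stop_process PID [SECONDS]: ask the process to stop with TERM, wait up
# to SECONDS (default 5) for it to go, then fall back to KILL.
stop_process() {
    pid=$1
    timeout=${2:-5}
    # If TERM cannot be delivered, the process is gone or is not ours.
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$timeout" -gt 0 ]; do
        sleep 1
        kill -0 "$pid" 2>/dev/null || return 0   # no longer running
        timeout=$((timeout - 1))
    done
    echo "process $pid did not exit after TERM, sending KILL" >&2
    kill -KILL "$pid" 2>/dev/null || true
}
```

Call it with the PID shown in the first column of top, for example stop_process 1234 10 to allow ten seconds before resorting to KILL (1234 is a made-up PID).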
Step by step: dealing with bugs


Code:
progname --verbose > program.log
