From: eLinux.org
Boot Time
Introduction
Boot Time includes topics such as measurement, analysis, human factors,
initialization techniques, and reduction techniques. The time that a
product takes to boot directly impacts the first perception an end user
has of the product. Regardless of how attractive or well designed a
consumer electronic device is, the time required to move the device from
off to an interactive, usable state is critical to obtaining a positive
end user experience. Turning on a device is Use Case #1.
Booting up a device involves numerous steps and sequences of events. In
order to use consistent terminology, the Bootup Time Working Group of the
CE Linux Forum came up with a list of terms and their widely accepted
definitions for this functionality area. See the following page for
these terms:
Technology/Project Pages
The following are individual pages with information about various
technologies relevant to improving Boot Time for Linux. Some of these
describe local patches available on this site. Others point to projects
or patches maintained elsewhere.
Measuring Boot-up Time
- Printk Times - simple system for
showing timing information for each printk.
- Kernel Function
Trace - system for
reporting function timings in the kernel.
- Linux Trace Toolkit -
system for reporting timing data for certain kernel and process
events.
- Oprofile - system-wide
profiler for Linux.
- Bootchart - a tool for performance
analysis and visualization of the Linux boot process. Resource
utilization and process information are collected during the
user-space portion of the boot process and are later rendered in a
PNG, SVG or EPS encoded chart.
- Bootprobe
- a set of System Tap scripts for
analyzing system bootup.
- and, let us not forget: "cat /proc/uptime"
- grabserial
- a nice utility from Tim Bird to log and timestamp console output
- process
trace
- a simple patch from Tim Bird to log exec, fork and exit system
calls.
- ptx_ts -
Pengutronix' TimeStamper: A small filter prepending timestamps to
STDOUT; a bit similar to grabserial but not limited to serial ports
- Initcall Debug - a kernel
command line option to show time taken for initcalls.
- See also: Kernel
Instrumentation
which lists some known kernel instrumentation tools. These are of
interest for measuring kernel startup time.
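As a rough illustration of how the timestamped output from Printk Times can be analyzed, the following Python sketch finds the largest gaps between consecutive kernel messages. The log excerpt, the function names, and the choice to attribute each gap to the message that ends it are assumptions made for this example; they are not part of any tool listed above.

```python
import re

# Matches lines printed with printk timestamps enabled, e.g.
# "[    0.412345] some kernel message"
PRINTK_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def parse_printk_times(lines):
    """Return (timestamp_seconds, message) pairs for timestamped lines."""
    entries = []
    for line in lines:
        match = PRINTK_LINE.match(line.rstrip("\n"))
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def largest_gaps(entries, count=3):
    """Return the `count` largest (delta_seconds, message) gaps.

    Each delta is attributed to the message that ends the gap, which is
    only a convention: the time was spent somewhere between the two messages.
    """
    gaps = []
    for (prev_time, _), (time, message) in zip(entries, entries[1:]):
        gaps.append((round(time - prev_time, 6), message))
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)[:count]


# Hypothetical log excerpt, invented for illustration only.
SAMPLE_LOG = """\
[    0.000000] Booting Linux
[    0.120000] Calibrating delay loop...
[    0.340000] Mount-cache hash table entries: 512
[    0.350000] NET: Registered protocol family 16
[    1.900000] ide0: probing done
"""

for delta, message in largest_gaps(parse_printk_times(SAMPLE_LOG.splitlines())):
    print(f"{delta:8.3f} s before: {message}")
```

On a real system the input would come from the kernel log (for example a saved copy of the dmesg output) rather than from an embedded string.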
Technologies and Techniques for Reducing Boot Time
Bootloader speedups
Kernel speedups
- Disable Console - Avoid
overhead of console output during system startup.
- Disable bug and printk - Avoid the overhead of bug and printk.
Disadvantage is that you lose a lot of info.
- RTC No Sync - Avoid delay to
synchronize system time with RTC clock edge on startup.
- Short IDE Delays - Reduce
duration of IDE startup delays (this is effective but possibly
dangerous).
- Hardcode kernel module
info -
Reduce the overhead of loading a module, by hardcoding some
information used for loading the relocation information
- IDE No Probe - Force kernel to
observe the ide=noprobe option.
- Preset LPJ - Allow the use of a preset
loops_per_jiffy value.
- Asynchronous function
calls -
Allow probing or other functions to proceed in parallel, to overlap
time-consuming boot-up activities.
- Reordering of driver
initialization
- Allow driver bus probing to start as soon as possible.
- Deferred Initcalls -
defer non-essential module initialization routines to after primary
boot
- NAND ECC improvement - The nand_ecc.c in kernels before 2.6.28 has
room for improvement. You can find an improved version in the mtd git at
http://git.infradead.org/mtd-2.6.git?a=blob_plain;f=drivers/mtd/nand/nand_ecc.c;hb=HEAD.
Documentation for this is in
http://git.infradead.org/mtd-2.6.git?a=blob_plain;f=Documentation/mtd/nand_ecc.txt;hb=HEAD.
This is only of interest if your system uses software ECC
correction.
- Check which kernel memory allocator you use. SLOB or SLUB might be
better than SLAB (which is the default in older kernels).
- If your system does not need it, you can remove SYSFS and even
PROCFS from the kernel. In one test removing sysfs saved 20 ms.
- Carefully check whether each kernel configuration option is
applicable to your system. Even if you select an option that is not
used in the end, it contributes to the kernel size and therefore to
the kernel load time (assuming you are not doing kernel XIP). Often
this will require some trial and measurement! E.g. selecting
CONFIG_CC_OPTIMIZE_FOR_SIZE (found under general setup) gave in
one case a boot improvement of 20 ms. Not dramatic, but when
reducing boot time every penny counts!
- Moving to a different compiler version might lead to shorter and/or
faster code. Most often newer compilers produce better code. You
might also want to play with compiler options to see what works
best.
- If you use initramfs in your kernel and a compressed kernel it is
better to have an uncompressed initramfs image. This is to avoid
having to uncompress data twice. A patch for this has been submitted
to LKML. See
http://lkml.org/lkml/2008/11/22/112
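The Preset LPJ item in the list above relies on knowing the loops_per_jiffy value for your hardware. A minimal Python sketch, assuming the value is taken from the "(lpj=...)" part of the delay calibration message of an earlier boot and passed back through the lpj= kernel command line parameter, might look like this. The function names and the sample command line are hypothetical.

```python
import re

# The calibration message carries the measured value, e.g.
# "Calibrating delay loop... 1594.16 BogoMIPS (lpj=7970816)"
LPJ_PATTERN = re.compile(r"\(lpj=(\d+)\)")


def find_lpj(boot_log):
    """Return the loops_per_jiffy value found in a boot log, or None."""
    match = LPJ_PATTERN.search(boot_log)
    return int(match.group(1)) if match else None


def with_preset_lpj(cmdline, lpj):
    """Return the kernel command line with lpj= set to the given value.

    Any lpj= argument that is already present is dropped first.
    """
    args = [arg for arg in cmdline.split() if not arg.startswith("lpj=")]
    args.append(f"lpj={lpj}")
    return " ".join(args)


# Illustrative values only; measure the real value on your own board.
sample_log = "Calibrating delay loop... 1594.16 BogoMIPS (lpj=7970816)"
sample_cmdline = "console=ttyS0,115200 root=/dev/mtdblock2"

lpj = find_lpj(sample_log)
if lpj is not None:
    print(with_preset_lpj(sample_cmdline, lpj))
```

The value is specific to the CPU and its clock speed, so it has to be measured again whenever either changes.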
File System issues
Different file systems have different initialization (mounting) times,
for the same data sets. This is a function of whether meta-data must be
read from storage into RAM or not, and what algorithms are used during
the mount procedure.
- Filesystem
Information - has
information about boot-up times of various file systems
- File Systems - has information on
various file systems that are interesting for embedded systems. Also
includes some improvement suggestions.
- Avoid Initramfs - explains
why initramfs should be avoided if you want to minimize boot time
- Split partitions. If mounting a file system takes a long time, you
can consider splitting that file system into two parts, one with the
info that is needed during or immediately after boot, and one which can
be mounted later on.
- Ramdisks demasked -
explains why using a ram disk generally results in a longer boot
time, not a shorter one.
User-space and application speedups
- Optimize RC Scripts -
Reduce overhead of running RC scripts
- Parallel RC Scripts -
Run RC scripts in parallel instead of sequentially
- Application XIP - Allow
programs and libraries to be executed in-place in ROM or FLASH
- Pre Linking - Avoid cost of runtime
linking on first program load
- Statically link applications. This avoids the costs of runtime
linking. Useful if you have only a few applications. In that case it
could also reduce the size of your image as no dynamic libraries are
needed
- GNU_HASH: ~ 50% speed improvement in dynamic linking
- Application Init
Optimizations
- Improvements in program load and init time via:
- use of mmap vs. read
- control over page mapping characteristics.
- Include modules in kernel
image
- Avoid extra overhead of module loading by adding the modules to
the kernel image
- Speed up module loading - Use Alessio Igor Bogani's kernel patches
to improve module loading time by "Speed up the symbols' resolution
process"
(Patch
1,
Patch
2,
Patch
3,
Patch
4,
Patch
5).
- Avoid udev, it takes quite some time to populate the /dev directory.
In an embedded system it is often known what devices are present and
in any case you know what drivers are available, so you know what
device entries might be needed in /dev. These should be created
statically, not dynamically. mknod is your friend, udev is your
enemy.
- If you still like udev and also like fast boot-ups, you might go
this way: start your system with udev enabled and make a
backup of the created device nodes. Now, modify your init script
like this: instead of running udev, copy the device nodes that you just
made a backup of into the device tree. Now, install the
hotplug-daemon like you always do. That trick avoids the device node
creation at startup but still lets your system create device nodes
later on.
- If your device has a network connection, preferably use static IP
addresses. Getting an address from a DHCP server takes additional
time and has extra overhead associated with it.
- Moving to a different compiler version might lead to shorter and/or
faster code. Most often newer compilers produce better code. You
might also want to play with compiler options to see what works
best.
- If possible move from glibc to uClibc. This leads to smaller
executables and hence to faster load times.
- Library optimiser tool:
http://libraryopt.sourceforge.net/
This will allow you to create an optimised library. As unneeded
functions are removed this should lead to a performance gain.
Normally there will be library pages which contain unused code
(adjacent to code that is used). After optimizing the library this
does not occur any more, so fewer pages are needed and hence fewer
page loads, so some time can be saved.
- Function reordering:
http://www.celinux.org/elc08_presentations/DDLink%20FunctionReorder%2008%2004.pdf
This is a technique to rearrange the functions within an executable
so they appear in the order they are needed. This improves the load
time of the application as all initialization code is grouped into a
set of pages, instead of being scattered over a number of pages.
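The static /dev approach recommended in the list above ("mknod is your friend") can be sketched as a small device table that is turned into mknod commands when the root file system image is built. This is only an illustration: the table contents, permission modes and function name are assumptions for this example, and the entries must match the drivers actually present in your kernel.

```python
# Hypothetical device table: (path, type, major, minor, mode).
# "c" is a character device, "b" a block device.
DEVICE_TABLE = [
    ("/dev/console", "c", 5, 1, "600"),
    ("/dev/null", "c", 1, 3, "666"),
    ("/dev/zero", "c", 1, 5, "666"),
    ("/dev/ttyS0", "c", 4, 64, "660"),
    ("/dev/mtdblock0", "b", 31, 0, "660"),
]


def mknod_commands(table):
    """Return one mknod command line per entry of the device table."""
    return [
        f"mknod -m {mode} {path} {kind} {major} {minor}"
        for path, kind, major, minor, mode in table
    ]


# Print the commands; they would be run as root against the target's
# root file system when the image is created, not at boot time.
for command in mknod_commands(DEVICE_TABLE):
    print(command)
```

Keeping the table in one place makes it easy to check it against the drivers configured into the kernel.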
Another approach to improve boot time is to use a suspend-related
mechanism. Two approaches are known.
- Using the standard hibernate/resume approach. This is what has been
demonstrated by Chan Ju, Park, from Samsung. See sheet 23 and
onwards from this
PPT
and section 2.7 of this
paper.
The issue with this approach is that flash write is much slower than
flash read, so the actual creation of the hibernate image might take
quite a while.
- Implementing snapshot boot. This is done by Hiroki Kaminaga from
Sony and is described at snapshot boot for
ARM and
http://elinux.org/upload/3/37/Snapshot-boot-final.pdf
This is similar to hibernate and resume, but the hibernate file is
retained and used upon every boot. The disadvantage is that no writable
partitions should be mounted at the time of making the snapshot.
Otherwise inconsistencies will occur if a partition is modified,
because applications in the hibernate file might hold information in
the snapshot related to the unmodified partition.
Miscellaneous topics
About Compression discusses
the effects of compression on boot time. This can affect both the kernel
boot time as well as user-space startup.
Uninvestigated speedups
This section is a holding pen for ideas for improvement that are not
implemented yet but that could result in a boot time gain. Please leave
a note here if you are working on one of these items to avoid duplicate
work.
- Prepopulated buffer cache - As initramfs performs an additional
copy of the data the idea is to have a prepopulated buffer cache. A
simplistic scenario would allow dumping the buffer cache when the
booting is completed and the user applications have initialised.
This data then could be used in a subsequent boot to initialize the
buffer cache (of course without copying). A possible approach would
be to have those data to reside into the kernel image and use them
directly. Alternately they could be loaded separately.
Unfortunately my knowledge of the internals in this section is not
yet good enough to do a trial implementation.
Caveats:
- is it possible to have the buffer cache split into two different
parts, one which is statically allocated, one which is
dynamically allocated?
- the pages in the prepopulated buffer cache probably cannot be
discarded, so they should be pinned
- apart from the buffer cache data itself also some other
variables might need restoring
- a similar approach could also be used for the cached file data.
- Dedicated fs - currently a lot of abstraction is done in fs to
make a nice abstraction allowing easy addition of new filesystems
and creating a unified view of those filesystem. While this is
pretty neat, the abstraction layers also introduce some overhead. A
solution could be to create a dedicated fs system, which supports
only one (or maybe 2) filesystems, and eliminates the abstraction
overhead. This will give some benefit, but the chance of getting
this into the mainline is zero.
Articles and Presentations
- Embedded Linux boot time reduction workshop
materials
- By Free Electrons
- Presentation on boot time reduction techniques - Practical labs
on Atmel SAMA5 hardware.
- "Boot Time Optimizations" -
(Slides
|
Video)
- "The Right Approach to Boot Time Reduction" -
(Slides
| YouTube Video)
- Andrew Murray has presented at ELC Europe on October 28, 2010
(Free Electrons video
here)
- This included a < 1 second Qt cold Linux boot case study for an
SH7724 with some additional information about 'function
re-ordering' in user-space
- Similar slides with < 1 second case study for OMAP3530EVM can
be found
here
- "One Second Linux Boot Demonstration (new version)" (Youtube video
by MontaVista)
- "Tools and Techniques for Reducing Bootup Time"
(PPT
|
ODP
|
PDF
|
video)
- Tim Bird has presented at ELC Europe, on November 7, 2008, his
latest collection of tips and tricks for reducing bootup time
- Tims Fastboot
Tools has online
materials in support of this presentation
- Christopher
Hallinan has done a
presentation at the MontaVista Vision conference 2008 on the topic
of reducing boot time. Slides available
here
- Optimizing Linker Load Times
- (introducing various kinds of bootuptime reduction, prelinking,
etc.)
- Benchmarking boot latency on
x86
- By Gilad Ben-Yossef, July 2008
- A tutorial on using TSC register and the kernel PRINTK_TIMES
feature to measure x86 system boot time, including BIOS,
bootloader, kernel and time to first user program.
- Fast Booting of Embedded
Linux
- By HoJoon Park, Electronics and Telecommunications Research
Institute (ETRI), Korea, Presented at the CELF 3rd Korean
Technical
Jamboree,
July 2008
- Explains several different reduction techniques used for
different phases of bootup time
- Tim Bird's (Sony) survey of boot-up time reduction techniques:
- Parallelizing Linux Boot on CE Devices
- Parallelize Applications for Faster Linux
Boot
- Authored by M. Tim Jones for IBM Developer Works
- This article shows you options to increase the speed with which
Linux boots, including two options for parallelizing the
initialization process. It also shows you how to visualize
graphically the performance of the boot process.
- Android Boot Time
Optimization
- Authored by Kan-Ru Chen, 0xlab
- This presentation covers Android boot time measurement and
analysis, the proposed reduction approaches, hibernation-based
technologies, and potential Android user-space optimizations.
- Texas Instruments Embedded Processors Wiki provides the procedure to
optimize Linux/Android boot time:
- Implement Checkpointing for
Android
- Authored by Kito Cheng and Jim Huang, 0xlab
- Reasons to Implement Checkpointing for Android
- Resume to stored state for faster Android boot time
- Better product field trial experience due to regular
checkpointing
Case Studies
Additional Projects/Mailing Lists/Resources
Replacements for SysV 'init'
The traditional method of starting a Linux system is to use /sbin/init,
which processes the file /etc/inittab. This is an init program which
processes a series of actions for different run-levels and system events
(key-combinations and power events).
See the init(8) man page and the
inittab(5) man page.
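To make the structure of /etc/inittab concrete, the following Python sketch parses entries of the form id:runlevels:action:process, as described in inittab(5). The sample file contents and the names InittabEntry and parse_inittab are hypothetical and exist only for this illustration; a real init implementation does considerably more.

```python
from typing import List, NamedTuple


class InittabEntry(NamedTuple):
    id: str          # unique identifier of the entry
    runlevels: str   # run-levels in which the action applies
    action: str      # e.g. sysinit, respawn, initdefault, ctrlaltdel
    process: str     # command to execute (may be empty)


def parse_inittab(text: str) -> List[InittabEntry]:
    """Parse inittab text, skipping blank lines, comments and malformed lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split at most three times so colons inside the command survive.
        fields = line.split(":", 3)
        if len(fields) != 4:
            continue
        entries.append(InittabEntry(*fields))
    return entries


# Hypothetical inittab contents, invented for illustration only.
SAMPLE_INITTAB = """\
# default run-level
id:3:initdefault:
si::sysinit:/etc/init.d/rcS
1:2345:respawn:/sbin/getty 38400 tty1
ca::ctrlaltdel:/sbin/shutdown -r now
"""

for entry in parse_inittab(SAMPLE_INITTAB):
    print(entry.action, "->", entry.process or "(no process)")
```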
busybox init
An 'init' applet is often included in BusyBox
There used to be (as of 2000) some slight differences in the supported
features of the 'inittab' file between busybox init and full-blown init.
However, I don't know (as of 2010) if that's still the case. (See
http://spblinux.de/2.0/doc/init.html
for some details)
Denys Vlasenko, one of the maintainers of busybox, has suggested
runsv as a replacement for traditional init in that tool. See
http://busybox.net/~vda/init_vs_runsv.html
upstart
upstart is a newer init system for Linux desktop systems. It provides the
program /sbin/init, but with different operational semantics.
Android init
Android 'init' is a custom program for booting the Android system.
See Android 'init'
systemd
systemd is a new project (as of May 2010) for starting daemons and
services on a Linux desktop system
See
http://0pointer.de/blog/projects/systemd.html
Kexec
- Kexec is a system which allows a system to be rebooted without
going through BIOS. That is, a Linux kernel can directly boot into
another Linux kernel, without going through firmware. See the white
paper at:
kexec.pdf
Splash Screen projects
- Splashy - Technology to
put up a splash screen early in the boot sequence. This is
user-space code.
- This seems to be the most current splash screen technology, for
major distributions. A framebuffer driver for the kernel is
required.
- Gentoo
Splashscreen -
newer technology to put a splash screen early in the boot sequence
- PSplash - PSplash is a userspace
graphical boot splash screen for mainly embedded Linux devices
supporting a 16bpp or 32bpp framebuffer.
- bootsplash.org - put up a splash
screen early in boot sequence
- This project requires kernel patches
- This project is now abandoned, and work is being done on
Splashy.
Others
Apparently obsolete or abandoned material
in
progress - Boot-up Time Reduction
Howto
- this is a project to catalog existing boot-up time reduction
techniques.
- Was originally intended to be the authoritative source for
bootup time reduction information.
- No one maintains it any more (as of Aug, 2008)
no content
yet - Boot-up Time Delay
Taxonomy
- list of delays categorized by boot phase, type and magnitude
- Was to be a survey of common bootup delays found in embedded
devices.
- Was never really written.
???
Companies, individuals or projects working on fast booting
Boot time check list
From an August 2009 discussion about boot time on ARM
devices,
several hints and advice regarding boot time optimization are available.
While it may repeat much of the above, the following check list was
extracted from this discussion:
Is the CPU clock switched to maximum? If the kernel, bootloader or
hardware is in charge of setting CPU power and speed scaling, then
you should check that it boots with the CPU set to its maximum speed
instead of its slowest.
Is the hardware (register) timing configuration of your SoC's
memory interfaces (e.g. RAM and NOR/NAND timing) optimized? A lot of
vendors ship their hardware with "well, it works, optimize later"
settings. What you want is an "as fast as possible, but still stable and
reliable" configuration. This might need some hardware knowledge and
has to be customized to the individual memory devices used.
Does your boot loader use the I- and D-Cache? E.g. U-Boot doesn't
enable D-Cache by default on ARM devices, as it needs customized MMU
tables to do so.
Does the kernel copy from permanent storage (e.g. NOR or NAND) to RAM
use optimized functions? E.g. DMA, or on ARM at least load/store
multiple instructions (ldm/stm)?
If you use U-Boot's uImage, set "verify=no" in U-Boot to avoid
checksum verification.
Optimize size of your kernel.
- You might even try some of the embedded system kernel config
options that, for example, eliminate all the printk strings,
reduce data structures, or eliminate unneeded functionality.
How often is the kernel (image) data copied? First by the boot loader
from storage to RAM, then by the kernel's uncompressor to its final
destination? Once more? If you use a compressed kernel and NOR flash,
consider running the uncompressor XIP in NOR flash.
If you use compressed kernel, check compression algorithm. zlib is
slow on decompression, and lzo is much faster. So if you implement
lzo compression, you'll probably speed things up a little as well
(check LKML for this). Having no compression at all may also be a
good thing to try (see next topic).
Consider using an uncompressed kernel (depends on your system
configuration). Using an uncompressed kernel on a flash-based system
may improve boot time. The reason is that compressed kernels are
faster only when the throughput to the persistent storage is lower
than the decompression throughput, and on typical embedded systems
with DMA the throughput to memory outperforms the CPU-based
decompression. Of course this depends on many factors, such as the
performance of the flash controller, of the file system the kernel is
stored on and of the DMA controller, the cache architecture, etc., so
the result differs per system. Example: with a ~2.8MB uncompressed
kernel, uncompressing (running the uncompressor XIP in NOR
flash) took ~0.5s longer than copying the 2.8MB from flash to RAM.
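The trade-off described above can be written down as a simple model. The Python sketch below assumes that reading from flash and decompressing happen one after the other and ignores caches, file system overhead and any overlap between the two steps, so it is only a first approximation. All sizes and throughput figures in it are hypothetical; measure your own system before drawing conclusions.

```python
def uncompressed_load_time(image_mb, flash_mb_per_s):
    """Seconds needed to copy an uncompressed image from flash to RAM."""
    return image_mb / flash_mb_per_s


def compressed_load_time(compressed_mb, image_mb, flash_mb_per_s,
                         decompress_mb_per_s):
    """Seconds needed to read the compressed image and then unpack it.

    decompress_mb_per_s is measured in MB of *output* (uncompressed) data.
    """
    return compressed_mb / flash_mb_per_s + image_mb / decompress_mb_per_s


def break_even_decompress_rate(compressed_mb, image_mb, flash_mb_per_s):
    """Decompression rate (MB/s of output) at which both variants take equally long."""
    return flash_mb_per_s * image_mb / (image_mb - compressed_mb)


# Hypothetical figures for illustration only.
image_mb, compressed_mb = 2.8, 1.4
flash_rate, decompress_rate = 10.0, 4.0

print("uncompressed:", uncompressed_load_time(image_mb, flash_rate), "s")
print("compressed:  ", compressed_load_time(compressed_mb, image_mb,
                                            flash_rate, decompress_rate), "s")
print("break-even decompression rate:",
      break_even_decompress_rate(compressed_mb, image_mb, flash_rate), "MB/s")
```

In this model the compressed kernel only wins when the decompressor produces output faster than the break-even rate, which is always higher than the flash read rate.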
Enable precalculated loops-per-jiffy
Enable kernel quiet option
If you use UBI: UBI is rather slow in attaching MTD devices.
Everything is explained at MTD's UBI
scalability
and UBI fs
scalability
sections. There is not much you can do to speed it up other than
implementing UBI2. UBIFS would stay intact. There were discussions
about this and it does not seem to be impossibly difficult to do
UBI2 (few
ideas).
- In a follow-up e-mail, Sascha Hauer wrote:
"What's interesting about this is that the kernel NAND driver is much
slower
than the one in U-Boot. Looking at it it turned out that the kernel
driver uses interrupts to wait for the controller to get ready.
Switching this to polling nearly doubles the NAND performance. UBI
mounts much faster now and this cuts off another few seconds from the
boot process :) "
Use static device nodes during boot, and later set up busybox mdev
for hotplug.
If you have network enabled, there might be some very long timeouts
in the network code paths, which appear to be used whether you
specify a static address or not. See the definitions of
CONF_PRE_OPEN and CONF_POST_OPEN in net/ipv4/ipconfig.c. Check
ipdelay configuration
patch.
Parallelize boot process.
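As a sketch of what parallelizing can mean in practice, the following Python example starts every boot task as soon as all tasks it depends on have finished, instead of running them strictly one after another. The task names, the dependency table and the function run_parallel are hypothetical. A real init system would start processes rather than call Python functions, but the scheduling idea is the same.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_parallel(tasks, dependencies, max_workers=4):
    """Run each task once all its dependencies are done.

    tasks:        dict mapping task name -> callable
    dependencies: dict mapping task name -> set of names it depends on
    Returns the task names in the order in which they finished.
    """
    sorter = TopologicalSorter(dependencies)
    for name in tasks:
        sorter.add(name)  # also register tasks that have no dependencies
    sorter.prepare()      # raises CycleError on circular dependencies

    finished = []
    running = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Start everything whose dependencies are already satisfied.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise an exception from the task, if any
                name = running.pop(future)
                finished.append(name)
                sorter.done(name)
    return finished


# Hypothetical boot tasks: everything else waits for the file systems.
names = ("mount_fs", "syslog", "network", "gui")
tasks = {name: (lambda: None) for name in names}
dependencies = {"syslog": {"mount_fs"}, "network": {"mount_fs"},
                "gui": {"mount_fs"}}

print(run_parallel(tasks, dependencies))
```

The finishing order of independent tasks is not deterministic; only the dependency constraints are guaranteed.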
Disable the option "Set system time from RTC on startup and resume";
you can use the command hwclock -s at the end of the init scripts
instead of slowing down the kernel.