Profiling Software with OProfile

The new system design approach introduced by the Xilinx Zynq-7000 and the profiling target, libjpeg, were described in the previous section and are not repeated here.

1. Introduction to OProfile

Profiling is a systematic summary or analysis of data on different performance characteristics, usually presented as graphs and tables. It gives the number or percentage of samples collected for a particular processor event, such as cache misses or TLB misses. The main purpose is to find performance bottlenecks in the software so that targeted optimization can improve its overall performance.

OProfile is one of several profiling and performance monitoring tools for Linux. It runs on many architectures, including ARM, PowerPC, MIPS, IA32, IA64 and AMD Athlon. Its overhead is small, and it has been included in the Linux kernel since version 2.6.

OProfile collects information about processor events and helps users identify issues such as loop unrolling, poor cache utilization, inefficient type conversions, redundant operations, and branch mispredictions. It is a fine-grained tool that can collect samples for individual instructions as well as for functions, system calls, or interrupt handlers. OProfile works by sampling. With the collected data, users can readily locate performance problems.

By monitoring CPU hardware events, OProfile can profile an entire Linux system while it is running. The profiling target can be the Linux kernel (including modules and interrupt handlers), shared libraries, or applications.

Starting with version 0.9.8, OProfile supports a perf_events profiling mode, in which the operf application controls the profiling process; in legacy mode, this is done through the opcontrol script and the oprofiled daemon. Unlike legacy mode, operf does not need the OProfile kernel driver; it works directly with the Linux Kernel Performance Events Subsystem. With operf, an ordinary user can profile their own applications. Profiling the entire system still requires root privileges.
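The two control paths described above can be sketched as a pair of shell functions. This is a minimal sketch, not a definitive recipe: the function names profile_with_operf and profile_legacy are hypothetical, and it assumes that operf, opcontrol and opreport are installed on the target and that the legacy path is run as root with the OProfile kernel driver available.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; the oprofile tools
# (operf, opcontrol, opreport) are assumed to be on PATH.

# perf_events mode: profile a single application as a normal user.
profile_with_operf() {
    app="$1"; shift
    # Run the application under operf; samples are written to
    # ./oprofile_data in the current directory by default.
    operf "$app" "$@" || return 1
    # Print a per-symbol summary of the collected samples.
    opreport --symbols
}

# Legacy mode: needs root and the OProfile kernel driver.
profile_legacy() {
    vmlinux="$1"; shift
    # Point the daemon at the uncompressed kernel image.
    opcontrol --vmlinux="$vmlinux" || return 1
    opcontrol --start || return 1
    "$@"                    # run the workload to be profiled
    opcontrol --dump        # flush collected samples to disk
    opcontrol --shutdown    # stop profiling and the daemon
    opreport --symbols
}
```

For example, profile_with_operf ./my_app would profile a placeholder program named my_app, while profile_legacy ./vmlinux ./my_app would run the same workload through the legacy opcontrol path.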

If the hardware does not support OProfile's use of performance counters, OProfile can only work in Timer Mode. Timer Mode is available only in legacy profiling mode, which is controlled by the opcontrol script.

Oprofile's website is:
Processor hardware event types that can be supported:
For the Zynq-7000, all the hardware event types supported by the ARM Cortex-A9 core PMU (Performance Monitor Unit) are listed. As the list shows, OProfile supports many kinds of in-processor analysis.
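On the target itself, the events available for the running CPU can also be listed with the ophelp tool that ships with OProfile. The helper below is a small sketch; the name list_events is hypothetical, and the actual event names printed depend on the CPU and the OProfile version.

```shell
#!/bin/sh
# Sketch only: list_events is a hypothetical helper name.
# List the hardware events OProfile knows for the current CPU,
# optionally filtered by a case-insensitive keyword.
list_events() {
    keyword="${1:-}"
    # An empty keyword matches every line, so all events are shown.
    ophelp | grep -i -- "$keyword"
}
```

For example, list_events cache would show only the events whose name or description mentions "cache".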

Some sample OProfile results are also provided, so developers can see what OProfile can do before they start using it.

Detailed documentation for Oprofile:

Advantages of OProfile:
• Low running overhead
• Little impact on the profiling target
• Can profile interrupt handlers
• Can profile applications and shared libraries
• Can profile dynamically compiled (JIT) code
• Can profile the entire system
• Can observe internal CPU details, such as the cache miss rate
• Can annotate source code
• Supports instruction-level profiling
• Can generate call-graph profiles

However, OProfile is not a panacea; it has its own limitations:
• Call-graph profiles can only be generated on x86, ARM, and PowerPC architectures
• 100% accurate instruction-level profiling is not supported
• Support for profiling dynamically compiled (JIT) code is not yet complete

In any case, OProfile is far more capable than gprof, at the cost of more troublesome configuration.

2. Compiling OProfile

First, it is best to enable the OProfile driver in the Linux kernel to get full support.

Download the Linux kernel source: the verified kernel provided by Xilinx can be downloaded from https://github.com/Xilinx/linux-xlnx. If using git under Linux is not convenient, click the releases link on that page, find the version you need, and download the tarball. Choose the tar.gz format rather than zip, because zip may have problems with symbolic links.

Because I am using Xilinx Linux pre-built 14.7, I downloaded linux-xlnx-xilinx-v14.7.tar.gz here.

After decompressing, use the following commands to bring up the Linux kernel configuration interface:
export ARCH=arm
export CROSS_COMPILE=arm-xilinx-linux-gnueabi-
make xilinx_zynq_defconfig
make xconfig    # or: make menuconfig

On the configuration interface, check the following two items:
General setup --->
[*] Profiling support
<*> OProfile system profiling
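After saving the configuration, it can be worth confirming that both options actually landed in the generated .config. The sketch below assumes the two menu entries correspond to the kernel symbols CONFIG_PROFILING and CONFIG_OPROFILE; the function name check_oprofile_config is hypothetical.

```shell
#!/bin/sh
# Sketch only: check_oprofile_config is a hypothetical helper name.
# Verify that a kernel .config enables profiling support and the
# OProfile driver. Defaults to ./.config in the kernel source tree.
check_oprofile_config() {
    config="${1:-.config}"
    for opt in CONFIG_PROFILING CONFIG_OPROFILE; do
        # Accept built-in (=y) or module (=m).
        if grep -Eq "^${opt}=(y|m)$" "$config"; then
            echo "$opt: enabled"
        else
            echo "$opt: MISSING"
            return 1
        fi
    done
}
```

Running check_oprofile_config from the top of the kernel tree prints one line per option and returns a non-zero status if either one is missing.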

Then run make uImage to generate a new uImage, and use it to replace the Linux kernel image in Xilinx Linux pre-built 14.7. We also need the vmlinux file later to examine the profiling results.

OProfile requires the popt, bfd, and libiberty libraries. To use these libraries on an embedded board, you need to cross-compile them manually.

For popt 1.7, compile with the following command:
./configure --prefix=/home/wave/xilinx/oprofileprj/rootfs --host=arm-xilinx-linux-gnueabi --with-kernel-support --disable-nls && make && make install

For binutils 2.24, compile with the following command:
./configure --host=arm-xilinx-linux-gnueabi --prefix=/home/wave/xilinx/oprofileprj/rootfs --enable-install-libbfd --enable-install-libiberty --enable-shared && make && make install
However, --enable-install-libiberty has no effect, so you need to manually copy libiberty.a and libiberty.h to the appropriate location.
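The manual copy can be scripted along the following lines. This is a sketch under assumptions: it presumes binutils was built in its source tree, so that the static library is at libiberty/libiberty.a and the header at include/libiberty.h, and that lib and include under the install prefix are the "appropriate location". The function name install_libiberty is hypothetical.

```shell
#!/bin/sh
# Sketch only: install_libiberty is a hypothetical helper name, and the
# source paths assume an in-tree binutils build.
install_libiberty() {
    build_dir="$1"   # top of the built binutils source tree
    prefix="$2"      # same value as --prefix given to configure
    mkdir -p "$prefix/lib" "$prefix/include" || return 1
    # Static library produced by the binutils build.
    cp "$build_dir/libiberty/libiberty.a" "$prefix/lib/" || return 1
    # Matching public header from the binutils include directory.
    cp "$build_dir/include/libiberty.h" "$prefix/include/"
}
```

With the prefix used above, the call would be install_libiberty . /home/wave/xilinx/oprofileprj/rootfs, run from the top of the binutils tree.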
