Jianke (剑客)
Following technology and the Internet

Reverse-engineering the ARM64 kernel zImage

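Mainstream flagship Android phones have all moved to 64-bit, and the kernel image (zImage) has changed along with them. Reverse-engineering an arm64 phone kernel in IDA Pro, and in particular getting the kernel symbols loaded, takes a fair amount of work.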

Extract boot.img from /dev/block or from the ROM package, then unpack it with abootimg -x to get the zImage.
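Where abootimg is not available, the kernel can also be cut out of boot.img directly. The sketch below assumes the legacy Android boot image header (header versions 0 to 2): the magic ANDROID! is followed by little-endian 32-bit fields (kernel_size, kernel_addr, ramdisk_size, ramdisk_addr, second_size, second_addr, tags_addr, page_size), and the header is padded to one page, so the kernel starts at offset page_size. Newer v3/v4 headers use a different layout and are not handled. The function name extract_kernel is hypothetical.

```python
import struct

BOOT_MAGIC = b"ANDROID!"


def extract_kernel(boot_img):
    """Return the kernel bytes from a legacy (header v0-v2) Android boot.img."""
    if boot_img[:8] != BOOT_MAGIC:
        raise ValueError("not an Android boot image")
    # The 8-byte magic is followed by eight little-endian u32 fields:
    # kernel_size, kernel_addr, ramdisk_size, ramdisk_addr,
    # second_size, second_addr, tags_addr, page_size
    fields = struct.unpack_from("<8I", boot_img, 8)
    kernel_size, page_size = fields[0], fields[7]
    # The header is padded to one page, so the kernel starts at offset page_size
    return boot_img[page_size:page_size + kernel_size]
```

For example, extract_kernel(open("boot.img", "rb").read()) should return roughly the bytes that abootimg -x saves as zImage.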

If the zImage is gzip-compressed, decompress it with gzip -d to get the kernel.
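The same check and decompression can be done in Python. This is a sketch that assumes the compressed kernel starts with the gzip magic bytes 1f 8b 08 at offset 0 (the usual arm64 Image.gz case); a 32-bit self-decompressing zImage, where the gzip stream sits somewhere inside the image, is not handled. The name maybe_gunzip is hypothetical.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip ID bytes followed by the deflate method byte


def maybe_gunzip(zimage):
    """Return the decompressed kernel if zimage is gzip data, else zimage unchanged."""
    if not zimage.startswith(GZIP_MAGIC):
        return zimage
    # wbits=16+MAX_WBITS tells zlib to expect a gzip header and trailer.
    # A decompressobj stops at the end of the first gzip member and leaves any
    # trailing bytes (e.g. an appended device tree) in unused_data instead of
    # raising an error, which gzip.decompress() would do.
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    return d.decompress(zimage)
```

Using a decompressobj rather than gzip.decompress is a deliberate choice here: boot images often append data after the compressed kernel.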

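Both steps above are routine. The real work is extracting, from the kernel, the kernel symbols that would normally appear in /proc/kallsyms, so that IDA Pro's analysis goes much more smoothly. Building on the 32-bit kernel symbol extraction method from the Bits, Please! article, a 64-bit solution follows quite naturally: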

First you need the virtual address at which the kernel is loaded. One shortcut is to boot the phone and run:

shell@surabaya:/ $ dmesg
...
[    0.000000] Virtual kernel memory layout:
[    0.000000]    vmalloc : 0xffffff8000000000 - 0xffffffbdbfff0000  (  246 GB)
[    0.000000]    vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000  (    8 GB maximum)
[    0.000000]    PCI I/O : 0xffffffbffa000000 - 0xffffffbffb000000  (    16 MB)
[    0.000000]    fixed  : 0xffffffbffbdfe000 - 0xffffffbffbdff000  (    4 KB)
[    0.000000]    modules : 0xffffffbffc000000 - 0xffffffc000000000  (    64 MB)
[    0.000000]    memory  : 0xffffffc000000000 - 0xffffffc0fe550000  (  4069 MB)
[    0.000000]      .init : 0xffffffc001600000 - 0xffffffc001813000  (  2124 KB)
[    0.000000]      .text : 0xffffffc000080000 - 0xffffffc001600000  ( 22016 KB)
[    0.000000]      .data : 0xffffffc00181d000 - 0xffffffc001995f80  (  1508 KB)
...
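Rather than reading the value off by eye, the .text start can be pulled out of saved dmesg output. A minimal sketch, assuming the layout lines look like those above; kernel_text_range is a hypothetical helper name, and the lines are only available while these boot messages are still in the kernel log buffer.

```python
import re

# Matches lines such as "      .text : 0xffffffc000080000 - 0xffffffc001600000  ( 22016 KB)"
TEXT_RE = re.compile(r"\.text\s*:\s*0x([0-9a-fA-F]+)\s*-\s*0x([0-9a-fA-F]+)")


def kernel_text_range(dmesg_output):
    """Return (start, end) of the kernel .text section from dmesg output, or None."""
    m = TEXT_RE.search(dmesg_output)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)
```

The start address it returns can be passed as the optional second argument of the symbol-extraction script.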

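Because current phones do not yet enable KASLR, the base address is almost always 0xffffffc000080000, and with this address the symbol table can be located inside the kernel. The first two symbols exported by the kernel (stext, _text and the like) always point to 0xffffffc000080000, so searching for two consecutive 0xffffffc000080000 values (stored as little-endian 8-byte words) finds the symbol table. From there, all symbols can be exported following the Bits, Please! method. The only things to watch for when going from 32-bit to 64-bit are that addresses grow to 8 bytes and the alignment of the tables grows from 0x10 to 0x100. I adapted the original Python script into one that parses arm64 symbols: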

import sys
import struct

# The default address at which the kernel text segment is loaded
DEFAULT_KERNEL_TEXT_START = 0xffffffc000080000

# The size of a QWORD on a 64-bit architecture
QWORD_SIZE = struct.calcsize("<Q")

# The size of a DWORD (32 bits)
DWORD_SIZE = struct.calcsize("<I")

# The size of a WORD (16 bits)
WORD_SIZE = struct.calcsize("<H")

# The alignment of labels in the resulting kernel file
LABEL_ALIGN = 0x100

# The minimal number of repeating addresses pointing to the kernel's text start address
# which are used as a heuristic in order to find the beginning of the kernel's symbol
# table. Since usually there are at least two symbols pointing to the beginning of the
# text segment ("stext", "_text"), the minimal number for the heuristic is 2.
KALLSYMS_ADDRESSES_MIN_HEURISTIC = 2


def read_qword(kernel_data, offset):
    '''
    Reads a little-endian QWORD from the given offset within the kernel data
    '''
    return struct.unpack_from("<Q", kernel_data, offset)[0]


def read_dword(kernel_data, offset):
    '''
    Reads a little-endian DWORD from the given offset within the kernel data
    '''
    return struct.unpack_from("<I", kernel_data, offset)[0]


def read_word(kernel_data, offset):
    '''
    Reads a little-endian WORD from the given offset within the kernel data
    '''
    return struct.unpack_from("<H", kernel_data, offset)[0]


def read_byte(kernel_data, offset):
    '''
    Reads an unsigned byte from the given offset within the kernel data
    '''
    return struct.unpack_from("<B", kernel_data, offset)[0]


def read_c_string(kernel_data, offset):
    '''
    Reads a NUL-terminated C string from the given offset
    '''
    end = kernel_data.index(b"\x00", offset)
    return kernel_data[offset:end].decode("latin-1")


def label_align(address):
    '''
    Aligns the given value down to a label boundary
    '''
    return address & ~(LABEL_ALIGN - 1)


def find_kallsyms_addresses(kernel_data, kernel_text_start):
    '''
    Searches for the beginning of the kernel's symbol table.
    Returns the offset of the kernel's symbol table, or -1 if it could not be found
    '''
    search_str = struct.pack("<Q", kernel_text_start) * KALLSYMS_ADDRESSES_MIN_HEURISTIC
    return kernel_data.find(search_str)


def get_kernel_symbol_table(kernel_data, kernel_text_start):
    '''
    Retrieves the kernel's symbol table from the given kernel file
    '''

    # Getting the beginning and end of the kallsyms_addresses table
    kallsyms_addresses_off = find_kallsyms_addresses(kernel_data, kernel_text_start)
    if kallsyms_addresses_off == -1:
        print("[-] Could not find kallsyms_addresses (wrong kernel text start?)")
        return None
    kallsyms_addresses_end_off = kernel_data.find(struct.pack("<Q", 0), kallsyms_addresses_off)
    num_symbols = (kallsyms_addresses_end_off - kallsyms_addresses_off) // QWORD_SIZE

    # Making sure that kallsyms_num_syms matches the table size
    kallsyms_num_syms_off = label_align(kallsyms_addresses_end_off + LABEL_ALIGN)
    kallsyms_num_syms = read_qword(kernel_data, kallsyms_num_syms_off)
    if kallsyms_num_syms != num_symbols:
        print("[-] Actual symbol table size: %d, read symbol table size: %d" % (num_symbols, kallsyms_num_syms))
        return None

    # Calculating the location of the markers table by skipping over every
    # length-prefixed entry in kallsyms_names
    kallsyms_names_off = label_align(kallsyms_num_syms_off + LABEL_ALIGN)
    current_offset = kallsyms_names_off
    for i in range(num_symbols):
        current_offset += read_byte(kernel_data, current_offset) + 1
    kallsyms_markers_off = label_align(current_offset + LABEL_ALIGN)

    # Locating the token table. Not sure this is a universal solution: take the
    # first run of 16 zero bytes after kallsyms_markers and move to the
    # following label boundary.
    kallsyms_token_table_off = label_align(kernel_data.find(struct.pack("<Q", 0) * 2, kallsyms_markers_off) + LABEL_ALIGN)
    # Alternative: compute it from the markers table size (one marker per 256 symbols)
    # kallsyms_token_table_off = label_align(kallsyms_markers_off + (((num_symbols + 255) >> 8) * QWORD_SIZE))
    current_offset = kallsyms_token_table_off
    for i in range(256):
        token_str = read_c_string(kernel_data, current_offset)
        current_offset += len(token_str) + 1
    kallsyms_token_index_off = label_align(current_offset + LABEL_ALIGN)

    # Creating the token table
    token_table = []
    for i in range(256):
        index = read_word(kernel_data, kallsyms_token_index_off + i * WORD_SIZE)
        token_table.append(read_c_string(kernel_data, kallsyms_token_table_off + index))

    # Decompressing the symbol table using the token table
    offset = kallsyms_names_off
    symbol_table = []
    for i in range(num_symbols):
        num_tokens = read_byte(kernel_data, offset)
        offset += 1
        symbol_name = ""
        for j in range(num_tokens):
            token_table_idx = read_byte(kernel_data, offset)
            symbol_name += token_table[token_table_idx]
            offset += 1

        symbol_address = read_qword(kernel_data, kallsyms_addresses_off + i * QWORD_SIZE)
        # The first character of the expanded name is the symbol type (t, T, d, ...)
        symbol_table.append((symbol_address, symbol_name[0], symbol_name[1:]))

    return symbol_table


def main():
    # Verifying the arguments
    if len(sys.argv) < 2:
        print("USAGE: %s <KERNEL_FILE> [optional: <0xKERNEL_TEXT_START>]" % sys.argv[0])
        return
    with open(sys.argv[1], "rb") as f:
        kernel_data = f.read()
    kernel_text_start = int(sys.argv[2], 16) if len(sys.argv) >= 3 else DEFAULT_KERNEL_TEXT_START

    # Getting the kernel symbol table
    symbol_table = get_kernel_symbol_table(kernel_data, kernel_text_start)
    if symbol_table is None:
        return

    # Printing in /proc/kallsyms format and saving a copy to ./syms
    with open("syms", "w") as fp:
        for symbol in symbol_table:
            line = "%016X %s %s" % symbol
            print(line)
            fp.write(line + "\n")


if __name__ == "__main__":
    main()
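The decompression loop in get_kernel_symbol_table is the least obvious part of the script, so here it is in isolation on made-up data. Each kallsyms_names entry is a length byte followed by that many indices into the token table, and concatenating the tokens gives the type character followed by the symbol name. The helper expand_names and the toy tables are hypothetical, not taken from a real kernel (a real token table has 256 entries).

```python
def expand_names(names_blob, token_table, num_symbols):
    """Decode num_symbols entries of a kallsyms_names-style blob into (type, name)."""
    offset = 0
    result = []
    for _ in range(num_symbols):
        num_tokens = names_blob[offset]  # length prefix: number of token indices
        offset += 1
        indices = names_blob[offset:offset + num_tokens]
        offset += num_tokens
        name = "".join(token_table[i] for i in indices)
        result.append((name[0], name[1:]))  # first char is the symbol type
    return result


# Hypothetical toy tables, not taken from a real kernel
TOKENS = ["T", "_t", "ext", "s", "t"]
NAMES = bytes([3, 0, 1, 2,   4, 0, 3, 4, 2])
print(expand_names(NAMES, TOKENS, 2))  # [('T', '_text'), ('T', 'stext')]
```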

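The symbols are printed in the same format as /proc/kallsyms and also written to a file named syms in the current directory. The next step is getting IDA Pro to use the syms file. My approach: for each symbol, try to rename its address; if that fails, undefine the item there and try again; and for functions in the code segment, run MakeCode again: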

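# Runs inside IDA Pro. Uses the legacy IDC-style API (IDA 6.x); newer IDA
# versions name these idc.set_name, idc.del_items and idc.create_insn.
from idc import *

with open("syms", "r") as f:
    lines = f.read().splitlines()

for line in lines:
    parts = line.split(" ")
    if len(parts) != 3:
        continue  # skip blank or malformed lines
    addr, sym_type, name = parts
    ea = int(addr, 16)
    # Try to name the address; if that fails, undefine the item there and retry
    if not MakeNameEx(ea, name, SN_NOWARN):
        MakeUnkn(ea, 1)  # 1 == DOUNK_EXPAND
        MakeNameEx(ea, name, SN_NOWARN)
    # 't'/'T' symbols are in the text section: re-create code at the address
    if sym_type in ("t", "T"):
        MakeUnkn(ea, 1)
        MakeCode(ea)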
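The renaming loop needs IDA, but the parsing of the syms file can be checked outside IDA. A minimal sketch with hypothetical helper names (parse_syms, is_code_symbol) that skips blank or malformed lines instead of crashing on them; its output could feed the loop above.

```python
def parse_syms(text):
    """Parse 'ADDRESS TYPE NAME' lines into (address, type, name) tuples.

    Blank or malformed lines are skipped instead of raising an exception.
    """
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        addr, sym_type, name = parts
        try:
            symbols.append((int(addr, 16), sym_type, name))
        except ValueError:
            continue  # address field is not valid hex
    return symbols


def is_code_symbol(sym_type):
    """kallsyms types 't' and 'T' mark symbols in the text (code) section."""
    return sym_type in ("t", "T")
```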