Basic Concepts of RTC

After doing some research on the internet, I summarize my findings below:

    //

      // Runtime Checking

      // Small Type Check (/RTCc)

      //

      /*

           003A11F4  jmp         _RTC_Check_2_to_1 (3A14A0h)

           003A11F9  jmp         _RTC_Check_8_to_1 (3A1600h)

           003A11FE  jmp         _RTC_Check_8_to_2 (3A1670h)

           003A1203  jmp         _RTC_Check_8_to_4 (3A16B0h)

           003A1208  jmp         _RTC_Check_4_to_1 (3A15D0h)

           003A120D  jmp         _RTC_Check_4_to_2 (3A1640h)

      */

      char ch = 0;

      short s = 0x101;

      ch = s;

[Note]: /RTCc checks whether data is lost when a value is converted to a smaller built-in type. The six _RTC_Check_* functions above do the real work.
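To make the idea concrete, here is a minimal sketch of a "2 to 1" style check. The helper name narrowing_loses_data is hypothetical and this is not the CRT's _RTC_Check_2_to_1; it simply narrows, widens back, and compares, and the exact rule the CRT applies may differ.

```cpp
#include <cstdint>

// Hypothetical helper (not the CRT routine): reports whether narrowing
// a 16-bit value to 8 bits changes the value.
bool narrowing_loses_data(std::int16_t value) {
    // Narrow, widen back, and compare with the original.
    const std::int8_t narrowed = static_cast<std::int8_t>(value);
    return static_cast<std::int16_t>(narrowed) != value;
}
```

With the snippet above, s = 0x101 does not survive the round trip through a char, which is the kind of loss /RTCc is meant to flag.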

 

      //

      // Uninitialized Variables (/RTCu)

      //

      /*

           000D14A9  mov         byte ptr [ebp-46h],0

           000D14AD  lea         eax,[ch]

           000D14B0  push        eax 

           000D14B1  push        offset string "%c" (0D5748h)

           000D14B6  call        dword ptr [__imp__scanf (0D82C0h)]

           000D14BC  add         esp,8

           000D14BF  movsx       eax,byte ptr [ch]

           000D14C3  cmp         eax,79h

           000D14C6  jne         wmain+33h (0D14D3h)

           000D14C8  mov         byte ptr [ebp-46h],1

           000D14CC  mov         dword ptr [a],0Ah

           000D14D3  cmp         byte ptr [ebp-46h],0

           000D14D7  jne         wmain+46h (0D14E6h)

           000D14D9  push        offset  (0D1501h)

           000D14DE  call        @ILT+170(__RTC_UninitUse) (0D10AFh)

      */

      int a;

      char ch;

      scanf("%c", &ch);

      if (ch == 'y')

           a = 10;

      printf("%d", a);

[Note]: /RTCu checks whether a local variable is used before it has been initialized. The principle is straightforward: a flag (a hidden local variable) is set to 0 at first and set to 1 as soon as the variable it watches is assigned. Before the variable is used, cmp compares the flag with 0. If the flag is still 0, __RTC_UninitUse reports the error.
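The hidden-flag mechanism can be sketched by hand in C++. This is only an illustration of the disassembly above: a_is_set plays the role of the byte at [ebp-46h], and report_uninit_use is a hypothetical stand-in for __RTC_UninitUse that throws instead of showing a dialog.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the CRT's reporting routine.
[[noreturn]] void report_uninit_use(const char* name) {
    throw std::runtime_error(
        std::string("variable used without being initialized: ") + name);
}

int read_a(char ch) {
    int a;
    bool a_is_set = false;   // the hidden flag, 0 at first

    if (ch == 'y') {
        a = 10;
        a_is_set = true;     // set to 1 as soon as 'a' is assigned
    }

    if (!a_is_set) {         // cmp flag, 0 before the first use
        report_uninit_use("a");
    }
    return a;
}
```

Calling read_a('y') returns 10, while any other input reaches the report before a is ever read.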

 

      //

      // Stack Check (/RTCs)

      // Local variables are filled with 0xCC (the int 3 opcode); if that memory is ever executed as code, a breakpoint exception is raised

      /*

           009314AC  lea         edi,[ebp-0E4h]

           009314B2  mov         ecx,39h

           009314B7  mov         eax,0CCCCCCCCh

           009314BC  rep stos    dword ptr es:[edi]

      */

      int a;

      int b;

      int c;

 

      // Check if esp is messed up

      // Check buffer overrun issue

      // Uses the buffer's location and length: checks that the guard bytes just before and just after the buffer are still 0xCC.

      /*

           0012154E  mov         esi,esp

           00121550  lea         eax,[buffer]

           00121553  push        eax 

           00121554  push        offset string "%s" (125744h)

           00121559  call        dword ptr [__imp__scanf (1282C4h)]

           0012155F  add         esp,8

           00121562  cmp         esi,esp

           00121564  call        @ILT+345(__RTC_CheckEsp) (12115Eh)

           00121569  xor         eax,eax

           0012156B  push        edx 

           0012156C  mov         ecx,ebp

           0012156E  push        eax 

           0012156F  lea         edx,[ (121590h)]

           00121575  call        @ILT+140(@_RTC_CheckStackVars@8) (121091h)

      */

      char buffer[10];

      scanf("%s", buffer);

[Note]: The cmp is the important part here. It checks whether the stack is balanced after calling an external function by comparing esp with the value saved in esi before the call. __RTC_CheckEsp examines the result of the cmp; if the two values differ, it jumps to the error-reporting code, which shows a dialog.
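The buffer overrun part of the check can be modeled in plain C++. This is a sketch under my own assumptions, not the code of _RTC_CheckStackVars: GuardedBuffer, init_guards, guards_intact and unchecked_write are hypothetical names, and the guard size of 4 bytes is chosen for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical model of a local buffer surrounded by 0xCC guard bytes.
struct GuardedBuffer {
    unsigned char head[4];
    char          data[10];
    unsigned char tail[4];
};

// Fill everything with 0xCC, as the rep stos sequence does for the frame.
void init_guards(GuardedBuffer& g) {
    std::memset(&g, 0xCC, sizeof g);
}

// The check: both guards must still contain only 0xCC.
bool guards_intact(const GuardedBuffer& g) {
    for (unsigned char b : g.head) if (b != 0xCC) return false;
    for (unsigned char b : g.tail) if (b != 0xCC) return false;
    return true;
}

// Copies n bytes to the start of 'data' without a bounds check, the way
// scanf("%s") would. Keep n <= 14 so the write stays inside the struct.
void unchecked_write(GuardedBuffer& g, const char* src, std::size_t n) {
    unsigned char* base = reinterpret_cast<unsigned char*>(&g);
    std::memcpy(base + offsetof(GuardedBuffer, data), src, n);
}
```

A 6-byte write leaves the guards alone; a 13-byte write spills into the tail guard, and guards_intact then returns false.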

/SAFESEH

Recall the two best-known stack overrun scenarios on the x86 platform:
  1. A buffer overrun in which malicious code overwrites the return address pushed by CALL.
  2. A buffer overrun in which malicious code overwrites the address of an SEH callback function (exception handler).
For the first case we already have /GS as a mitigation. What about the second? For that we have /SAFESEH (a linker option).
A quick search for /SAFESEH on MSDN shows that it is supported only on x86, not on x64 or IA64. Under the x64 and IA64 conventions, exception handling is table-based: handler addresses are not kept in registration records on the stack, so the second scenario does not arise there. /SAFESEH is simply not necessary for them :D
 
How does /SAFESEH work?
Hmm~~ it is quite simple and introduces nothing new: the linker gathers the addresses of all the exception handlers created for SEH and puts them into a table. Before calling an exception handler, the OS checks whether its address is listed in that table. So if malicious code supplies an arbitrary address, the handler is not called. OK, that's it.
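The lookup itself is easy to picture. The sketch below is an illustration only, not the operating system's code: is_registered_handler is a hypothetical name, and I assume the table is kept sorted so that a binary search can be used.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical check: a handler may run only if its address appears in
// the table that the linker emitted. 'sorted_table' is assumed sorted.
bool is_registered_handler(const std::vector<std::uintptr_t>& sorted_table,
                           std::uintptr_t handler) {
    return std::binary_search(sorted_table.begin(), sorted_table.end(),
                              handler);
}
```

An address that an attacker wrote into a registration record on the stack would not be found in the table, so the dispatch would be refused.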
 
Just like /GS, /SAFESEH also needs help from the CRT. [We need to supply a load config struct (such as can be found in loadcfg.c CRT source file) that contains all the entries defined for Visual C++. (MSDN)] After building the program, run dumpbin /loadconfig <program name> to list all the exception handlers in the table:
 
Dump of file SafeSEH.exe
File Type: EXECUTABLE IMAGE
  Section contains the following load config:
            00000048 size
                   0 time date stamp
                0.00 Version
                   0 GlobalFlags Clear
                   0 GlobalFlags Set
                   0 Critical Section Default Timeout
                   0 Decommit Free Block Threshold
                   0 Decommit Total Free Threshold
            00000000 Lock Prefix Table
                   0 Maximum Allocation Size
                   0 Virtual Memory Threshold
                   0 Process Heap Flags
                   0 Process Affinity Mask
                   0 CSD Version
                0000 Reserved
            00000000 Edit list
            00407000 Security Cookie // for /GS
            004064F0 Safe Exception Handler Table
                   1 Safe Exception Handler Count
    Safe Exception Handler Table
          Address
          --------
          00401100  __except_handler4
 
Actually, if we don't define our own raw exception handler, there will always be only one entry (__except_handler4) in the table, because VC++ does the encapsulation for us. In the FS:[0] chain, every _EXCEPTION_REGISTRATION struct contains the same handler, __except_handler4, which does the real job of calling our code, usually the __except filter expression and the __except block.

About Calling Convention

[A series of posts on the topics of Calling Convention, by RAYMOND CHEN – MSFT, The Old New Thing]
 
Those posts are rather long-winded, but they are still worth reading. You will get a rough idea of how calling conventions evolved, how a calling convention relates to a specific CPU architecture, and how calling conventions changed from 16-bit to 32-bit and eventually to 64-bit CPU architectures. You will also find some well-known CPUs mentioned in the posts, e.g., PowerPC (PPC, which Steve Jobs eventually abandoned), Intel IA64 (Itanium), and AMD64 (i.e., the x86 architecture with 64-bit extensions).
 
[Discussed __cdecl (C Calling Convention), __pascal/__fortran, __fastcall]
 
[Discussed RISC-styled instruction set of CPU, Alpha AXP, MIPS R4000, Power PC]
Great tip: Curiously, it is only the 8086 and x86 platforms that have multiple calling conventions. All the others have only one!
 
[Discussed the 32-bit conventions, in contrast to the 16-bit ones in Part 1, and also thiscall (used implicitly when programming in C++)]
 
[Intel IA64]
 
[AMD 64]
 
Some of the terminology in Parts 4 and 5 is a little difficult for me to understand; I definitely miss most of the ideas in the 64-bit world. But I'm confident that some day, when I look back at this topic, it will all be clear to me. haha~~~

Global Object Initializers

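We all know that if a series of global objects is defined in a single .cpp file, they are initialized (that is, their constructors are called) in the order of their definitions. It is easy to tell which objects are instantiated first, so we come to rely on that order without noticing. But what if the global objects are defined across several .cpp files? The C++ standard says nothing about the order in that case; it simply requires that global objects be initialized before main is entered. So how can we be sure that code like the following runs correctly?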

class A { // a type like this should not be written; it does not respect C++ semantics
public:
    A() { cout << "In A's Constructor..." << endl; }
    ~A() { cout << "In A's Destructor..." << endl; } };

A aaaa; // a global object, aaaa, is defined here

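We often write a type like this without a second thought, unaware that the relative initialization order of cout and aaaa affects whether the code runs correctly. We instinctively assume that cout is initialized before aaaa, but why should it be? Whether that assumption holds depends on the compiler and the linker; implementations differ slightly between platforms but are broadly similar. See "Inside the C++ Object Model" for details.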

VC++ implements this with two pieces: the CRT startup code and the #pragma preprocessor directive:

#pragma init_seg({ compiler | lib | user | "section-name" [, func-name]} )

[In MSDN: Specifies a keyword or code section that affects the order in which startup code is executed. ]

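compiler, lib, user, and section-name each define a priority level for the initialization order. compiler has the highest priority and is generally reserved for the CRT library. The declaration #pragma init_seg(compiler) can be found in the .cpp file where cout is defined. And what does the implementation in the CRT look like?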

         /* crtexe.c
           * do C++ constructors (initializers) specific to this EXE
           */
           if (__native_startup_state == __initializing)
           {
               _initterm( __xc_a, __xc_z );
               __native_startup_state = __initialized;
           }

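The code above can be found in the definition of __tmainCRTStartup. Its job is to walk a list of function pointers and call each one in turn. These pointers lead to the constructors of the global objects, although they may also point to other functions that perform initialization. So how is this list of function pointers built? That takes the cooperation of the linker and the compiler. The VC++ compiler and the CRT have an agreement: when the compiler encounters the initialization and cleanup of a global object (for example, its constructor and destructor), it generates a dynamic initializer and places a pointer to it in the .CRT$XCU section of the .obj file. .CRT is the section name, and XCU, after the $, is the group name. (For details on COFF and PE, see the columns by the expert Matt.) Compiling the init_seg sample code and running dumpbin /all ctor.obj /out:d:\ctor.txt shows the following contents for this section: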

SECTION HEADER #22
.CRT$XCU name
       0 physical address
       0 virtual address
       4 size of raw data
    21B8 file pointer to raw data (000021B8 to 000021BB)
    21BC file pointer to relocation table
       0 file pointer to line numbers
       1 number of relocations
       0 number of line numbers
40300040 flags
         Initialized Data
         4 byte align
         Read Only

RAW DATA #22
  00000000: 00 00 00 00                                      ....

RELOCATIONS #22
                                                Symbol    Symbol
Offset    Type              Applied To         Index     Name
--------  ----------------  -----------------  --------  ------
00000000  DIR32                      00000000        42  ??__Eaaaa@@YAXXZ (void __cdecl `dynamic initializer for 'aaaa''(void))

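This is the dynamic initializer the compiler generated for aaaa; ??__Eaaaa@@YAXXZ is its name after name mangling (see the calling convention section).

In addition, there are two more sections, each containing one variable that serves as a sentinel for the list: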

__xc_a in .CRT$XCA
__xc_z in .CRT$XCZ // __xc_a and __xc_z are the two arguments that the CRT startup code passed to _initterm above

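OK. The linker merges the sections that share a name and sorts their contributions by the group name after the $. That order is preserved after the Windows loader maps the image into memory, so the dynamic initializers for our global objects are called in that same order.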
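The effect of the sort can be imitated with ordinary strings. This is only a model of the ordering, not of the linker: link_order is a hypothetical helper, and it relies on the fact that the names share the .CRT prefix, so a plain lexical sort orders them by group name.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model: order section contributions the way the text
// describes, by the group name that follows the '$'.
std::vector<std::string> link_order(std::vector<std::string> sections) {
    std::stable_sort(sections.begin(), sections.end());
    return sections;
}
```

Whatever order the object files arrive in, .CRT$XCA ends up first and .CRT$XCZ last, which is why __xc_a and __xc_z can bracket every initializer placed in .CRT$XCU.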

Assembly code of _initterm:

1023C010  mov         edi,edi
1023C012  push        ebp 
1023C013  mov         ebp,esp
1023C015  mov         eax,dword ptr [ebp+8]
1023C018  cmp         eax,dword ptr [ebp+0Ch]
1023C01B  jae         __initterm+27h (1023C037h)
1023C01D  mov         ecx,dword ptr [ebp+8]
1023C020  cmp         dword ptr [ecx],0
1023C023  je          __initterm+1Ch (1023C02Ch)
1023C025  mov         edx,dword ptr [ebp+8]
1023C028  mov         eax,dword ptr [edx]
1023C02A  call        eax  // each dynamic initializer is called here
1023C02C  mov         ecx,dword ptr [ebp+8]
1023C02F  add         ecx,4
1023C032  mov         dword ptr [ebp+8],ecx
1023C035  jmp         __initterm+5 (1023C015h)
1023C037  pop         ebp 
1023C038  ret             
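Read back into C++, the loop in this listing looks roughly like the sketch below. It is my own reconstruction from the disassembly rather than the CRT source; run_initializers is a hypothetical name, and PVFV follows the typedef used in the atexit signature.

```cpp
// Pointer to a function taking no arguments and returning void.
using PVFV = void (*)();

// Walk the table from 'first' up to (not including) 'last' and call
// every non-null entry, mirroring the cmp/jae, cmp/je and call eax
// instructions in the listing.
void run_initializers(PVFV* first, PVFV* last) {
    for (; first < last; ++first) {
        if (*first != nullptr) {
            (*first)();
        }
    }
}
```

The null test matters because slots in the table can be empty; the loop simply skips them.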

dynamic initializer:

004146F0  push        ebp 
004146F1  mov         ebp,esp
004146F3  sub         esp,0C0h
004146F9  push        ebx 
004146FA  push        esi 
004146FB  push        edi 
004146FC  lea         edi,[ebp-0C0h]
00414702  mov         ecx,30h
00414707  mov         eax,0CCCCCCCCh
0041470C  rep stos    dword ptr es:[edi]
0041470E  mov         ecx,offset aaaa (419484h)
00414713  call        A::A (4110EBh)  // this is where the object's constructor is actually called
00414718  push        offset `dynamic atexit destructor for 'aaaa'' (415810h)
0041471D  call        @ILT+100(_atexit) (411069h)

00414722  add         esp,4
00414725  pop         edi 
00414726  pop         esi 
00414727  pop         ebx 
00414728  add         esp,0C0h
0041472E  cmp         ebp,esp
00414730  call        @ILT+340(__RTC_CheckEsp) (411159h)
00414735  mov         esp,ebp
00414737  pop         ebp 
00414738  ret 

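Also note the call to _atexit above. The dynamic initializer registers a dynamic atexit destructor for the object aaaa; it is called through the atexit mechanism when the program exits, to destroy the object. atexit: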

int __cdecl atexit ( _PVFV func  ) {
        return (_onexit((_onexit_t)func) == NULL) ? -1 : 0; }
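The pairing of a dynamic initializer with its atexit destructor can be imitated by hand. Everything in this sketch is hypothetical: the function names, the g_exit_handlers list standing in for the atexit list, and the g_log used to make the order visible. It only mirrors the two steps seen in the listing, constructing the object and then registering its destructor.

```cpp
#include <new>
#include <string>
#include <vector>

std::vector<std::string> g_log;  // records what happened, for inspection

struct A {
    A()  { g_log.push_back("ctor"); }
    ~A() { g_log.push_back("dtor"); }
};

using PVFV = void (*)();
std::vector<PVFV> g_exit_handlers;          // stand-in for the atexit list

alignas(A) unsigned char aaaa_storage[sizeof(A)];  // raw storage for 'aaaa'
A* g_aaaa = nullptr;

// Counterpart of the "dynamic atexit destructor for 'aaaa'".
void atexit_destructor_for_aaaa() { g_aaaa->~A(); }

// Counterpart of the "dynamic initializer for 'aaaa'".
void dynamic_initializer_for_aaaa() {
    g_aaaa = new (aaaa_storage) A();                      // call A::A
    g_exit_handlers.push_back(&atexit_destructor_for_aaaa);  // like _atexit
}

// Handlers run in reverse order of registration, as with atexit.
void run_exit_handlers() {
    while (!g_exit_handlers.empty()) {
        PVFV handler = g_exit_handlers.back();
        g_exit_handlers.pop_back();
        handler();
    }
}
```

Running dynamic_initializer_for_aaaa and later run_exit_handlers leaves "ctor" followed by "dtor" in g_log.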

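One more point about the sample code in MSDN: InitSegStart and InitSegEnd are variables placed in the sentinel sections, and the code that follows uses only their addresses. They play the same role as __xc_a and __xc_z above.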

#pragma section(".mine$a", read)
__declspec(allocate(".mine$a")) const PF InitSegStart = (PF)1;

#pragma section(".mine$z",read)
__declspec(allocate(".mine$z")) const PF InitSegEnd = (PF)1;

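Summary

I have recently been refactoring the ExUnitTest code and reviewing code along the way, and I found quite a few places where Singletons are misused. These Singletons depend heavily on a particular initialization order, so every time I manually moved a FreeInstance call, the process crashed. Problems like this are probably common in any C++ project. The good thing in our project is that the instantiation and release of these classes are wrapped by a Facade; we only need to call the exposed interface and need not care about these dependencies.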
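One common way to make a Singleton independent of the initialization order between .cpp files is to construct it on first use through a function-local static. The sketch below illustrates that idiom only; it is not how ExUnitTest or its Facade works, and event_log, Service and service are hypothetical names.

```cpp
#include <string>
#include <vector>

// Constructed the first time event_log() is called, so it is always
// ready before anything that uses it.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

struct Service {
    // Safe even if this runs during static initialization of another
    // translation unit: the call itself forces the log into existence.
    Service() { event_log().push_back("Service constructed"); }
};

// Same idiom for the singleton: one instance, created on first access.
Service& service() {
    static Service instance;
    return instance;
}
```

Because the log finishes construction before the Service does, it is also destroyed after it, which gives a well-defined teardown order for this pair.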

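Finally, an unrelated topic: my boss wants me to act as the Build Engineer for the ACRD Team and take over part of Chris Canndy's work. It feels like a heavy load, since it is a gatekeeping position. It is only for two months, though; after that an intern will be hired to take over. Because of recent budget issues, the intern position will not open until after April. The position is build engineer & developer and requires good development skills, because we are considering implementing automated build and deployment, so programming skills cannot be neglected. Colleagues who happen to see this post can send me a resume; for those who don't see it, too bad for you. Haha~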

Foreach in C++

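Today I continued reading Kenny's posts and got a lot out of them. I learned a rather smart way to write a traversal: foreach in C++.
 
A quick aside on something minor but interesting: do not use goto in a high-level language such as C++ or C# unless there is no other way, because it damages the structure of the code. As a conditional branch or jump instruction, goto mainly serves lower-level languages. Why? High-level languages already have better constructs, such as if, switch, and for. A language like assembly has no direct support for those constructs, so it has to implement them with goto (here goto stands for all jump and conditional branch instructions).
 
Now to foreach. foreach is C# syntax for iterating over a container object. Functionally it is essentially the same as for, but its advantage is that we need not worry about the index going out of bounds; the foreach statement acts as the "sentinel" for us. That makes it both convenient and safe, and I use foreach for most of my C# code. Today, to my surprise and excitement, I found that the C++ STL provides for_each as well. It is an algorithm and a template (obviously, since it is the STL), and with a simple wrapper it feels much like C#'s foreach. The code below is not mine; it is copied from MSDN. Just showing it off...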
// #include <algorithm>
class Average {
private:
   long num;      // The number of elements
   long sum;      // The sum of the elements
public:
   Average ( ) : num ( 0 ) , sum ( 0 ) { }
   void operator ( ) ( int elem ) {
      num++;      // Increment the element count
      sum += elem;   // Add the value to the partial sum
   }
   operator double ( ) {
      return  static_cast <double> (sum) /
      static_cast <double> (num);
   }
};
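double avemod2 = for_each ( v1.begin ( ) , v1.end ( ), Average ( ) ); // begin() and end() delimit the range: the first element and one past the last
All the operations on the elements that the iterators refer to are encapsulated in the Average class, which acts as a function object. Compared with the traditional approach of obtaining an iterator and looping with while, for_each is more object-oriented and syntactically cleaner. So from now on, where appropriate, the old-fashioned style can be dropped. Haha~~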
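For comparison, the same traversal can be written with a lambda instead of a hand-written function object. This is my own variant, not part of the MSDN sample: it needs a C++11 compiler, the function name average is hypothetical, and returning 0.0 for an empty vector is a choice I made to avoid dividing by zero.

```cpp
#include <algorithm>
#include <vector>

// Average of the elements, computed with std::for_each and a lambda.
double average(const std::vector<int>& values) {
    if (values.empty()) {
        return 0.0;  // assumption: define the average of nothing as 0
    }
    long sum = 0;
    std::for_each(values.begin(), values.end(),
                  [&sum](int elem) { sum += elem; });
    return static_cast<double>(sum) / static_cast<double>(values.size());
}
```

The lambda captures sum by reference, so the running total lives in the caller rather than inside a copy of the function object.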