Problems faced till now
Here is a list of problems that I have faced while writing code for my thesis. It is not a complete list, just everything I can remember at the moment. Hopefully this will help me later (and it should).
1. Since I am working at the PRE_ROUTING hook, I can read all the fields of iphdr, but the tcphdr has not been populated yet, so I used pointer manipulation to get at the tcphdr. This works fine. When catching packets at POST_ROUTING, that is on the way out, all the headers except MAC are populated. Even so, at times I got junk values when I tried using values from tcphdr directly.
This was solved by linearizing the sk_buff before manipulating it. So now I always call skb_linearize(skb, GFP_ATOMIC) at the start of the hook function.
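The pointer manipulation from item 1 can be sketched in plain userspace C. This is only an illustration working on a raw byte buffer, not the kernel code from my module: find_tcp_header() and tcp_source_port() are hypothetical helper names, and the buffer is assumed to be one contiguous (already linear) IPv4 packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return a pointer to the TCP header inside a raw,
 * contiguous IPv4 packet, or NULL if the buffer is too short or not TCP. */
static const uint8_t *find_tcp_header(const uint8_t *ip, size_t len)
{
    assert(ip != NULL);
    if (len < 20)
        return NULL;                          /* shorter than a minimal IPv4 header */
    if ((ip[0] >> 4) != 4)
        return NULL;                          /* version field is not 4 */
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;  /* ihl counts 32-bit words */
    if (ihl < 20 || len < ihl + 20)
        return NULL;                          /* bad ihl, or no room for a TCP header */
    if (ip[9] != 6)
        return NULL;                          /* protocol field: 6 is TCP */
    return ip + ihl;                          /* TCP header starts right after the IP header */
}

/* The source port is the first 16-bit field of the TCP header, big-endian. */
static uint16_t tcp_source_port(const uint8_t *tcp)
{
    return (uint16_t)((tcp[0] << 8) | tcp[1]);
}
```

The length checks are the important part: reading past the contiguous area is exactly how one ends up with junk values.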
2. At the PRE_ROUTING hook, I was not able to access the MAC header directly using skb->mac.eth. I could get the protocol value but not the hardware address. I am not sure why this method didn't work.
This was solved by doing the following:
struct ethhdr *mac = (struct ethhdr *)skb->mac.raw;
The idea is that one needs to typecast the memory properly. Now I can access the hardware source and destination addresses through the variable mac. Easy, right? :->
3. I use skb_copy() to make a copy of the sk_buff. The copy that skb_copy() returns is already linear (header and data end up in one contiguous buffer), so there is no need to put an explicit skb_linearize() call on the copy in your code.
4. I got the following error while hacking the kernel:
KERNEL: assertion (atomic_read(&skb->users) == 0) failed at dev.c
So what does this mean? Let's take a step back in order to understand it. sk_buff has a field called 'users'. This is a reference count: the number of holders of a reference to that sk_buff (not processes as such). skb_get() increments it, and kfree_skb() decrements it and frees the buffer once the count drops to zero. skb_clone() and skb_copy() return a new sk_buff with its own count rather than bumping this one. The assertion sits at a point where the kernel expects the count to be exactly zero, so the message means the count was something else. That is what you get when an sk_buff is released more often than it was held, i.e. the sk_buff in question had already been freed.
Why did this happen? In my code, I forgot a return statement. After I called ip_finish_output() to put the packet on the wire, execution fell through into my error-handling code, which called kfree_skb(). By then the sk_buff was no longer mine to free, hence the error. Lesson learned: be more careful while writing code, and have lots of patience.
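The missing-return bug can be reproduced with a toy reference count in userspace. This is a sketch only: toy_buf, toy_get(), toy_put(), send_buggy() and send_fixed() are made-up names, and the first toy_put() in each send function merely stands in for handing the buffer to an output path that consumes it.

```c
#include <assert.h>
#include <stddef.h>

struct toy_buf {
    int users;   /* reference count, playing the role of skb->users */
    int freed;   /* set instead of really freeing, so the state can be inspected */
};

static void toy_get(struct toy_buf *b) { b->users++; }

/* Drop one reference. Returns 1 if this call released the buffer,
 * 0 if references remain, and -1 if the buffer was already released. */
static int toy_put(struct toy_buf *b)
{
    assert(b != NULL);
    if (b->freed)
        return -1;               /* double release: the bug from item 4 */
    if (--b->users == 0) {
        b->freed = 1;
        return 1;
    }
    return 0;
}

/* Buggy version: no return after the successful send, so control falls
 * through into the error cleanup and releases the buffer a second time. */
static int send_buggy(struct toy_buf *b, int ok)
{
    if (ok)
        toy_put(b);              /* output path consumed the buffer */
    /* missing return here */
    return toy_put(b);           /* error cleanup */
}

/* Fixed version: return as soon as ownership has been handed over. */
static int send_fixed(struct toy_buf *b, int ok)
{
    if (ok) {
        toy_put(b);              /* output path consumed the buffer */
        return 0;                /* the return that was missing */
    }
    toy_put(b);                  /* error path: we still own it, drop it here */
    return 1;
}
```

With a single reference and a successful send, send_buggy() reports the double release by returning -1, while send_fixed() releases the buffer exactly once.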
5. How can one enable slab debugging?
Set CONFIG_DEBUG_SLAB=y in your .config file.
Then run 'make oldconfig' and recompile the kernel.
If you try to write to a previously freed slab, the kernel will complain immediately. I was about to use this to check why I was getting the error I discuss in 4.
I had a few interesting issues while creating a SYN for the connection (HB)->(HB, dest).
6. I keep a copy of the original packet in my linked list called the forward-queue (fwd-q). In the function that creates the SYN for the fwd connection, I first make a copy of the sk_buff at the head of fwd-q and then make changes in that copy's header. When I tried to access the tcphdr values, I was getting junk.
What was I doing wrong? I was copying the packet into the fwd-q before I pulled the tcphdr using pskb_may_pull(). Because of this the tcphdr was not populated, and hence I could not access the fields directly.
7. I was trying to calculate the TCP checksum by calling tcp_v4_check(). However, I was getting weird results on the wire, like fragmented packets, wrong checksums, etc.
The solution was to set the checksum field of the header to zero before actually recalculating the checksum. It was not obvious to me at first why this is required: the checksum is computed over the whole TCP header, checksum field included, so any old value left in that field gets summed into the result.
Another thing was that tcp_v4_check() makes use of skb->len. However, I was not sure what value it contains in the copy. So I calculated the length that tcp_v4_check() would need using iph->tot_len and iph->ihl. Remember that tot_len is u16 and not u32, so use ntohs(). Sick error, but I made it. However, now that I think about it, skb->len should contain the proper value, because I copy the packet after pulling the tcphdr. So len should be equal to the data bytes.
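To make the two points from item 7 concrete (zero the checksum field first, and derive the TCP length from tot_len and ihl), here is a userspace sketch on a raw byte buffer. It does not use tcp_v4_check(); it is a plain implementation of the standard TCP checksum over the pseudo-header and segment, and all function names in it are made up for illustration. It assumes a contiguous IPv4 packet carrying TCP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add big-endian 16-bit words to a running sum; an odd trailing byte
 * is treated as the high byte of a final word. */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t acc)
{
    while (len > 1) {
        acc += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len)
        acc += (uint32_t)(p[0] << 8);
    return acc;
}

/* Fold the carries back in and take the one's complement. */
static uint16_t fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)~acc;
}

/* TCP segment length: tot_len (16 bits, network byte order) minus ihl * 4. */
static size_t tcp_segment_len(const uint8_t *ip)
{
    size_t tot_len = (size_t)((ip[2] << 8) | ip[3]);  /* same effect as ntohs() */
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;
    assert(tot_len >= ihl);
    return tot_len - ihl;
}

/* Sum over the pseudo-header (addresses, protocol, length) and the segment. */
static uint32_t segment_sum(const uint8_t *ip)
{
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;
    size_t tcp_len = tcp_segment_len(ip);
    uint32_t acc = sum16(ip + 12, 8, 0);   /* source and destination address */
    acc += 6;                              /* protocol number for TCP */
    acc += (uint32_t)tcp_len;
    return sum16(ip + ihl, tcp_len, acc);
}

/* Zero the checksum field, compute the checksum, and store it. */
static uint16_t tcp_fill_checksum(uint8_t *ip)
{
    assert(ip != NULL);
    uint8_t *tcp = ip + (size_t)(ip[0] & 0x0f) * 4;
    tcp[16] = 0;                           /* without this, the stale value */
    tcp[17] = 0;                           /* would be summed in as well    */
    uint16_t csum = fold(segment_sum(ip));
    tcp[16] = (uint8_t)(csum >> 8);
    tcp[17] = (uint8_t)(csum & 0xff);
    return csum;
}

/* A correct packet sums (checksum included) to all ones, which folds to 0. */
static int tcp_checksum_ok(const uint8_t *ip)
{
    return fold(segment_sum(ip)) == 0;
}
```

Because the field is zeroed on every call, calling tcp_fill_checksum() twice on the same packet gives the same value; skip the zeroing and the second call would fold the first result into the sum.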
8. Big problem. There are times when the code works like a gem, and then there are times when it shows me a fragmented IP packet on the wire. Right now I have no clue why this happens. But I need to get this fixed, else everything else is a waste.