marty Professional Member


Joined: 10 May 2001 Posts: 789
|
Posted: Mon Apr 24, 2006 2:42 pm Post subject: Ok seeking suggestions for URL filtering |
|
|
Ok guys, I need advice and ideas over here...
Here is what I am trying to achieve and what I have done so far.
Application info - URL filtering (300,000 URLs)
Code done so far:
- The VDS app loads the TXT file that contains the 300,000 URLs into an internal list created by VDSLIST.DLL (more efficient than the original VDS one)
- The VDS app checks IE and Firefox every 0.9 sec for bad URLs
If IE is used: I capture the IE URL first (from the comboEX class), then scan my internal list. If it is found, I send another web page by DDE to block the site.
If Firefox is used: I capture the URL using DDE, then check the internal list. If it is found, I send another web page by DDE to block it.
This seems the most efficient way I have found so far, BUT it's slow on an older machine (PII 400): it sometimes takes up to 10 seconds before my app sends back the block page...
Here is what I have also tried so far:
- Used String.dll instead of VDSList.dll. A bit slower.
- Used SQLite. Also slow.
Thanks in advance... |
|
Skit3000 Admin Team

Joined: 11 May 2002 Posts: 2166 Location: The Netherlands
|
|
marty Professional Member


Joined: 10 May 2001 Posts: 789
|
Posted: Mon Apr 24, 2006 3:48 pm Post subject: |
|
|
Yes, that's an idea Prakash gave me this morning, and I will certainly see if I can do that.
The problem is that the list is one big file, and I would have to do that manually every time there is an update from the guys who maintain the list.
Good suggestion |
|
Skit3000 Admin Team

Joined: 11 May 2002 Posts: 2166 Location: The Netherlands
|
Posted: Mon Apr 24, 2006 3:55 pm Post subject: |
|
|
Another way, which might be a good solution, is to load the list into VDS (make sure it is sorted alphabetically) and then create "indexes". Create a second list and put the 1000th, 2000th, 3000th, etc. item from the first list into it. Then, when searching, use the second list to find which two indexed items the current URL falls between, and start searching the first list from the earlier of the two.
Edit: little example added
| Code: | # Your 300,000-item list (must be sorted)
list create,1,sorted
list add,1,111
list add,1,222
list add,1,333
list add,1,444
list add,1,555
list add,1,666
list add,1,777
# Index list: every %%steps-th item of list 1
list create,2
%i = 0
%%steps = 3
while @greater(@count(1),%i)
  list add,2,@item(1,%i)
  %i = @sum(%i,%%steps)
wend
info Indexed items:@cr()@text(2)
# Locate the index entries in list 2 that your item falls between
%%item = 555
%i = 0
# Adjust this while loop so that text can be compared.
# There is no bounds check: an item above the last index entry runs past the end of list 2.
while @greater(%%item,@item(2,%i))
  %i = @succ(%i)
wend
# Search the first list for your item, starting at the block found above
# (skip the step back when %i is 0, otherwise the position would be negative)
if @greater(%i,0)
  %i = @prod(@pred(%i),%%steps)
end
list seek,1,%i
if @match(1,%%item)
  info Found your item: @item(1) at line @index(1) of list 1
end |
Note that you should replace the "while @greater(%%item,@item(2,%i))" line with a string compare function that checks whether %%item comes before or after @item(2,%i)... |
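For readers who want to try the indexing idea outside VDS, here is a minimal sketch in Python (not VDS). It assumes the list is already sorted, and the names `build_index` and `contains` are made up for this illustration. It also swaps the linear walk through the index list for a binary search with `bisect_right`, which compares strings directly, so it is one possible variant rather than a translation of the script above.

```python
from bisect import bisect_right

def build_index(sorted_items, step):
    """Return every step-th item of the sorted list (the 'second list')."""
    return sorted_items[::step]

def contains(sorted_items, index, step, target):
    """Look up target by first locating its block in the small index."""
    # Count the index entries <= target; the block starts at the last of them.
    block = bisect_right(index, target) - 1
    if block < 0:
        return False  # target sorts before the very first item
    start = block * step
    # Scan only the `step` items of that block instead of the whole list.
    return target in sorted_items[start:start + step]

# Placeholder data, standing in for the real 300,000-entry list.
urls = sorted(["a.example", "b.example", "c.example", "d.example",
               "e.example", "f.example", "g.example"])
index = build_index(urls, 3)
print(contains(urls, index, 3, "e.example"))
print(contains(urls, index, 3, "z.example"))
```

With a step of 1000 on 300,000 entries, the index holds 300 items and each lookup scans at most 1000 entries of the main list.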
|
marty Professional Member


Joined: 10 May 2001 Posts: 789
|
Posted: Mon Apr 24, 2006 5:55 pm Post subject: |
|
|
Thanks Skit, will look into that.
I will also see if I can build a proxy with VDS. The HTTPX extension is not stable, so I will look at other solutions. |
|
Serge Professional Member


Joined: 04 Mar 2002 Posts: 1480 Location: Australia
|
Posted: Tue Apr 25, 2006 3:19 am Post subject: |
|
|
marty,
i would use that idea
| Quote: | Yes thats an idea Prakash gave me this morning.. and I will certain see if I can do that.
Problem is the list is one big file and I would have to manually do that every time there is an update from the guys that maintain that list. |
1. i would write a little application that sorts the list you receive from your third party into alphabetic files, i.e. 'a.txt' for all URLs that start with 'a', and so on... this would not be hard to do
2. i would build into your program a little routine that checks which letter the URL starts with and then only runs the check against that file
now, given that lots of URLs start with 'www.', i would remove it entirely and work with the rest of the domain ... even in the third-party file ... this would lead to a smaller file/list than otherwise, e.g. 'www.naughty.org' would become 'naughty.org' and so on...
just a thought
serge
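The two steps above could be sketched roughly as follows, in Python rather than VDS. This is only an illustration: it keeps the per-letter groups in memory instead of writing 'a.txt', 'b.txt' and so on, the function names are hypothetical, and lower-casing the URL is an extra assumption that is not part of the suggestion.

```python
from collections import defaultdict

def normalize(url):
    """Trim the entry, lower-case it (assumption) and drop a leading 'www.'."""
    host = url.strip().lower()
    if host.startswith("www."):
        host = host[4:]
    return host

def build_buckets(urls):
    """Group the blocklist by first character, like one file per letter."""
    buckets = defaultdict(list)
    for url in urls:
        host = normalize(url)
        if host:
            buckets[host[0]].append(host)
    return buckets

def is_blocked(buckets, url):
    """Check a URL against the bucket for its first character only."""
    host = normalize(url)
    if not host:
        return False
    return host in buckets.get(host[0], [])

# Placeholder entries; 'www.naughty.org' is the example from the post.
buckets = build_buckets(["www.naughty.org", "Example.com", "nasty.example"])
print(is_blocked(buckets, "naughty.org"))
print(is_blocked(buckets, "good.org"))
```

Because the third-party file can be run through `build_buckets` each time it is updated, the splitting does not have to be done by hand.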
|
|
marty Professional Member


Joined: 10 May 2001 Posts: 789
|
Posted: Tue Apr 25, 2006 4:12 pm Post subject: |
|
|
Thanks Serge for the suggestions!  |
|
vdsalchemist Admin Team

Joined: 23 Oct 2001 Posts: 1448 Location: Florida, USA
|
Posted: Tue Apr 25, 2006 4:50 pm Post subject: |
|
|
Marty,
I agree with Serge's suggestions, since I understand how the list commands and functions work. If the list has many values that share the same prefix, the list commands and functions take longer to find the item of interest, since they work off binary trees. The prefixes cause the tree to become one-sided, because the tree uses a string compare function that tests whether the new string being added is <, >, or = to an existing string. If the string is < the previously added string the tree grows to the right; if it is > the previous string the tree grows to the left. If the strings are equal, most packages return an error, which is handled in different ways depending on the binary tree package being used and the programmer implementing it.
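To illustrate the general effect of a one-sided tree, here is a small Python sketch of a plain, unbalanced binary search tree. It is not the VDS list implementation, which is not shown in this thread, and which side holds the smaller keys is only a convention. The sketch shows one well-known way such a tree degenerates: when keys arrive in already-sorted order, every new key goes to the same side, so the tree becomes a chain as deep as the number of items.

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key without rebalancing; return (root, depth of the key's node)."""
    if root is None:
        return Node(key), 1
    node, depth = root, 1
    while True:
        if key == node.key:
            return root, depth  # duplicate: leave the tree unchanged
        branch = "left" if key < node.key else "right"
        child = getattr(node, branch)
        if child is None:
            setattr(node, branch, Node(key))
            return root, depth + 1
        node, depth = child, depth + 1

def height_after_inserts(keys):
    """Build a tree from keys in the given order and return its height."""
    root, height = None, 0
    for key in keys:
        root, depth = insert(root, key)
        height = max(height, depth)
    return height

# Made-up keys that all share a 'www.' prefix, in sorted order.
keys = [f"www.site{i:04d}.example" for i in range(500)]
shuffled = keys[:]
random.Random(0).shuffle(shuffled)
print("sorted order:  ", height_after_inserts(keys))
print("shuffled order:", height_after_inserts(shuffled))
```

A lookup in the chain-shaped tree has to walk through up to every item, which is no better than scanning a plain list.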
With this said, I am implementing a Dictionary command and function in GadgetX that will allow you to do really fast name = value lookups. I will send you a demo ASAP. |
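The GadgetX Dictionary itself is not shown in this thread, so the following is only a sketch of the general name = value lookup idea, using Python's built-in `dict` as a stand-in. The function names and the "blocked" value are made up for the illustration.

```python
def load_blocklist(lines):
    """Build a name = value table with one entry per non-empty line."""
    table = {}
    for line in lines:
        url = line.strip()
        if url:
            table[url] = "blocked"
    return table

def lookup(table, url):
    """Return the stored value for url, or None if it is not listed."""
    return table.get(url)

# Placeholder lines, standing in for the 300,000-line TXT file.
table = load_blocklist(["naughty.org\n", "\n", "nasty.example\n"])
print(lookup(table, "naughty.org"))
print(lookup(table, "good.org"))
```

A hash-based table like this finds an entry without comparing it against the other entries one by one, so the lookup time stays roughly the same as the list grows.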
 |
|