Help speeding up duplicate removal

 
DW
Contributor

Joined: 21 Mar 2003
Posts: 175
Location: UK

Posted: Sun Sep 11, 2005 4:47 pm    Post subject: Help speeding up duplicate removal

The code I came up with is this

Code:

repeat
  if @greater(@count(2),0)
    list seek,2,0
  end
  if @match(2,@item(1))
    if @not(@equal(@item(1),@item(2),EXACT))
      list add,2,@item(1)
    end
  else
    list add,2,@item(1)
  end
  list delete,1
until @equal(@count(1),0)


When this code is given a large list of 10000 lines or more, it takes a long time to remove the dupes.
Is there a way to do it faster, or a freeware tool out there that I can run hidden to do the same thing?

Thank you
SnarlingSheep
Professional Member

Joined: 13 Mar 2001
Posts: 759
Location: Michigan

Posted: Sun Sep 11, 2005 5:07 pm

I might not be reading the code right, but it looks like list 2 ends up with everything list 1 has in it, except for the duplicates that are already in list 2.
So couldn't you put the list 1 and list 2 items into a 3rd list, and then use LIST SORT,3? Seems like it'd be the same end result to me.

_________________
-Sheep
My pockets hurt...
DW

Posted: Sun Sep 11, 2005 5:39 pm

What it should be doing is this:

list1 starts with all the data and list2 is empty.
I check that the current top item in list1 is not in list2.
If it's not, I add it to list2, but if it does exist I just delete it from list1.

Do you mean I can just put all my data into one list and sort it to remove all the dupes?
Does this bug/feature still exist?
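
For comparison, the steps described above (keep an item only the first time it turns up) can be sketched outside VDS. This is a Python illustration only, not VDS code, and the name `dedupe_keep_order` is made up for the example. It assumes exact, case-sensitive comparison. The "is it already in list2?" check is done against a set, so each check does not have to walk the whole second list, which is the part likely to hurt on 10000+ lines.

```python
def dedupe_keep_order(lines):
    """Return the lines with later duplicates dropped, keeping first-seen order."""
    seen = set()   # fast membership test: stands in for "is it already in list2?"
    unique = []    # stands in for list2
    for line in lines:
        if line not in seen:  # exact, case-sensitive comparison (an assumption)
            seen.add(line)
            unique.append(line)
    return unique


if __name__ == "__main__":
    print(dedupe_keep_order(["b", "a", "b", "c", "a"]))
```

Unlike a sort-based approach, this keeps the surviving lines in their original order.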
SnarlingSheep

Posted: Mon Sep 12, 2005 1:37 am

Ah, yeah, you should be able to use LIST SORT,1.
Give it a try and see if it works for what you want.

DW

Posted: Mon Sep 12, 2005 6:49 am

I don't think the sort bug works anymore. I tried it as you said, but all it did was sort my list. :)

So any other ideas?

Maybe an external DOS duplicate checker of some sort? Does anyone know of one?

Any ideas welcome.
Dr. Dread
Professional Member

Joined: 03 Aug 2001
Posts: 1065
Location: Copenhagen, Denmark

Posted: Mon Sep 12, 2005 2:13 pm

DW wrote:
I don't think the sort bug works anymore.


I think it does.... But you gotta create the list as sorted: LIST CREATE,1,SORTED

Greetz
Dr. Dread

_________________
~~ Alcohol and calculus don't mix... Don't drink and derive! ~~

String.DLL * advanced string processing
Dr. Dread

Posted: Mon Sep 12, 2005 2:16 pm

PS: Take care with large VDS lists - if you do a LIST LOADFILE, it will probably break around 100,000 items.
Perhaps use an @ok() check after the list load.

Greetz
Dr. Dread

DW

Posted: Tue Sep 13, 2005 6:41 am

Thanks, will let you know what happened when I have had time to check it out.
Serge
Professional Member

Joined: 04 Mar 2002
Posts: 1480
Location: Australia

Posted: Tue Sep 20, 2005 1:34 am

If I may clarify what was said above: if you copy a list into a list that was created with the SORTED option, all duplicates are automatically deleted.

You do not need to sift through the list item by item to check whether each one is duplicated.

This works with VDS 5; I am writing a program now and use that feature to remove dupes.

serge
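
As a rough picture of what the SORTED-list approach does to the data, here is a small Python sketch. It is Python, not VDS, and `dedupe_sorted` is a hypothetical name. The thread does not say whether VDS ignores case when it drops duplicates from a sorted list, so the sketch assumes exact comparison.

```python
def dedupe_sorted(lines):
    """Return the distinct lines in sorted order."""
    # set() drops exact duplicates; sorted() then orders what is left.
    return sorted(set(lines))


if __name__ == "__main__":
    print(dedupe_sorted(["pear", "apple", "pear", "fig"]))
```

Note that the result comes back sorted, so the original line order is lost. If the order matters, a first-seen check is needed instead.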

All times are GMT