StackOverflow Data Dump Import v.11

  ClickOnce Installer: http://skysanders.net/tools/se/soddi/publish.htm

  (c) 2010 Sky Sanders
  licensed under MIT/GPL - see license.txt
  
  msi :http://skysanders.net/files/soddi.11.msi
  info:http://skysanders.net/tools/se
  bin :http://skysanders.net/files/soddi.11.zip
  src :http://bitbucket.org/bitpusher/soddi/


SODDI is a .Net 3.5 sp1 executable written in C# that quickly and cleanly imports StackOverflow Data Dump XML files into 
MS Sql Server 2000/05/08, MySql Server 5.1 and SQLite3. (MySql and SQLite drivers are included)

SODDI can be run as a command line utility or, when invoked with no arguments or GUI argument, will
present a Windows Form interface.

Quick Start:
The quickest route to your own copy of the StackOverflow databases is to use the ClickOnce installer,
browse to the uncompressed data dump, accept the default SQLite provider selection and click 'Import'.

USAGE:

soddi.exe source:"" target:"" [batch:5000] [split] [indices] [fulltext] [[meta] [so] [su] [sf]] [gui]


SOURCE          The directory containing the individual site directories.
                NOTE: do not include trailing slash in quoted path as the arg
                parser will interpret it as an escaped quote and puke.

TARGET          A valid ADO.Net connection string, including the provider invariant
                name.
                
                Platform specific connection string details:
                
                Sql Server: Database must exist. Data will be loaded into tables segregated by
                schema named as the site data being imported. e.g. so.Users, meta.Users.
                The tables are dropped before import.
                
                MySql: Connection string should include server, each site's data will be loaded
                into a database named as the site imported. The databases will be dropped and 
                recreated before import.
                
                SQLite: Connection string should specify a directory. The data will be imported
                into seperate .db3 files, each named as the site imported. Existing data files
                will be overwritten.
                
                The target database/datafile/schema names can be modified by explicitely specifying
                sites to import and appending the desired schema as a parameter value or editing
                the Sites list item schema in the GUI.

                
-- OPTIONAL ARGUMENTS

SPLIT           Normalize post tags by splitting the concatenated Posts.Tags field into individual 
				rows in a separate PostTags table.

INDICES         Enables useful indexes on each table.

FULLTEXT        Enables a full text index on Posts.Body and Posts.Title - SqlServer only.

BATCH           Number of rows inserted in each transaction. Default 5000.

GUI             Presents a Windows Forms interface. If SODDI is invoked with arguments and GUI, the UI
                will be populated with the supplied arguments.
                
                The console window will remain open to recieve all debug and error output.

META|SO|SU|SF   Specifies which sites to import. If none are specified, all site directories found in 
                SOURCE will be imported.
                
                To specify a different target name simply treat the site name as a parameter.
                
                e.g. 
                
                Sql Server - SO:StackOverflowData will load the data from the XXXXX SO directory 
                into the database specified in the connection string and the schema 'StackOverflowData'
                
                MySql - SO:StackOverflowData will load the data from the XXXXX SO directory 
                into a new database named StackOverflowData on the server specified in the connection string.
                
                SQLite - SO:StackOverflowData will load the data from the XXXXX SO directory into a new 
                db3 file named StackOverflowData.db3 in the directory specified in the connection string.
                
                In GUI mode you may edit the schema item in the Sites list.
                
Options are not case sensitive.

Example command lines.

GUI Mode:
	soddi 

SQLite - all sites:
	soddi source:"F:\Export-030110" target:"data source=c:\temp;version=3;new=True;Provider=System.Data.SQLite"

MySql - all sites:
	soddi source:"F:\Export-030110" target:"server=localhost;user id=root;password=p@ssW0rd;Provider=MySql.Data.MySqlClient"

MySql - Meta StackOverflow and StackOverflow data into specified databases:
	soddi source:"F:\Export-030110" target:"server=localhost;user id=root;password=p@ssW0rd;Provider=MySql.Data.MySqlClient" meta:MetaDb so:SoDb
	
Sql Server - all sites:
	soddi source:"F:\Export-030110" target:"data source=(local);initial catalog=SOData;integrated security=true;Provider=System.Data.SqlClient"
	
Sql Server - StackOverflow data only (SO):
	soddi source:"F:\Export-030110" target:"data source=(local);initial catalog=SOData;integrated security=true;Provider=System.Data.SqlClient" so

Sql Server - StackOverflow data only into schema dbo:
	soddi source:"F:\Export-030110" target:"data source=(local);initial catalog=SOData;integrated security=true;Provider=System.Data.SqlClient" so:dbo


Sql Server - StackOverflow data only, split tags and add indices:
	soddi source:"F:\Export-030110" target:"data source=(local);initial catalog=SOData;integrated security=true;Provider=System.Data.SqlClient" so split indices

 
The latest data dump can be found at
http://blog.stackoverflow.com/category/cc-wiki-dump/


04/01/2010 - Sky Sanders

04/09/2010 - Explicitly set platform to x86 to allow same binaries to run on x64.