Archiving Framework

Overview

The implementation consists of some Node.js scripts intended to migrate historical data from a source MSSQL Server to a target MongoDb database. The scripts would be invoked from a Jenkins job which can be triggered manually or based on a schedule.

Getting started

  • Reach out to the DevOps team to create a MongoDb credential and database.

  • Reach out to the Security team to permit outbound connection from Jenkins server to your MSSSQL server on (http://10.128.31.193:5050/).

    • Clone the repository on Bitbucket by running the following command:

      $ git clone https://bitbucket.org/vggdev/archivingframework/src/master/

$ git checkout development

  • Open config.json file. This is where the configurations related to the database tables to archive are defined. A typical structure of this file is shown below:

    {
      "sourceDatabaseName": "IdentityServer",
      "targetDatabaseName": "IdentityServer_Archive",
      "tablesDefinition": [{
              "tableName": "LoginRecords",
              "timestampColumn": "Timestamp",
              "selectQuery": "SELECT * FROM [IdentityServer].[dbo].[LoginRecords] WHERE [TimeStamp] between @minDate and @maxDate",
              "deleteQuery": ["DECLARE @Deleted_Rows INT; SET @Deleted_Rows = 1; WHILE (@Deleted_Rows > 0) BEGIN BEGIN TRANSACTION DELETE TOP (1000) [IdentityServer].[dbo].[LoginRecords] WHERE [TimeStamp] between @minDate and @maxDate SET @Deleted_Rows = @@ROWCOUNT; COMMIT TRANSACTION CHECKPOINT END"],
              "minDateQuery": "SELECT min(Timestamp) as minDate FROM [IdentityServer].[dbo].[LoginRecords]",
              "monthsFromMinDate": 1
          },
          {
              "tableName": "Audit",
              "timestampColumn": "TimeStamp",
              "selectQuery": "select * FROM [IdentityServer].[dbo].[Audit] WHERE [TimeStamp] between @minDate and @maxDate",
              "deleteQuery": ["DECLARE @Deleted_Rows INT; SET @Deleted_Rows = 1; WHILE (@Deleted_Rows > 0) BEGIN BEGIN TRANSACTION DELETE TOP (1000) [IdentityServer].[dbo].[Audit] WHERE [TimeStamp] between @minDate and @maxDate SET @Deleted_Rows = @@ROWCOUNT; COMMIT TRANSACTION CHECKPOINT END"],
              "minDateQuery": "SELECT min(Timestamp) as minDate FROM [IdentityServer].[dbo].[Audit]",
              "monthsFromMinDate": 1
          }
      ]
    }

    Property

    Description

    tablesDefinition

    This is where definitions related to the tables being archived are configured

    tableName

    The name of the table as it'll appear on the destination database

    timestampColumn

    It is very important to add the SQL timestamp column on the table. This convention ensures that archiving framework splits the records into the appropriate year, i.e. Audit_2019, Audit_2020

    selectQuery

    Write the actual select query that extracts data from SQL. The query could be as complex as joining multiple tables into a flat structure. Whatever the result of the flattened data is, ensure that timeStampColumn value equals the value configured above.

    deleteQuery

    This is an array of queries. This gives you the flexibility of deleting records from multiple tables especially in cases where multiple tables are joined to achieve a flattened structure. Note that the delete queries are parameterized with @minDate and @maxDate. By convention, the framework substitutes the values of the minDateQuery and monthsFromMinDate to the parameters respectively.

    minDateQuery

    This query should only return the minimum date on a particular table being archived. The alias of the column being returned by convention must be minDate. This value is used to substitute the @minDate parameter as described above.

    monthsFromMinDate

    This value is used to determine how many months from the @minDate to archive. After getting the @minDate value, the framework adds the number of months specified in order to get an absolute upper and lower date boundaries.

  • Open the Jenkinsfile . This is where the job's pipeline related configuration are defined. The structure of the file is shown below:

    def archiveId = UUID.randomUUID().toString()
    pipeline {
    agent any
    tools {
      nodejs "Nodejs"
    }
    environment {
      ARCHIVE_ID = "${archiveId}"
      SQL_CONNECTION_STRING = credentials('sso_sql_prod_connection_string')
      MONGODB_CONNECTION_STRING = credentials('sso_mongo_prod_connection_string')
    }
    stages {
      stage('Setup') {
        steps {
          sh 'npm install'
        }
      }
      stage('Initialize') {
        steps {
          sh 'node autoArchive.js'
        }
      }
    }
    }

    Property

    Description

    SQL_CONNECTION_STRING

    The connection string is stored as a secret in Jenkins using the secret kind of text. This is in turn injected to the job as an environment variable. The value of this variable is the ID of the credential i.e sso_sql_prod_connection_string as shown in the snippet above.

    MONGODB_CONNECTION_STRING

    The destination Mongodb connection string is stored as a secret in Jenkins using the secret kind of text. This is in turn injected to the job as an environment variable. The value of this variable is the ID of the credential

Changes made to these files should be committed to a different branch.

  • Clone the job from the Jenkins server (http://10.128.31.193:5050/)

  • Goto the job configuration page

  • Scroll down to the repository section and update the repository URL to the one you cloned earlier.

  • Change branch specifier to the new branch.

  • Save the job

  • Go back to the job page and build.

  • Job may take a while to complete.

  • In case of any errors while the job is executing, simply build the job again. The framework is designed to automatically resume the last failed job on every execution.

  • Reach out to the R&D team for further enquiries.

Last updated

Was this helpful?