Sunday, September 21, 2014

Auto synchronize Salesforce data to MySQL or PostgreSQL with Talend Open Studio

Today I'm writing about using Talend Open Studio to sync Salesforce data to a MySQL or PostgreSQL database.

First of all, let's download and install Talend Open Studio for Data Integration; make sure you already have Java installed.

Next, unzip it and run TOS_DI-macosx-cocoa (I'm using Mac OS X), or TOS_DI-win-x86_64.exe if you're on Windows.
It's based on the Eclipse IDE, so it will feel familiar if you've worked with Eclipse before.

Here is the main screen.


Then we need to create a database connection: expand the Metadata node -> Db connections -> right click -> Create connection -> enter a Name (without spaces) -> click the Next button.

Then select the Db Type (MySQL, PostgreSQL, or anything else in the long list). I'm choosing PostgreSQL because MySQL limits the row size to 65535 bytes; since a utf8 VARCHAR(255) column can take up to 767 bytes (255 × 3 + 2 length bytes), roughly 85 such columns are already enough to exceed the limit. So if the table contains too many fields, it will throw an error: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. You have to change some columns to TEXT or BLOBs

After entering all the connection information, click the Check button to make sure it is correct. Then Finish.


The next step is creating a Salesforce connection. Right-click the Salesforce node and select Create Salesforce connection. Enter a name, then Next.


Enter the User name and Password. The Password is the combination of your Salesforce password and security token, i.e. if the Salesforce password is xyz and the token is 1234, we enter xyz1234 in the Password field. Then click Check Login to make sure the connection is valid. Then Finish.

The next step is retrieving the Salesforce modules: right-click the created Salesforce connection and select Retrieve Salesforce modules. Let's check Account.


Then we need to create a job to import the Salesforce data into the database. Right-click the Job Design node and select Create Jobs.

The job design screen opens; expand the Salesforce Account module and drag/drop the Account table onto the job design screen as a tSalesforceInput component.

Click OK and we have it on the screen.

The next step is to search for a database output component in the Palette box, then drag/drop a tPostgresqlOutput component onto the screen.


Then rename it to "account" and click the icon inside it to set its properties.
Select Repository for Property Type, then select the created PostgreSQL connection - DB (POSTGRESQL):PosgressqlSalesforce.
All the connection information is filled out automatically except Table: enter "account" here, then select Create table if does not exist for Action on table, and Insert for Action on data.

The next step is linking the Salesforce input to the PostgreSQL output: right-click the Salesforce input's icon, select Row -> Main, then wire the connection to the PostgreSQL output's icon and Run the job.

Then let's check PostgreSQL to see that the new account table has been created with the data from Salesforce.
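As a quick sanity check from the command line (a sketch; the database name salesforce is an assumption, use whatever database your connection points at):

$ psql -d salesforce -c 'SELECT count(*) FROM account;'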
Then double-click the PostgreSQL account output to edit its properties, select Insert or Update for Action on data, click the Edit schema button, and tick the Key checkbox on the Id column of the account output, so that subsequent runs match existing rows on Id and update them instead of inserting duplicates.

Then run the job again.
Repeat those steps to sync the other tables.

Now, if we want to execute the job automatically on a schedule / cron, we need to build the job.
Right-click the job node and select Build Job.


Then unzip the file and browse to the sub-folder sync; you will see sync_run.sh or sync_run.bat (or jobName_run.sh/.bat), which is the script the schedule / cron will call.
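For example, a crontab entry like the following (the install path and log file are hypothetical; point them at wherever you unzipped the build) would run the sync every 15 minutes:

*/15 * * * * /opt/talend/sync/sync_run.sh >> /var/log/talend_sync.log 2>&1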

So we are done!

FYI, here is a videocast showing how to sync in the other direction, from the database to Salesforce.



Saturday, September 13, 2014

Automate Rackspace Cloud Block Storage volume creation with Ansible

Today I want to automate Rackspace Cloud Block Storage volume creation with Ansible for Rackspace instances.

To do that, we need the rax modules, which require pyrax to be installed.
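pyrax is on PyPI, so installing it is something like this; just make sure you use the pip belonging to the Python that Ansible will run modules with, which, as we'll see below, matters:

$ pip install pyrax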

After pyrax was installed, I tried to create a block storage volume with the rax_cbs module, and it threw an error:
msg: pyrax is required for this module

That's interesting. I checked the python path and tested pyrax:

$ which python
/usr/local/bin/python

$ python
Python 2.7.7 (default, Jun 18 2014, 16:33:32)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.9)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyrax
>>>

I checked the Ansible python path:
$ head /usr/local/bin/ansible
#!/usr/local/opt/python/bin/python2.7

Both python and /usr/local/opt/python/bin/python2.7 turned out to be symbolic links to /usr/local/Cellar/python/2.7.7_2/Frameworks/Python.framework/Versions/2.7/bin/python2.7.
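For reference, this is easy to verify:

$ ls -l /usr/local/bin/python /usr/local/opt/python/bin/python2.7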

I Googled around but couldn't find the answer.

Finally, I checked the rax_cbs module itself:
$ head /usr/share/ansible/cloud/rax_cbs
#!/usr/bin/python

Then:
$ /usr/bin/python
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyrax
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named pyrax

That's why it didn't work! /usr/bin/python is the built-in Python on Mac OS X, while 2.7.7 was installed via Homebrew to a different path.

So let's set ansible_python_interpreter for localhost, which is what the rax modules run against, in the ansible/hosts file:
[local]
localhost ansible_connection=local ansible_python_interpreter=/usr/local/opt/python/bin/python2.7
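To verify that modules now run under the Homebrew Python, we can ask the setup module for the Python version fact; the output should look roughly like this:

$ ansible localhost -i ansible/hosts -m setup -a 'filter=ansible_python_version'
localhost | success >> {
    "ansible_facts": {
        "ansible_python_version": "2.7.7"
    },
    "changed": false
}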

But then I got another error:
msg: No CloudBlockStorageVolume matching

I found the answer: patching /usr/share/ansible/cloud/rax_cbs fixes it.

Voila!

I also want to attach the volume to the instance, then fdisk / create a partition / format / mount it automatically. So we have to check whether the device already exists; if not, we create and attach a Cloud Block Storage volume, then fdisk / create a partition / format it.

Below is the playbook.

Note that this playbook runs against remote Rackspace instances, so we don't need to set hosts/connection for the build/attach block storage volume tasks: local_action automatically delegates them to the localhost defined in ansible/hosts.

# file:Rackspace_disk/tasks/main.yml
- name: Check if device /dev/xvd_ present
  # wc -l prints 0 when the device is not there yet
  shell: fdisk -l | grep 'Disk {{device}}' | wc -l
  changed_when: False
  register: device_present

- name: Build a Block Storage Volume
  local_action:
    module: rax_cbs
    credentials: "{{credentials}}"
    name: "{{volume_name}}"
    volume_type: "{{volume_type}}"
    size: "{{volume_size}}"
    region: "{{region}}"
    wait: yes
    state: present
    meta:
      app: "{{volume_name}}"
  sudo: no
  when: device_present.stdout is defined and device_present.stdout|int == 0

- name: Attach a Block Storage Volume
  local_action:
    module: rax_cbs_attachments
    credentials: "{{credentials}}"
    volume: "{{volume_name}}"
    server: "{{server}}"
    device: "{{device}}"  # e.g. /dev/xvd_
    region: "{{region}}"
    wait: yes
    state: present
  sudo: no
  when: device_present.stdout is defined and device_present.stdout|int == 0

- name: Check if partition /dev/xvd_1 present
  shell: fdisk -l | grep {{device}}1 | wc -l
  changed_when: False
  register: partition_present

- name: Fdisk / create partition / format
  # the echoed answers drive fdisk non-interactively: new primary partition 1,
  # default start/end sectors, write; then format and drop the root reserve
  shell: "echo -e 'n\np\n1\n\n\nw\n' | fdisk {{device}} && mkfs -t {{fstype}} {{device}}1 && tune2fs -m 0 {{device}}1"
  when: partition_present.stdout is defined and partition_present.stdout|int == 0

- name: Create the mount point
  file: path={{mount_dir}} state=directory

- name: Mount device
  mount: name={{mount_dir}} src={{device}}1 fstype={{fstype}} opts='defaults,noatime,nofail' state=mounted
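For completeness, here is a sketch of how the role could be wired up from a playbook; every value below is a placeholder I made up for illustration, not something from this setup:

# file:site.yml
- hosts: rackspace
  roles:
    - role: Rackspace_disk
      device: /dev/xvdb        # becomes {{device}} in the tasks above
      volume_name: app-data
      volume_type: SATA        # rax_cbs accepts SATA or SSD
      volume_size: 100         # size in GB
      region: DFW
      server: "{{ inventory_hostname }}"
      credentials: ~/.rackspace_cloud_credentials
      fstype: ext4
      mount_dir: /data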