Today I'm writing about using
Talend Open Studio to sync Salesforce data to MySQL or Postgresql databases.
First of all, let's download and install
Talend Open Studio for Data Integration, make sure you already have Java installed.
Next is unzip and run TOS_DI-macosx-cocoa as I'm using a Mac OS X, or TOS_DI-win-x86_64.exe if you're using Windows.
It's based on Eclipse IDE so you'll see it's pretty easy if you used to work with Eclipse.
Here is the main screen.
Then we will need to create a database connection by expanding the
Metadata node ->
Db connections -> Right click ->
Create connection -> enter a Name (without space) -> click Next button
Then select
Db Type (MySQL or Postgresql or whatever in the long list). I'm choosing Postgresql because MySQL limits row size to 65535. So if the table contains too much fields, it will throw out an error:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. You have to change some columns to TEXT or BLOBs
After enter all connection information, click Check button to make sure they are correct. Then Finish.
Next step is creating Salesforce connection. Let's right click the Salesforce node and select
Create Salesforce connection. Enter a name then Next.
Enter the User name and Password.
The Password is combination of Salesforce password and the token. i.e if the Salesforce password is xyz and the token is 1234, then we enter xyz1234 on the Password field. Then Check Login to make sure connection is valid. Then Finish.
Next step is retrieving Salesforce modules by right click on created Salesforce connection and select
Retrieve Salesforce modules. Let's check for
Account
Then we will need to create a job to import Salesforce data into the database. Right click on
Job Design node, select
Create Jobs
Then job design screen is opened and we will expand the Salesforce
Account module to drag/drop the
Account table into the job design screen as
tSalesforceInput component.
Click OK then we have it on the screen
Next step is search for database output component on the
Palette box, then drag/drop
tPostgresqlOutput component into the screen
Then we might need to rename it as "
account" and click the inside icon to set its properties
Select Repository on Property Type , then select created Postgresql connection -
DB (POSTGRESQL):PosgressqlSalesforce
All connection information automatically filled out. Except Table - enter
"account" here, then select
Create table if does not exist for
Action for table, and
Insert for
Action for data
Next step is linking between Salesforce input and Postgresql output by right click on Salesforce input's icon and select
Row ->
Main, then wire the connection to the Postgresql output's icon
and
Run the job.
Then let's check the Postgresql to see the new account table is created with data from Salesforce
Then double the Postgrsql account output to edit its properties, select
Insert or Update for
Action on data and click
Edit schema button, then select the checkbox
Key on the
Id column of account Output
Then we might run the job again.
We might repeat those steps to sync other tables.
Now if we want to execute the job automatically by schedule / cron, we will need to build the job.
Right click on the job node, select
Build Job
Then unzip the file, explore to the sub folder sync, we will see
sync_run.sh or
sync_run.bat (or jobName_run.bat/.sh) that will be called in schedule / cron.
So we are done!
FYI, here is the
videocast to show you how to sync from database to Salesforce.