During day-to-day work as a data scientist, you may frequently come across the issue where there is no easy functionality to schedule or automate custom report exports out of Salesforce for analysis or dashboard building. We will look at three methods of automated data retrieval and the pros and cons for each: 

  1. Python using ‘simple_salesforce’ package
  2. Salesforce REST API
  3. Salesforce email attachment scheduling (beta)

1. Python using ‘simple_salesforce’ package 

One easy solution we found is using the ‘simple_salesforce’ package in Python which allows users to download Salesforce reports directly into Python by making use of the ‘Salesforce’ function to log in programmatically and then using a GET function to retrieve the report. This can be best used when trying to access a pre-built custom report that needs to be exported periodically, perhaps for other reporting purposes. With this script the custom reports can then be pushed into a database or SFTP, etc. This solution works with both Salesforce Lightning and Classic.

For the following script to work you will need to obtain a few things:

1. Your Salesforce Username and Password

  • These are the regular account details you use to log into Salesforce.

2. Your Salesforce Security token

  • This can be found by logging into your Salesforce account and going to Settings > My Personal Information > Reset My Security Token. After following the instructions you should receive an email with your security token.

3. Your Salesforce Company ID

  • This can be found at the start of the URL in your Salesforce instance, e.g. for “https://lynchpin.my.salesforce.com/00O0X0000Aa4GO1” it is “https://lynchpin.my.salesforce.com/”.

4. Your Salesforce Report ID

  • This can be found by opening the report in Salesforce you wish to download and checking the URL. The report ID is the 15-18 character string near the end of the URL, e.g. for “https://lynchpin.my.salesforce.com/00O0X0000Aa4GO1” it is “00O0X0000Aa4GO1”.

Input your details into the following script and you will have your data!

from simple_salesforce import Salesforce
import requests
import pandas as pd
import csv
from io import StringIO
# Sign into Salesforce
sf = Salesforce(username='user@company.com', 
password='password',
security_token='token')
# Set report details
sf_org = 'https://company.salesforce.com/'
report_id = 'report_id'
export_params = '?isdtp=p1&export=1&enc=UTF-8&xf=csv'
# Download report
sf_report_url = sf_org + report_id + export_params
response = requests.get(sf_report_url, headers=sf.headers, cookies={'sid': sf.session_id})
new_report = response.content.decode('utf-8')
report_df = pd.read_csv(StringIO(new_report))

Pros

  • The package is simple to use and scale up further with more reports
  • Python has a low barrier to entry for analysts to access

Cons

  • ‘simple_salesforce’ is not officially supported by Salesforce so may be prone to break with any site updates

2. Salesforce REST API

Salesforce has its own SQL style API, the Salesforce Object Query Language (SOQL), for access to data with a lot of documentation online. It uses SQL based syntax for data retrieval from Salesforce objects. This is the officially supported method and the most reliable for data pipelines. The Python package ‘simple_salesforce’ also supports the SOQL method, for example the below query will pull all records from the Contact object where their last name is Mottershead and format the results into a data frame. This method is best used for creating new reports or bulk data retrieval.

# Query Salesforce
data = sf.query("SELECT Id, Email FROM Contact WHERE LastName = 'Mottershead'")
# Set up columns
id = []
email = []
for record in data['records']:
id.append(record['Id'])
email.append(record['Email'])
record_dict = {}
record_dict['Id'] = id
record_dict['Email'] = email
df = pd.DataFrame(record_dict)

Pros

  • Most stable method of connecting to and retrieving Salesforce report data
  • Detailed documentation on Salesforce website

Cons

  • Higher barrier to entry to get working and automate, especially if querying directly to REST API
  • SQL knowledge required

3. Salesforce email attachment scheduling (beta)

Salesforce has announced a beta version of attaching CSVs to email scheduling. Previously only a summary of the report was available and would be sent in the body of the email, so this is a big step forward! This method is best used for smaller data sets where reporting is required, such as new daily leads or upcoming expiring contracts for sales teams.

Pros

  • Very easy to distribute custom reports to users
  • Available through the Salesforce website without external tools

Cons

  • Data limitations are currently 15,000 rows and 30 columns which datasets can often exceed
  • Difficult to automate ingestion of data from email attachment into database
  • Email may not be a sufficiently secure transfer method, depending on the sensitivity of the underlying data

Hopefully you find these methods as useful as we did. It has saved us many hours of manual report-exporting and helped automate and streamline a lot of our BI processes!

Last reviewed: 25/08/2020