Skip to main content

Command Palette

Search for a command to run...

Designing Export Profiles for Different Data Workflows

Updated
5 min read
Designing Export Profiles for Different Data Workflows

A CSV is a CSV, right?

Not when your user is a data analyst in Germany opening the file in Excel. Or a Python developer loading it into Pandas. Or a data engineer piping it into BigQuery.

Each workflow has different expectations about:

  • Decimal separators (dot vs. comma)

  • CSV delimiters (comma vs. semicolon)

  • Null representation (empty string vs. "N/A" vs. actual null)

  • Date formats (ISO vs. localized)

  • Column naming conventions

When I built HTML Table Exporter, I originally had one "export to CSV" button. Users quickly taught me that wasn't enough.

The Problem: Regional and Tool-Specific Expectations

European Excel Users

In Germany, France, Spain, and most of Latin America:

  • Decimal separator: comma (1.234,56)

  • Thousands separator: dot (1.234)

  • CSV delimiter: semicolon (because comma is used for decimals)

If you export a CSV with comma delimiters containing European-formatted numbers, Excel interprets "1,234" as two columns.

US/UK Excel Users

  • Decimal separator: dot (1,234.56)

  • Thousands separator: comma (1,234)

  • CSV delimiter: comma

Same data, different formatting expectations.

Python/Pandas Users

Pandas read_csv() expects:

  • Standard decimal format (dot)

  • Consistent null representation (empty or specific string)

  • Clean column names (no spaces, lowercase preferred)

A CSV with "Revenue (€)" as a column header becomes df['Revenue (€)'] — awkward to work with.

SQL/Database Users

DuckDB, PostgreSQL, BigQuery expect:

  • SQL-safe identifiers (no spaces, no special characters)

  • Proper NULL handling

  • Type-inferrable data (numbers should look like numbers)

The Solution: Preset Profiles

Rather than exposing 20 checkboxes for configuration, I created preset profiles that bundle sensible defaults for each use case.

Profile: Default

{
  cleaning: {
    trimStrings: true,
    normalizeNumbers: false,
    normalizeDates: false,
    nullValues: [],
  },
  export: {
    csv: { delimiter: "," },
    naming: { case: "snake" }
  }
}

Minimal transformation. Exports data as-is with basic cleanup.

Profile: For Pandas

{
  cleaning: {
    trimStrings: true,
    normalizeNumbers: true,
    normalizeDates: true,
    nullValues: ["", "—", "N/A", "NA", "null", "None", "-"],
    booleans: {
      true: ["yes", "sí", "true", "1", "y"],
      false: ["no", "false", "0", "n"]
    }
  },
  export: {
    formats: ["csv", "ndjson"],
    csv: { delimiter: "," },
    json: { nativeTypes: true },
    naming: { case: "snake" }
  }
}

Key decisions:

  • Normalize numbers — Convert "1.234,56" to "1234.56" so Pandas parses correctly

  • Normalize dates — Convert to ISO format for pd.to_datetime()

  • Map nulls — Convert common null representations to empty (which Pandas handles)

  • Map booleans — Convert "Yes"/"No" to "true"/"false"

  • Snake case namesrevenue_eur instead of "Revenue (€)"

Profile: Excel (European)

{
  cleaning: {
    trimStrings: true,
    normalizeNumbers: true,
    normalizeDates: true,
    nullValues: ["", "—", "-"]
  },
  export: {
    formats: ["csv", "xlsx"],
    csv: { delimiter: ";" },  // Semicolon!
    naming: { case: "original" }
  }
}

Key decisions:

  • Semicolon delimiter — Essential for European Excel

  • Original naming — Business users expect readable headers

  • Normalize numbers — But Excel will display in regional format

Profile: For DuckDB

{
  cleaning: {
    trimStrings: true,
    normalizeNumbers: true,
    normalizeDates: true,
    nullValues: ["", "NULL", "null", "N/A", "-"],
    booleans: {
      true: ["true", "1", "yes", "y"],
      false: ["false", "0", "no", "n"]
    }
  },
  export: {
    formats: ["csv", "sql"],
    naming: { case: "snake" }
  }
}

Key decisions:

  • SQL-safe identifiers — No spaces, lowercase

  • Boolean normalization — SQL expects true/false

  • Aggressive null mapping — Consistent NULL handling in queries

Profile: For BigQuery

{
  cleaning: {
    normalizeNumbers: true,
    normalizeDates: true,
    nullValues: ["", "NULL", "null", "N/A", "NA", "#N/A", "-"]
  },
  export: {
    formats: ["csv", "ndjson"],
    json: { nativeTypes: true },  // Critical!
    bulk: { format: "ndjson" }
  }
}

Key decisions:

  • NDJSON as default — BigQuery loads NDJSON efficiently

  • Native JSON types — Numbers as numbers, not strings. BigQuery infers schema from values.

  • Comprehensive null mapping — Excel's "#N/A" should become null

Type Rules: Pattern-Based Inference

Some columns need specific treatment regardless of their values. The profiles include type rules based on column name patterns:

typeRules: [
  { pattern: "*_id", type: "string" },    // IDs stay strings
  { pattern: "*_date", type: "date" },    // Dates get normalized
  { pattern: "*_at", type: "date" },      // created_at, updated_at
  { pattern: "price*", type: "number" },  // price_usd, price_eur
  { pattern: "is_*", type: "boolean" },   // is_active, is_verified
  { pattern: "has_*", type: "boolean" }   // has_subscription
]

If a column is named customer_id, it stays as a string even if all values are numeric. Because joining on "123" vs 123 causes subtle bugs.

The UX Challenge: Not Overwhelming Users

Six profiles with different settings could be confusing. The design decisions:

  1. Descriptive names — "For Pandas" not "Profile A"

  2. Visible description — "Optimized for Python/Pandas analysis"

  3. Smart defaults — "Default" selected unless user changes it

  4. Persistent selection — Remember the last-used profile

Most users pick a profile once and forget about it. Power users can create custom profiles with their own settings.

What I Learned

Users Know Their Workflow

Nobody asked for "semicolon delimiter support." They asked "why doesn't this work in Excel?" Understanding the underlying workflow (European Excel) led to the right solution (regional profile).

Defaults Matter More Than Options

The "For Pandas" profile could expose 15 individual settings. Instead, it's one click. Users trust that I've made reasonable choices for their use case.

Edge Cases Define Quality

The difference between "works" and "works well" is handling:

  • "N/A" vs "NA" vs "#N/A" (all mean null)

  • "Yes" vs "yes" vs "YES" (all mean true)

  • "1,234" vs "1.234" (regional number formats)

Each edge case handled is one less support question.


Want to try these profiles? Check out HTML Table Exporter PRO on the Chrome Web Store.