Parquet and Avro Formats for AWS S3

Main advantages for customers:

  • Reduces I/O operations.
  • Fetches only the specific columns you need to access.
  • Consumes less storage space.
  • Supports type-specific encoding.
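To illustrate the column-fetching and encoding points, here is a toy in-memory sketch. It does not use Parquet's real on-disk format. It contrasts a row-oriented CSV, where fetching one column still means scanning and splitting every line and every value comes back as text, with a hypothetical column-oriented layout, where each column is stored contiguously with its own type. The sample table, the `columnar` dict, and the helper names are assumptions made for illustration. `dictionary_encode` shows dictionary encoding, one of the encodings Parquet can apply to columns with repeated values.

```python
import csv
import io
from array import array

# Hypothetical sample table: an integer ID column and a text CITY column.
records = [(1, "Berlin"), (2, "Paris"), (3, "Berlin")]

# Row-oriented text (CSV): every value is serialized as a string.
buf = io.StringIO(newline="")
writer = csv.writer(buf)
writer.writerow(["ID", "CITY"])
writer.writerows(records)
csv_text = buf.getvalue()

def read_ids_from_csv(text):
    # Every line is scanned and split even though only ID is needed,
    # and the values come back as strings.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [row["ID"] for row in reader]

# Toy column-oriented layout: each column stored contiguously with its type.
columnar = {
    "ID": array("q", [r[0] for r in records]),  # 64-bit signed integers
    "CITY": [r[1] for r in records],
}

def read_ids_from_columns(cols):
    # Projection touches only the ID column, and values stay integers.
    return list(cols["ID"])

def dictionary_encode(values):
    # Store each distinct value once and replace values with small integer codes.
    dictionary, codes, index = [], [], {}
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

print(read_ids_from_csv(csv_text))            # ['1', '2', '3']
print(read_ids_from_columns(columnar))        # [1, 2, 3]
print(dictionary_encode(columnar["CITY"]))    # (['Berlin', 'Paris'], [0, 1, 0])
```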

Currently, Syniti writes data as CSV. We have faced the following issues because of this:

  • The .mir file has more columns than the ref. Because delimited text files do not support schema evolution, reading data from them becomes a tedious task.
  • Unrecognized characters in columns cause parsing errors.
  • A \r character gets added to the column name itself, because of how the text file is created and how different libraries handle such characters.
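The header and schema problems above can be reproduced with Python's standard `csv` module. This is a minimal sketch with a hypothetical export: a naive split on "\n" leaves the "\r" from a CRLF line ending attached to the last column name, while `csv.reader` over a stream opened with `newline=""` treats "\r\n" as the row terminator. The second part shows a hypothetical extra `COUNTRY` column shifting positional access. Parquet and Avro avoid the header problem because they store the schema as file metadata (Parquet in the file footer, Avro in the file header) instead of as a text line that must be parsed.

```python
import csv
import io

# Hypothetical export written with Windows-style (CRLF) line endings.
raw = "ID,NAME,CITY\r\n1,Alice,Berlin\r\n"

# Naive parsing: split lines on "\n" only, then split fields on commas.
# The "\r" from the CRLF ending sticks to the last column name.
naive_header = raw.split("\n")[0].split(",")
print(naive_header)  # ['ID', 'NAME', 'CITY\r']

# With newline="", csv.reader recognizes "\r\n" as the row terminator,
# so the header comes out clean.
clean_header = next(csv.reader(io.StringIO(raw, newline="")))
print(clean_header)  # ['ID', 'NAME', 'CITY']

# A later export gains an extra column (hypothetical COUNTRY).
# Positional access silently shifts; name-based access still finds NAME.
raw_v2 = "ID,COUNTRY,NAME,CITY\r\n1,DE,Alice,Berlin\r\n"
positional = raw_v2.split("\r\n")[1].split(",")
row_v2 = next(csv.DictReader(io.StringIO(raw_v2, newline="")))
print(positional[1])   # DE (this position held NAME in the old layout)
print(row_v2["NAME"])  # Alice
```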

Status: Future Consideration
Board: Syniti Knowledge Platform
Tags: Replicate
Date: Almost 3 years ago
Author: Emily Williams
