# E-Commerce Orders Test Dataset

This dataset demonstrates Datally's data consolidation capabilities using a realistic e-commerce scenario with orders from multiple sales channels.

## 📦 What's Included

This package contains 4 CSV files:

1. **_DICTIONARY_order_system.csv** - The reference schema (upload this FIRST)
   - 35 complete, clean order records
   - Standardized column names and formats
   - Use as your target schema for mapping

2. **shopify_orders.csv** - Shopify export
   - 15 orders with Shopify-specific column names
   - Different date formats and field naming conventions

3. **amazon_seller_central.csv** - Amazon Seller Central export
   - 12 orders with Amazon's unique schema
   - Hyphenated column names (e.g., "Purchase-Date")

4. **woocommerce_export.csv** - WooCommerce export
   - 8 orders with WooCommerce naming patterns
   - Underscore-separated column names

## 🎯 What This Dataset Demonstrates

### Column Mapping
- **Different field names** across systems (e.g., "Created" vs "Purchase-Date" vs "Order_Date")
- **Structural variations** (e.g., "Name" vs "Billing_Full_Name" vs "Buyer-Name")
- **AI-powered mapping** suggestions to automatically align columns
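
To make the idea concrete, here is a minimal Python sketch of what a column mapping does once it has been decided. It is an illustration, not Datally's implementation. The source column names are the ones quoted above, `Order_Date` is the dictionary name used in the validation step, and `Customer_Name`, the assignment of columns to platforms, and the sample rows are assumptions made for the example.

```python
# Hypothetical per-source mappings: source column name -> dictionary column name.
# "Customer_Name" is an assumed target name; check the dictionary file for the real one.
SHOPIFY_MAP = {"Created": "Order_Date", "Name": "Customer_Name"}
AMAZON_MAP = {"Purchase-Date": "Order_Date", "Buyer-Name": "Customer_Name"}


def rename_columns(row: dict, mapping: dict) -> dict:
    """Return a copy of row with mapped column names; unmapped columns keep their name."""
    return {mapping.get(column, column): value for column, value in row.items()}


# Made-up sample rows, one per source.
shopify_row = {"Created": "2024-01-15", "Name": "Jane Doe"}
amazon_row = {"Purchase-Date": "2024-01-16", "Buyer-Name": "John Roe"}

unified = [
    rename_columns(shopify_row, SHOPIFY_MAP),
    rename_columns(amazon_row, AMAZON_MAP),
]
```

After renaming, both rows share the same keys, which is what allows rows from different channels to be stacked into one table.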

### Value Translation
- **Order statuses** vary by platform:
  - Shopify: "fulfilled", "pending", "cancelled"
  - Amazon: "Shipped", "Processing", "Pending"
  - WooCommerce: "processing", "shipped", "pending"
- **Payment methods** need standardization
- **Country codes** and address formats differ
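
As a rough sketch of what a value translation amounts to, the status values listed above can be folded into one vocabulary with a lookup table. Only the "Shipped" target is confirmed by this README (see Expected Results); the other target values are assumptions, so check them against the dictionary file.

```python
# Lookup keyed by lowercased source value. Targets other than "Shipped" are assumed.
STATUS_TRANSLATIONS = {
    "fulfilled": "Shipped",
    "shipped": "Shipped",
    "processing": "Processing",
    "pending": "Pending",
    "cancelled": "Cancelled",
}


def translate_status(raw: str) -> str:
    """Map a platform-specific status to the standard value.

    Unknown values are returned unchanged so they can surface later
    as validation exceptions instead of being silently rewritten.
    """
    return STATUS_TRANSLATIONS.get(raw.strip().lower(), raw)
```

Matching on the lowercased value handles the case differences between platforms ("Shipped" vs "shipped") with a single table.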

### Data Quality Issues
The dataset includes 15+ intentional data quality issues:
- Missing required fields (email, customer name)
- Invalid email formats
- Invalid phone number formats
- Future order dates (plausibility issues)
- Negative amounts
- Inconsistent status values
- Missing payment information

## 🚀 Quick Start Guide

### Step 1: Upload Files
1. Open Datally
2. Create a new session
3. Upload **_DICTIONARY_order_system.csv** FIRST (as reference)
4. Upload the 3 source files (Shopify, Amazon, WooCommerce)

### Step 2: Analyze & Map
1. Go to the **Analysis** page to review file structures
2. Go to the **Mapping** page
3. Enable AI assistance for automatic column mapping
4. Review and adjust mappings as needed

### Step 3: Translate Values
1. Go to the **Translation** page
2. Review value variations (especially Order_Status and Payment_Method)
3. Use AI suggestions or create manual translations
4. Apply translations to standardize values

### Step 4: Validate Data
1. Go to the **Validation** page
2. Add validation rules:
   - **Pattern Check**: Email format, phone format
   - **Mandatory Check**: Customer_Email, Order_Date
   - **Enumeration Check**: Order_Status, Payment_Method
   - **Plausibility Check**: Order_Date not in future, amounts > 0
3. Run validation to identify exceptions
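
For intuition, the four rule types above can be approximated in a few lines of Python. This is a simplified sketch, not how Datally evaluates rules. It assumes dates are already in ISO format (`YYYY-MM-DD`), uses a deliberately loose email pattern, and invents the column name `Order_Total` and the allowed status set for illustration.

```python
import re
from datetime import date

EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # loose, illustrative pattern
ALLOWED_STATUSES = {"Shipped", "Processing", "Pending", "Cancelled"}  # assumed values


def validate_order(row: dict, today: date) -> list:
    """Return a list of human-readable issues found in one consolidated row."""
    issues = []

    # Mandatory check: the field must be present and non-blank.
    for field in ("Customer_Email", "Order_Date"):
        if not (row.get(field) or "").strip():
            issues.append(f"{field}: missing")

    # Pattern check: only applied when a value is present.
    email = (row.get("Customer_Email") or "").strip()
    if email and not EMAIL_PATTERN.fullmatch(email):
        issues.append("Customer_Email: invalid format")

    # Enumeration check: the value must be one of the allowed statuses.
    if row.get("Order_Status") not in ALLOWED_STATUSES:
        issues.append("Order_Status: unexpected value")

    # Plausibility check: the order date must not be in the future.
    raw_date = (row.get("Order_Date") or "").strip()
    if raw_date:
        try:
            if date.fromisoformat(raw_date) > today:
                issues.append("Order_Date: in the future")
        except ValueError:
            issues.append("Order_Date: not an ISO date")

    # Plausibility check: the amount must be positive ("Order_Total" is hypothetical).
    try:
        if float(row.get("Order_Total") or "") <= 0:
            issues.append("Order_Total: must be > 0")
    except ValueError:
        issues.append("Order_Total: not a number")

    return issues
```

Passing `today` in explicitly keeps the future-date check reproducible, which is convenient when comparing results across runs.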

### Step 5: Consolidate
1. Go to the **Consolidation** page
2. Review the consolidated data
3. Export to your desired format

## 📊 Expected Results

After proper mapping and translation, you should have:
- **35 total orders** consolidated from 3 sources (15 Shopify + 12 Amazon + 8 WooCommerce)
- **Standardized column names** matching the dictionary
- **Consistent status values** (e.g., all "Shipped" instead of "shipped"/"Shipped"/"fulfilled")
- **Identified exceptions** for data quality issues
- **Clean, consolidated dataset** ready for analysis
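
If you want to sanity-check the row count outside Datally, a short script along these lines may help. It assumes each CSV has a header row and one row per order. The source file names come from this package, while the name of the exported file is whatever you chose at export time.

```python
import csv

SOURCE_FILES = [
    "shopify_orders.csv",
    "amazon_seller_central.csv",
    "woocommerce_export.csv",
]


def count_orders(path: str) -> int:
    """Count data rows (excluding the header) in a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        return sum(1 for _ in csv.DictReader(handle))


def check_row_counts(source_paths, consolidated_path):
    """Return (expected, actual): summed source rows vs rows in the export."""
    expected = sum(count_orders(path) for path in source_paths)
    actual = count_orders(consolidated_path)
    return expected, actual
```

Calling `check_row_counts(SOURCE_FILES, "<your export>.csv")` from the folder containing the files should return two equal numbers if no rows were dropped or duplicated during consolidation.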

## 💡 Tips

- **Always upload the dictionary file first** - This establishes your target schema
- **Enable AI features** - Make sure Ollama is running for AI-powered mapping and translation
- **Review AI suggestions** - They are usually a good starting point, but check each one before applying it
- **Use the Exception Explorer** - Great for reviewing and fixing data quality issues
- **Try different validation rules** - Experiment with the rule types to see which ones catch the intentional issues

## 🎓 Learning Objectives

This dataset is well suited to learning:
1. How to handle multiple data sources with different schemas
2. Column mapping strategies (exact match, fuzzy match, AI-powered)
3. Value translation for standardizing categorical data
4. Data validation rule creation and application
5. Exception handling and data quality improvement
6. End-to-end data consolidation workflow
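
The exact and fuzzy matching strategies in item 2 can be illustrated with Python's standard `difflib` module. This is a generic sketch, not Datally's algorithm. The target list is limited to the dictionary column names mentioned in this README, and the `0.8` cutoff is an arbitrary choice for the example.

```python
from difflib import get_close_matches

# Dictionary column names that appear in this README; the real file has more.
TARGET_COLUMNS = ["Order_Date", "Customer_Email", "Order_Status", "Payment_Method"]


def normalize(name: str) -> str:
    """Lowercase a column name and drop separators, so 'order-date' equals 'Order_Date'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def suggest_target(source_column: str, targets=TARGET_COLUMNS, cutoff: float = 0.8):
    """Return (target_name, strategy), where strategy is 'exact', 'fuzzy', or 'unmatched'."""
    by_key = {normalize(target): target for target in targets}
    key = normalize(source_column)

    # Exact match after normalization.
    if key in by_key:
        return by_key[key], "exact"

    # Fuzzy match: best candidate whose similarity ratio reaches the cutoff.
    close = get_close_matches(key, list(by_key), n=1, cutoff=cutoff)
    if close:
        return by_key[close[0]], "fuzzy"

    return None, "unmatched"
```

With these settings, `order-date` matches exactly and `Order_Stat` matches `Order_Status` fuzzily, while `Purchase-Date` stays unmatched because it shares too few characters with `Order_Date`. Cases like the last one are where semantic (AI-assisted) or manual mapping is needed.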

## 📝 Notes

- This is synthetic data created for demonstration purposes
- All customer names, emails, and phone numbers are fictional
- Data quality issues are intentionally included to showcase validation features
- The walkthrough is designed to be completed in 15-30 minutes

## 🆘 Need Help?

- Check the Datally documentation for detailed guides
- Visit the Resources page for video tutorials
- Contact support if you encounter any issues

---

**Ready to consolidate?** Start by uploading the dictionary file and let Datally's AI do the heavy lifting!

