Document Indexing: PUT, POST and Auto-Generated IDs | Free Elasticsearch Course: From Zero to Expert

PUT, POST, Auto-Generated ID

Think of sending a letter. You write the address on the envelope (the ID), put the letter inside (the document) and drop it into the mailbox (the index). Adding a document to Elasticsearch works exactly like this: either you address the envelope yourself, or you leave the address to the postman (an auto-generated ID).

In this lesson we will learn every way to add (index) documents, the details that matter, and the techniques that make indexing fast.

What Is Indexing?

"Indexing" is the process of adding a document to Elasticsearch. Along the way:

The document is sent as JSON
It is routed to the correct primary shard using the routing formula
Field types are resolved from the mapping
Text fields are analyzed (tokenization + normalization)
The terms are written to the inverted index (they become searchable after the next refresh)
The original JSON is stored in _source
The operation is copied to the replica shards

Client → POST /products/_doc → Coordinating Node → Routing → Primary Shard
                                                                ↓
                                                          Analyze + Index
                                                                ↓
                                                          Copy to replica shards
                                                                ↓
                                                          Respond to client
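To make the "analyze, then write to the inverted index" steps concrete, here is a toy Java model. It is a sketch under loud assumptions: the analyzer only splits on non-letter/digit characters and lowercases, which is a rough stand-in for the standard analyzer rather than a reimplementation of it, and the class name InvertedIndexSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of "analyze text, then write the terms to an inverted index".
// Assumption: split on non-letter/digit characters + lowercase is only a rough
// stand-in for Elasticsearch's standard analyzer.
class InvertedIndexSketch {
    // term -> IDs of the documents that contain it (the postings list)
    private final Map<String, Set<String>> postings = new TreeMap<>();

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {   // tokenization
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));        // normalization
            }
        }
        return terms;
    }

    void index(String docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // The query goes through the same analysis, so "PRO" finds documents containing "Pro"
    Set<String> search(String word) {
        List<String> terms = analyze(word);
        return terms.isEmpty() ? Set.of() : postings.getOrDefault(terms.get(0), Set.of());
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.index("1", "MacBook Pro 16");
        idx.index("2", "AirPods Pro 2");
        System.out.println(idx.search("PRO"));      // [1, 2]
        System.out.println(idx.search("macbook"));  // [1]
    }
}
```

Because the same analysis runs at index time and at query time, case differences disappear; that symmetry is the point of the sketch, not the exact tokenization rules.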

Indexing with PUT: A Specific ID

With PUT, you choose the ID yourself:

// Simple document indexing
PUT /products/_doc/1
{
  "name": "MacBook Pro 16",
  "price": 75000,
  "category": "Laptop",
  "brand": "Apple",
  "in_stock": true,
  "description": "Professional laptop with the M3 Pro chip"
}

// Response:
{
  "_index": "products",
  "_id": "1",
  "_version": 1,
  "result": "created",         // created for the first time
  "_shards": {
    "total": 2,                // primary + 1 replica
    "successful": 2,           // both succeeded
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

PUT: Writing Again with the Same ID

If you PUT again with the same ID, the whole document is replaced:

// First write
PUT /products/_doc/1
{
  "name": "MacBook Pro 16",
  "price": 75000,
  "category": "Laptop",
  "brand": "Apple",
  "in_stock": true
}
// result: "created", _version: 1

// Writing again with the same ID: THE WHOLE DOCUMENT IS REPLACED
PUT /products/_doc/1
{
  "name": "MacBook Pro 16 M3",
  "price": 72000
}
// result: "updated", _version: 2
// ⚠️ WARNING: the "category", "brand" and "in_stock" fields are gone!
// PUT replaces the entire document; it does not do partial updates!
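The difference between a full replace and a partial merge can be modeled with plain Java maps. ReplaceVsMerge is a hypothetical in-memory stand-in, not an Elasticsearch API: put mirrors what PUT /products/_doc/{id} does to the stored source, and merge only approximates a partial update.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical in-memory model: PUT replaces the stored document, a partial update merges into it.
class ReplaceVsMerge {
    final Map<String, Map<String, Object>> store = new HashMap<>();

    // PUT /products/_doc/{id}: the stored source becomes exactly the request body
    String put(String id, Map<String, Object> body) {
        Map<String, Object> previous = store.put(id, new HashMap<>(body));
        return previous == null ? "created" : "updated";
    }

    // For contrast: a partial merge into an existing document (only the given fields change)
    void merge(String id, Map<String, Object> partial) {
        Map<String, Object> existing = store.get(id);
        if (existing == null) {
            throw new NoSuchElementException("document " + id + " not found");
        }
        existing.putAll(partial);
    }

    public static void main(String[] args) {
        ReplaceVsMerge es = new ReplaceVsMerge();
        es.put("1", Map.of("name", "MacBook Pro 16", "price", 75000, "brand", "Apple"));
        System.out.println(es.put("1", Map.of("name", "MacBook Pro 16 M3", "price", 72000))); // updated
        System.out.println(es.store.get("1").containsKey("brand"));                             // false
    }
}
```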

op_type Parameter: Create Only

If you want an error when the ID already exists:

// Option 1: op_type=create
PUT /products/_doc/1?op_type=create
{
  "name": "New Product",
  "price": 100
}
// If ID=1 already exists: 409 Conflict error

// Option 2: the _create endpoint (same behavior)
PUT /products/_create/1
{
  "name": "New Product",
  "price": 100
}
// If ID=1 already exists: 409 Conflict error

This is very useful when you want to prevent the default create-or-replace behavior, for example to avoid indexing the same order twice.
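A create-only write behaves like putIfAbsent on a map. The sketch below is a hypothetical in-memory model (CreateOnlyStore and the order ID are illustrative, not Elasticsearch APIs); the exception message borrows the error name Elasticsearch reports for this case.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of op_type=create / _create: write only if the ID is still free.
class CreateOnlyStore {
    static class ConflictException extends RuntimeException {
        final int status = 409;
        ConflictException(String id) {
            super("version_conflict_engine_exception: document [" + id + "] already exists");
        }
    }

    private final Map<String, Map<String, Object>> store = new ConcurrentHashMap<>();

    // putIfAbsent is atomic, so two concurrent creates with the same ID cannot both succeed
    void create(String id, Map<String, Object> body) {
        if (store.putIfAbsent(id, Map.copyOf(body)) != null) {
            throw new ConflictException(id);
        }
    }

    int size() {
        return store.size();
    }
}
```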

Indexing with POST: Automatic ID

If you POST without an ID, Elasticsearch generates a unique ID automatically:

POST /products/_doc
{
  "name": "AirPods Pro 2",
  "price": 8500,
  "category": "Headphones",
  "brand": "Apple",
  "in_stock": true
}

// Response:
{
  "_index": "products",
  "_id": "W0tpsmIBdwcYyG50zbta",   // auto-generated 20-character ID
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 2, "failed": 0 },
  "_seq_no": 5,
  "_primary_term": 1
}

Advantages of Auto-Generated IDs

Faster writes: Elasticsearch does not have to check whether the ID already exists. When you send a specific ID with PUT, Elasticsearch first looks up whether that ID exists, which is extra work
No collision risk: generated IDs are designed to be unique
Safe in distributed setups: even when many clients write at the same time, their IDs do not collide
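Why can uniqueness be guaranteed without a lookup? Because it comes from how the ID is built, not from checking the index. The sketch below builds a 20-character URL-safe ID from a timestamp, a counter and random bytes. Assumption: it only mimics the shape of Elasticsearch auto IDs (15 bytes encoded as 20 Base64 characters); Elasticsearch's real generator lays out its bytes differently, and AutoIdSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a time-based, URL-safe, 20-character ID (shape only, not Elasticsearch's algorithm).
class AutoIdSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final AtomicInteger SEQUENCE = new AtomicInteger(RANDOM.nextInt());

    static String next() {
        ByteBuffer bytes = ByteBuffer.allocate(15);
        bytes.putLong(System.currentTimeMillis()); // 8 bytes: current time
        bytes.putInt(SEQUENCE.incrementAndGet());  // 4 bytes: counter, distinct per call in this process
        byte[] random = new byte[3];               // 3 bytes: makes collisions across processes unlikely
        RANDOM.nextBytes(random);
        bytes.put(random);
        // 15 bytes encode to exactly 20 Base64 characters, so no padding is needed
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes.array());
    }

    public static void main(String[] args) {
        System.out.println(next());
    }
}
```

One caveat the sketch makes visible: every call yields a new ID, so if a client retries a POST after a timeout, the retry produces a second document rather than overwriting the first.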

Which Approach When?

Scenario	Approach	Why?
Sync from a database	PUT + DB primary key	Re-sending is safe (idempotent create-or-replace)
Log/event data	POST (auto ID)	No natural key; write speed matters
User profiles	PUT + user_id	One document per user
Product catalog	PUT + product_id/SKU	Updates overwrite the same document
IoT sensor data	POST (auto ID)	High-volume streaming data
Order data	PUT + order_id	Prevents duplicates
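The "idempotent" and "prevents duplicates" entries in the table come down to what happens when a client retries the same write. The model below is hypothetical (RetryIdempotency and its methods are illustrative, not client APIs): a retried PUT with a business key lands on the same document, while a retried POST gets a fresh ID.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a client retrying the same write after a timeout.
class RetryIdempotency {
    final Map<String, String> index = new HashMap<>();
    private int autoIdCounter = 0;

    // PUT /orders/_doc/{orderId}: sending it twice still leaves exactly one document
    void putWithKey(String orderId, String json) {
        index.put(orderId, json);
    }

    // POST /orders/_doc: every call gets a fresh ID, so a retry adds a duplicate
    String postWithAutoId(String json) {
        String id = "auto-" + (++autoIdCounter);   // stand-in for a generated ID
        index.put(id, json);
        return id;
    }

    public static void main(String[] args) {
        RetryIdempotency orders = new RetryIdempotency();
        orders.putWithKey("order-42", "{\"amount\":15000}");
        orders.putWithKey("order-42", "{\"amount\":15000}");   // retry
        System.out.println(orders.index.size());                 // 1

        RetryIdempotency sensors = new RetryIdempotency();
        sensors.postWithAutoId("{\"temp\":21.5}");
        sensors.postWithAutoId("{\"temp\":21.5}");               // retry
        System.out.println(sensors.index.size());                // 2
    }
}
```

For logs and sensor readings an occasional duplicate is usually acceptable in exchange for write speed; for orders it usually is not, which is why the table pairs them with different approaches.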

Index Creation Behavior

Automatic Index Creation

If you send a document to an index that does not exist yet, the index is created automatically:

// "my-new-index" has not been created yet
POST /my-new-index/_doc
{
  "message": "Index created automatically!"
}
// Index created (default settings, dynamic mapping) + document added: two jobs in one request

You can turn this behavior off (recommended in production, where a typo in an index name would otherwise silently create a new index):

// Disable automatic index creation
PUT /_cluster/settings
{
  "persistent": {
    "action.auto_create_index": "false"
  }
}

// Allow only specific patterns
PUT /_cluster/settings
{
  "persistent": {
    "action.auto_create_index": "app-logs-*,-random-*,+specific-index"
  }
}
// app-logs-* → allowed
// random-* → blocked
// specific-index → allowed
// any other name → blocked (no pattern matches)
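How is a pattern list like the one above evaluated? The sketch below encodes the documented behavior as an assumption: patterns are checked in order, the first match decides, a leading "-" blocks, a leading "+" (or no prefix) allows, and if no pattern matches the index is not created. Only the "*" wildcard is handled, plain "true"/"false" values are out of scope, and AutoCreateIndexRule is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical evaluator for an action.auto_create_index pattern list.
class AutoCreateIndexRule {
    static boolean allowed(String setting, String indexName) {
        for (String raw : setting.split(",")) {
            String pattern = raw.trim();
            boolean allow = !pattern.startsWith("-");
            if (pattern.startsWith("-") || pattern.startsWith("+")) {
                pattern = pattern.substring(1);
            }
            if (globMatches(pattern, indexName)) {
                return allow;            // first matching pattern decides
            }
        }
        return false;                    // nothing matched: do not auto-create
    }

    // Quote the pattern literally, then turn each "*" into ".*"
    static boolean globMatches(String glob, String name) {
        String regex = Pattern.quote(glob).replace("*", "\\E.*\\Q");
        return name.matches(regex);
    }

    public static void main(String[] args) {
        String rule = "app-logs-*,-random-*,+specific-index";
        System.out.println(allowed(rule, "app-logs-2025.01")); // true
        System.out.println(allowed(rule, "random-abc"));       // false
        System.out.println(allowed(rule, "my-new-index"));     // false
    }
}
```

Because the first match wins, order matters: putting a broad allow pattern before a narrower "-" pattern would let the narrower block never take effect.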

Indexing with a Pipeline

You can define an ingest pipeline so that documents are transformed while they are being indexed:

// Create the pipeline
PUT /_ingest/pipeline/add-timestamp
{
  "description": "Add a timestamp automatically",
  "processors": [
    {
      "set": {
        "field": "indexed_at",
        "value": "{{_ingest.timestamp}}"
      }
    },
    {
      "lowercase": {
        "field": "category"
      }
    }
  ]
}

// Index a document through the pipeline
PUT /products/_doc/10?pipeline=add-timestamp
{
  "name": "Laptop Stand",
  "price": 500,
  "category": "ACCESSORIES"
}

// Stored document (_source):
{
  "name": "Laptop Stand",
  "price": 500,
  "category": "accessories",           // lowercase applied
  "indexed_at": "2025-01-15T10:00:00Z"  // added automatically (example value)
}

// Assign a default pipeline to the index
PUT /products/_settings
{
  "index.default_pipeline": "add-timestamp"
}
// From now on every document goes through this pipeline (unless the request names a different one)
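A pipeline is an ordered list of processors, each changing the document before it is stored. The Java model below is hypothetical and runs client-side purely for illustration (real processors run on ingest nodes); it mirrors the "set" and "lowercase" processors of add-timestamp. One deliberate difference: the real lowercase processor fails on a missing field unless ignore_missing is true, while this sketch just skips it.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical client-side model of the "add-timestamp" pipeline.
class IngestPipelineSketch {
    interface Processor {
        void apply(Map<String, Object> doc, Instant ingestTime);
    }

    static final List<Processor> ADD_TIMESTAMP = List.of(
        // "set": indexed_at = {{_ingest.timestamp}}
        (doc, ingestTime) -> doc.put("indexed_at", ingestTime.toString()),
        // "lowercase" on category; Locale.ROOT avoids locale surprises such as the Turkish dotless i
        (doc, ingestTime) -> {
            Object value = doc.get("category");
            if (value instanceof String) {
                doc.put("category", ((String) value).toLowerCase(Locale.ROOT));
            }
        }
    );

    // Processors run in order on a copy of the incoming source
    static Map<String, Object> run(List<Processor> pipeline, Map<String, Object> source, Instant now) {
        Map<String, Object> doc = new LinkedHashMap<>(source);
        for (Processor p : pipeline) {
            p.apply(doc, now);
        }
        return doc;
    }
}
```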

Timeout and Write Parameters

timeout: Wait Time

// If the primary shard is unavailable, wait up to 30 seconds (default: 1m)
PUT /products/_doc/1?timeout=30s
{
  "name": "Laptop",
  "price": 15000
}

refresh: Searchability

// Make it searchable immediately by refreshing the affected shards (has a performance cost)
PUT /products/_doc/1?refresh=true
{
  "name": "Laptop",
  "price": 15000
}

// Respond only after the next scheduled refresh has made it searchable (cheaper)
PUT /products/_doc/2?refresh=wait_for
{
  "name": "Mouse",
  "price": 300
}

// Don't refresh (default: it becomes searchable at the next refresh interval)
PUT /products/_doc/3?refresh=false
{
  "name": "Keyboard",
  "price": 800
}

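routing: Shard Routing

// Route by customer instead of by _id: all orders of customer-42 land on the same shard
// (a later GET for this document must pass the same ?routing=customer-42)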
PUT /orders/_doc/order-1?routing=customer-42
{
  "customer_id": "customer-42",
  "product": "Laptop",
  "amount": 15000
}
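Elasticsearch picks the shard from a hash of the routing value (the _id by default) modulo the number of primary shards. The sketch below is simplified on purpose: String.hashCode() stands in for the Murmur3 hash Elasticsearch actually uses, and the routing_num_shards details are ignored, so its shard numbers will not match a real cluster. What it does show is the property that matters: the same routing value always maps to the same shard.

```java
// Simplified, hypothetical model of shard selection: shard = hash(routing) % number_of_primary_shards.
class RoutingSketch {
    static int shardFor(String routing, int numberOfPrimaryShards) {
        // floorMod keeps the result in [0, n) even when hashCode() is negative
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        // With ?routing=customer-42, the routing value (not each order's _id) is hashed,
        // so all three orders go to the same shard
        for (String orderId : new String[] {"order-1", "order-2", "order-3"}) {
            System.out.println(orderId + " -> shard " + shardFor("customer-42", 5));
        }
    }
}
```

The modulo is also why the number of primary shards cannot simply be changed on an existing index: a different divisor would send existing IDs to different shards.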

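wait_for_active_shards: Write Safety

// Start the write only if ALL shard copies (primary + every replica) are active.
// This is a pre-write availability check; it does not wait for each copy to finish writing.
PUT /products/_doc/1?wait_for_active_shards=all
{
  "name": "Critical Product",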
  "price": 100000
}

// At least 2 shard copies (e.g. primary + 1 replica) must be active (default: 1, the primary only)
PUT /products/_doc/2?wait_for_active_shards=2
{
  "name": "Important Product",
  "price": 50000
}
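The check behind wait_for_active_shards can be written as a single comparison. This is a hypothetical model, not Elasticsearch code: "all" means primary plus every configured replica, a number means that many active copies. In a real cluster, if the requirement is not met, the request waits up to its timeout and then fails instead of proceeding.

```java
// Hypothetical model of the pre-write check behind wait_for_active_shards.
class ActiveShardCheck {
    // "all" = primary + every configured replica; otherwise the number given
    static int required(String param, int numberOfReplicas) {
        return param.equals("all") ? numberOfReplicas + 1 : Integer.parseInt(param);
    }

    // The write may start only when enough copies of the target shard are active
    static boolean canProceed(String param, int numberOfReplicas, int activeCopies) {
        return activeCopies >= required(param, numberOfReplicas);
    }

    public static void main(String[] args) {
        // 1 replica configured but currently unassigned: only the primary is active
        System.out.println(canProceed("all", 1, 1)); // false
        System.out.println(canProceed("1", 1, 1));   // true (the default)
    }
}
```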

Indexing Many Documents: the Bulk API

The Bulk API is essential for batch operations. Indexing documents one at a time creates network overhead: 1000 documents = 1000 HTTP requests. With bulk, 1000 documents go in 1 request.

Bulk API Syntax

POST /_bulk
{"index": {"_index": "products", "_id": "1"}}
{"name": "Laptop", "price": 15000, "category": "Electronics"}
{"index": {"_index": "products", "_id": "2"}}
{"name": "Mouse", "price": 300, "category": "Accessories"}
{"index": {"_index": "products", "_id": "3"}}
{"name": "Keyboard", "price": 800, "category": "Accessories"}

Format rules:

Each line is a single JSON object (NDJSON: Newline Delimited JSON)
An action line is followed by its document line (delete has no document line)
The body must end with a newline character after the last line
Pretty-printed (multi-line) JSON is not allowed
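The format rules translate directly into code. The sketch below assembles a bulk body by hand and rejects multi-line JSON. Assumption: each action and source line arrives already serialized as single-line JSON (how it is produced, e.g. with Jackson, is out of scope), and NdjsonBulkBody is a hypothetical helper, not part of any Elasticsearch client.

```java
import java.util.List;

// Hypothetical helper: join pre-serialized action/source lines into an NDJSON bulk body.
class NdjsonBulkBody {
    static String build(List<String[]> entries) {
        StringBuilder body = new StringBuilder();
        for (String[] entry : entries) {
            // entry = {action} for delete, {action, source} for index/create/update
            for (String line : entry) {
                if (line.indexOf('\n') >= 0 || line.indexOf('\r') >= 0) {
                    throw new IllegalArgumentException("Bulk lines must be single-line JSON: " + line);
                }
                body.append(line).append('\n');   // every line ends with \n, including the last one
            }
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String body = build(List.of(
            new String[] {"{\"index\":{\"_index\":\"products\",\"_id\":\"1\"}}", "{\"name\":\"Laptop\"}"},
            new String[] {"{\"delete\":{\"_index\":\"products\",\"_id\":\"3\"}}"}
        ));
        System.out.print(body);
    }
}
```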

Bulk Action Types

index: create the document, or replace it entirely if it already exists
create: create only (fails with 409 if it already exists)
update: partial update (the changed fields go under "doc")
delete: delete (no document line!)

POST /_bulk
{"index": {"_index": "products", "_id": "1"}}
{"name": "Laptop", "price": 15000}
{"create": {"_index": "products", "_id": "2"}}
{"name": "Mouse", "price": 300}
{"update": {"_index": "products", "_id": "1"}}
{"doc": {"price": 14500}}
{"delete": {"_index": "products", "_id": "3"}}
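Because index, create and update need a document line while delete must not have one, a malformed body is easy to produce when generating it by hand. Below is a hypothetical pre-flight linter. It is a heuristic, not a JSON parser: it assumes every action line starts with {"index", {"create", {"update" or {"delete", so a document whose first field happens to be named like an action would confuse it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical heuristic check of bulk body structure (action/source pairing, trailing newline).
class BulkBodyLinter {
    private static final Pattern ACTION =
        Pattern.compile("^\\{\\s*\"(index|create|update|delete)\"\\s*:");

    static List<String> problems(String ndjson) {
        List<String> out = new ArrayList<>();
        if (!ndjson.endsWith("\n")) {
            out.add("body must end with a newline");
        }
        String[] lines = ndjson.split("\n");
        int i = 0;
        while (i < lines.length) {
            Matcher m = ACTION.matcher(lines[i]);
            if (!m.find()) {
                out.add("line " + (i + 1) + ": expected an action line");
                i++;
            } else if (m.group(1).equals("delete")) {
                i++;                                   // delete: no source line
            } else if (i + 1 >= lines.length || ACTION.matcher(lines[i + 1]).find()) {
                out.add("line " + (i + 1) + ": " + m.group(1) + " is missing its source line");
                i++;
            } else {
                i += 2;                                // action line + source line
            }
        }
        return out;
    }
}
```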

Bulk to a Specific Index

// If you put the index name in the URL, you don't have to repeat it in each action line
POST /products/_bulk
{"index": {"_id": "10"}}
{"name": "Monitor", "price": 5000}
{"index": {"_id": "11"}}
{"name": "Webcam", "price": 1500}
{"index": {"_id": "12"}}
{"name": "Headset", "price": 2000}

Bulk Response

// Response (shortened):
{
  "took": 30,              // total time (ms)
  "errors": false,         // true if ANY item failed
  "items": [
    {
      "index": {
        "_index": "products",
        "_id": "10",
        "_version": 1,
        "result": "created",
        "status": 201
      }
    },
    {
      "index": {
        "_index": "products",
        "_id": "11",
        "_version": 1,
        "result": "created",
        "status": 201
      }
    }
  ]
}

⚠️ Critical: in the Bulk API, one failed operation does not stop the others; every operation is independent. If errors is true, check each entry in the items array.
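Checking each item usually means sorting failures into "retry later" and "fix the data". The sketch below models items with a hypothetical record (the real Java client exposes them as BulkResponseItem). Assumption: only HTTP 429, which Elasticsearch returns when it rejects work under load, is treated as retryable; mapping errors or 409 conflicts would fail again if resent unchanged.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical triage of bulk response items: each item succeeds or fails on its own.
class BulkItemTriage {
    record Item(String id, int status, String errorType) {
        boolean failed() {
            return errorType != null;
        }
    }

    // Groups failed item IDs into "retry" (429) and "give_up" (everything else)
    static Map<String, List<String>> triage(List<Item> items) {
        return items.stream()
            .filter(Item::failed)
            .collect(Collectors.groupingBy(
                it -> it.status() == 429 ? "retry" : "give_up",
                TreeMap::new,
                Collectors.mapping(Item::id, Collectors.toList())));
    }

    public static void main(String[] args) {
        System.out.println(triage(List.of(
            new Item("10", 201, null),
            new Item("11", 429, "es_rejected_execution_exception"),
            new Item("12", 400, "mapper_parsing_exception"))));
        // {give_up=[12], retry=[11]}
    }
}
```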

Bulk Performance: Best Practices

// 1. Batch size rule of thumb: 1000-5000 documents or 5-15MB per batch
// Too small → network overhead dominates
// Too large → memory pressure, timeout risk

// 2. Disable refresh, then turn it back on
PUT /products/_settings
{ "index": { "refresh_interval": "-1" } }

// Run the bulk requests...
POST /products/_bulk
{"index": {"_id": "1"}}
{"name": "Product 1"}
// ... thousands of lines

// Turn refresh back on (1s is the default)
PUT /products/_settings
{ "index": { "refresh_interval": "1s" } }

// Trigger a manual refresh
POST /products/_refresh

// 3. Disable replicas (for a large initial load)
PUT /products/_settings
{ "number_of_replicas": 0 }

// Run the bulk load...

// Turn replicas back on
PUT /products/_settings
{ "number_of_replicas": 1 }

Loading a Bulk File with cURL

# data.ndjson file:
# {"index": {"_id": "1"}}
# {"name": "Product 1", "price": 100}
# {"index": {"_id": "2"}}
# {"name": "Product 2", "price": 200}

# Bulk load from the file
curl -X POST "localhost:9200/products/_bulk" \
  -H 'Content-Type: application/x-ndjson' \
  --data-binary @data.ndjson

# IMPORTANT: use --data-binary, not -d!
# -d strips newlines, --data-binary keeps them

Versioning and Concurrency

Internal Versioning

Every document carries _version, _seq_no and _primary_term values:

// First creation
PUT /products/_doc/1
{ "name": "Laptop", "price": 15000 }
// _version: 1, _seq_no: 0, _primary_term: 1

// Update (replace with PUT)
PUT /products/_doc/1
{ "name": "Laptop Pro", "price": 17000 }
// _version: 2, _seq_no: 1, _primary_term: 1

Optimistic Concurrency Control

When two processes try to update the same document at the same time:

// Process A: read the document
GET /products/_doc/1
// _seq_no: 5, _primary_term: 1

// Process B: read the same document
GET /products/_doc/1
// _seq_no: 5, _primary_term: 1

// Process A: update (succeeds)
PUT /products/_doc/1?if_seq_no=5&if_primary_term=1
{ "name": "Laptop Pro", "price": 17000 }
// ✅ Success: _seq_no is now 6 (_seq_no is counted per shard, so it can also jump higher)

// Process B: update (fails!)
PUT /products/_doc/1?if_seq_no=5&if_primary_term=1
{ "name": "Laptop Ultra", "price": 20000 }
// ❌ 409 Conflict: the seq_no has changed!

// Process B: re-read and try again (retry pattern)
GET /products/_doc/1
// _seq_no: 6
PUT /products/_doc/1?if_seq_no=6&if_primary_term=1
{ "name": "Laptop Ultra", "price": 20000 }
// ✅ Success

With this read-modify-write pattern, a concurrent change is detected (409) instead of being silently overwritten, so race conditions cannot cause lost updates.
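The same protocol, including the retry loop, can be modeled in memory. OptimisticConcurrencySketch is hypothetical (it is not the Elasticsearch client): a conditional write succeeds only when the caller's seq_no and primary_term still match the stored document, and update re-reads after every conflict. The starting values mirror the example above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical in-memory model of if_seq_no / if_primary_term with a retry loop.
class OptimisticConcurrencySketch {
    record Versioned(Map<String, Object> source, long seqNo, long primaryTerm) {}

    static class ConflictException extends RuntimeException {}

    private Versioned current = new Versioned(Map.of("name", "Laptop", "price", 15000), 5, 1);
    private long nextSeqNo = 6;

    synchronized Versioned get() {
        return current;
    }

    // Conditional write: reject if the document changed since the caller read it (409 in Elasticsearch)
    synchronized void put(Map<String, Object> source, long ifSeqNo, long ifPrimaryTerm) {
        if (current.seqNo() != ifSeqNo || current.primaryTerm() != ifPrimaryTerm) {
            throw new ConflictException();
        }
        current = new Versioned(Map.copyOf(source), nextSeqNo++, ifPrimaryTerm);
    }

    // Read-modify-write: re-read and re-apply the change after every conflict
    void update(UnaryOperator<Map<String, Object>> change, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            Versioned read = get();
            try {
                put(change.apply(new HashMap<>(read.source())), read.seqNo(), read.primaryTerm());
                return;
            } catch (ConflictException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
            }
        }
    }
}
```

Re-applying the change to freshly read data (rather than resending the old body) is what keeps the retry correct: Process B's change is recomputed on top of Process A's result.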

Document Indexing with Java

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch._types.ElasticsearchException;
import co.elastic.clients.elasticsearch._types.OpType;
import co.elastic.clients.elasticsearch.core.*;
import co.elastic.clients.elasticsearch.core.bulk.*;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

import java.util.Map;

class Main {
    public static void main(String[] args) throws Exception {
        // Low-level REST client → transport (Jackson for JSON) → typed API client
        RestClient restClient = RestClient.builder(
            new HttpHost("localhost", 9200)
        ).build();
        ElasticsearchClient client = new ElasticsearchClient(
            new RestClientTransport(restClient, new JacksonJsonpMapper())
        );

        // 1. Index a single document with a specific ID (= PUT /products/_doc/1)
        Map<String, Object> product = Map.of(
            "name", "MacBook Pro 16",
            "price", 75000,
            "category", "Laptop",
            "in_stock", true
        );

        IndexResponse response = client.index(i -> i
            .index("products")
            .id("1")
            .document(product)
        );
        System.out.println("Result: " + response.result());    // Created (Updated on a re-run)
        System.out.println("Version: " + response.version());  // 1 on the first run

        // 2. Index with an auto-generated ID (= POST /products/_doc)
        Map<String, Object> product2 = Map.of(
            "name", "AirPods Pro",
            "price", 8500,
            "category", "Headphones"
        );

        IndexResponse autoIdResponse = client.index(i -> i
            .index("products")
            .document(product2)
        );
        System.out.println("Auto ID: " + autoIdResponse.id());

        // 3. op_type=create (create only, fail if the ID exists)
        try {
            client.index(i -> i
                .index("products")
                .id("1")
                .document(product)
                .opType(OpType.Create)
            );
        } catch (ElasticsearchException e) {
            // Expected: version_conflict_engine_exception (HTTP 409), ID=1 already exists
            System.out.println("Expected error (" + e.status() + "): " + e.getMessage());
        }

        // 4. Bulk API: each .operations(...) call appends one action
        BulkResponse bulkResponse = client.bulk(b -> b
            .index("products")   // default index for every operation
            .operations(ops -> ops
                .index(idx -> idx.id("10").document(Map.of("name", "Monitor", "price", 5000)))
            )
            .operations(ops -> ops
                .index(idx -> idx.id("11").document(Map.of("name", "Webcam", "price", 1500)))
            )
            .operations(ops -> ops
                .index(idx -> idx.id("12").document(Map.of("name", "Keyboard", "price", 800)))
            )
        );

        System.out.println("Bulk errors: " + bulkResponse.errors());
        System.out.println("Bulk took: " + bulkResponse.took() + "ms");

        // Check every item: one failure does not stop the others
        for (BulkResponseItem item : bulkResponse.items()) {
            System.out.printf("ID: %s, Result: %s, Status: %d%n",
                item.id(), item.result(), item.status()
            );
        }

        // 5. Indexing a POJO (Jackson serializes the record to JSON)
        // record Product(String name, int price, String category, boolean inStock) {}
        // client.index(i -> i.index("products").id("20").document(new Product("Tablet", 12000, "Electronics", true)));

        restClient.close();
    }
}

Java Bulk Helper: For Large Data Sets

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch.core.BulkRequest;
import co.elastic.clients.elasticsearch.core.BulkResponse;

import java.io.IOException;
import java.util.List;
import java.util.Map;

class BulkHelper {
    // Split a large data set into batches and send one bulk request per batch
    public static void bulkIndex(ElasticsearchClient client,
                                 String indexName,
                                 List<Map<String, Object>> products,
                                 int batchSize) throws IOException {
        for (int i = 0; i < products.size(); i += batchSize) {
            int end = Math.min(i + batchSize, products.size());
            List<Map<String, Object>> batch = products.subList(i, end);

            BulkRequest.Builder builder = new BulkRequest.Builder().index(indexName);
            for (Map<String, Object> product : batch) {
                // No ID given: Elasticsearch generates one per document
                builder.operations(op -> op
                    .index(idx -> idx.document(product))
                );
            }

            BulkResponse response = client.bulk(builder.build());
            if (response.errors()) {
                // Log only the failed items; the rest of the batch was indexed
                response.items().stream()
                    .filter(item -> item.error() != null)
                    .forEach(item -> System.err.println(
                        "Error (" + item.id() + "): " + item.error().reason()
                    ));
            }

            System.out.printf("Batch %d-%d: %dms%n", i, end, response.took());
        }
    }
}

Indexing Performance Optimizations

1. Bulk Size Optimization

Rule of thumb: 5-15MB per bulk request, 1000-5000 documents per batch

Test it:
- 500 documents/batch → measure throughput
- 1000 documents/batch → measure throughput
- 2000 documents/batch → measure throughput
- 5000 documents/batch → measure throughput
- Use the batch size that gives the highest throughput
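The rule of thumb has two limits (document count and payload size), and a batch should close as soon as either is hit. The helper in the previous section splits by count only; below is a hypothetical size-aware variant. Assumption: documents are already single-line JSON strings, and the action line of each document is approximated by a fixed byte overhead.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical batcher: close a batch when it reaches maxDocs OR would exceed maxBytes.
class SizeAwareBatcher {
    static List<List<String>> batches(List<String> docs, int maxDocs, long maxBytes, int actionLineBytes) {
        List<List<String>> result = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String doc : docs) {
            // source line + \n + action line + \n
            long docBytes = doc.getBytes(StandardCharsets.UTF_8).length + 1 + actionLineBytes + 1;
            boolean full = current.size() >= maxDocs || currentBytes + docBytes > maxBytes;
            if (!current.isEmpty() && full) {
                result.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(doc);   // a single oversized document still gets its own batch
            currentBytes += docBytes;
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }
}
```

A typical call might use maxDocs = 5000 and maxBytes = 10 * 1024 * 1024, then adjust both while measuring throughput as described above.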

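2. Client-Side Threading

Example figures (illustrative only; real numbers depend on hardware, document size and cluster layout):
Single thread: 5000 doc/sec
4 threads: 18000 doc/sec
8 threads: 30000 doc/sec

Rule of thumb: start with about as many concurrent bulk threads as there are CPU cores, then measure; if the cluster starts answering with 429 (Too Many Requests), back off.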
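Client-side concurrency can be as simple as a fixed-size thread pool, so the pool size is the one knob to tune. Assumption: sendBatch is a hypothetical stand-in for a real bulk call (for example a wrapper around client.bulk(...)) that returns how many documents were accepted; ParallelBulkSender is likewise illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: send bulk batches concurrently from a fixed-size thread pool.
class ParallelBulkSender {
    static int sendAll(List<List<String>> batches, int threads, Function<List<String>, Integer> sendBatch) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> batch : batches) {
                results.add(pool.submit(() -> sendBatch.apply(batch)));   // at most `threads` run at once
            }
            int accepted = 0;
            for (Future<Integer> result : results) {
                accepted += result.get();   // waits; rethrows if a batch failed
            }
            return accepted;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for bulk batches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a bulk batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Measuring with threads = 1, 2, 4, 8 and stopping where throughput flattens (or 429 responses appear) is the practical way to apply the rule of thumb.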

3. Index Settings

// Before a large initial load
PUT /products/_settings
{
  "index": {
    "refresh_interval": "-1",
    "number_of_replicas": 0,
    "translog.durability": "async",      // fsync in the background: a crash can lose recent acknowledged writes
    "translog.sync_interval": "30s"
  }
}

// Revert after the load
PUT /products/_settings
{
  "index": {
    "refresh_interval": "1s",
    "number_of_replicas": 1,
    "translog.durability": "request"
  }
}
POST /products/_refresh
// Optional: merge segments; only worthwhile for an index that will receive few further writes
POST /products/_forcemerge?max_num_segments=5

4. Mapping Optimization

// Avoid unnecessary analysis (the index name is illustrative)
PUT /products-optimized
{
  "mappings": {
    "properties": {
      "sku": { "type": "keyword" },                            // not text!
      "internal_id": { "type": "keyword", "index": false },    // if it is never searched
      "raw_data": { "type": "object", "enabled": false }       // kept only in _source
    }
  }
}

Best Practices

✅ Use the Bulk API for batch writes: 1 bulk request instead of 1000 single requests

✅ Decide your ID strategy up front: DB sync = DB ID, log data = auto ID

✅ Use `op_type=create` wherever you need to prevent duplicates

✅ Disable refresh and replicas during a large initial load: commonly cited as a 2-5x speed-up, but measure it on your own data

✅ Tune the bulk batch size by testing: aim for 5-15MB per request

✅ Transform with a pipeline: use an ingest pipeline instead of transforming data in the application

✅ Use optimistic concurrency: prevent race conditions with if_seq_no + if_primary_term

Common Mistakes

❌ "I do partial updates with PUT"

PUT replaces the whole document. For partial updates, use the _update API (covered in an upcoming lesson).

❌ "An error in the Bulk API affects the other operations"

No. Every bulk operation is independent. If errors is true, check only the failed ones.

❌ "I use pretty-printed JSON in bulk requests"

// ❌ WRONG: pretty-printed JSON
POST /_bulk
{
  "index": {
    "_index": "products",
    "_id": "1"
  }
}
{
  "name": "Laptop"
}

// ✅ CORRECT: each JSON object on a single line (NDJSON)
POST /_bulk
{"index":{"_index":"products","_id":"1"}}
{"name":"Laptop"}

❌ "I index every document one by one"

Indexing 10,000 documents with 10,000 HTTP requests means most of the time is spent on network round trips. With the Bulk API at 1000 documents per request, that drops to 10 requests.

❌ "I use -d instead of --data-binary in cURL"

-d strips newline characters, which breaks the Bulk API body. Always use --data-binary.

Summary

PUT indexes a document under a specific ID; if it already exists, it is replaced entirely
POST indexes a document with an auto-generated ID; faster (no existence check)
op_type=create or the _create endpoint: create only, fail if it already exists
The Bulk API is essential for batch work: 1000-5000 documents/batch, 5-15MB/request
Ingest pipelines let you transform data during indexing
Optimistic concurrency (if_seq_no + if_primary_term) prevents race conditions
Before a large load, disabling refresh and setting replicas to 0 can speed things up dramatically

In the next lesson we will cover reading documents: GET, _source, _mget and exists!