Reading Documents: GET, _source, _mget | Free Elasticsearch Course: From Zero to Expert

GET, _source, _mget, exists

Think of borrowing a book from a library. You give the librarian the book's call number, and they walk straight to the right shelf and bring it back without scanning the whole library. The GET API in Elasticsearch works the same way: it fetches one specific document directly by its ID. It does not run a search query or compute relevance scores; it goes straight to the address.

In this lesson we cover every way to read documents, the available filtering options, and bulk-read techniques.

GET: Reading a Single Document

The most basic read operation:

// Fetch the document
GET /products/_doc/1

// Response:
{
  "_index": "products",
  "_id": "1",
  "_version": 3,
  "_seq_no": 12,
  "_primary_term": 1,
  "found": true,               // Was the document found?
  "_source": {                  // Original JSON
    "name": "MacBook Pro 16",
    "price": 75000,
    "category": "Laptop",
    "brand": "Apple",
    "in_stock": true,
    "description": "Professional laptop with the M3 Pro chip"
  }
}

Document Not Found

// Nonexistent document
GET /products/_doc/999

// Response (HTTP 404):
{
  "_index": "products",
  "_id": "999",
  "found": false          // Not found
}

Check the found field programmatically. The HTTP status code tells the same story: 200 when the document is found, 404 when it is not.

GET vs. Search

This distinction matters:

Feature	GET (_doc)	Search (_search)
How it works	Direct access by ID	Query executed against the inverted index
Routing	hash(ID) → straight to one shard	Fans out to all shards
Speed	Very fast (often on the order of 1 ms)	Depends on query complexity
Scoring	None	Relevance score computed
Real-time	Yes (can read from the translog)	Near real-time (visible after a refresh)
Use case	Read by a known ID	Run a search query

💡 Tip: GET can find a document without waiting for a refresh, because it can read from the translog even if the document has not yet been written to a segment. _search, by contrast, reads only from segments that a refresh has made searchable.

The realtime Parameter

// Default: realtime=true, reads from the translog (most recent state)
GET /products/_doc/1

// realtime=false: read only from segments (state as of the last refresh)
GET /products/_doc/1?realtime=false

_source: Data Only

If you don't need the metadata (index, version, seq_no, ...), fetch just the original JSON:

// _source only (no metadata)
GET /products/_source/1

// Response (plain JSON, no wrapper):
{
  "name": "MacBook Pro 16",
  "price": 75000,
  "category": "Laptop",
  "brand": "Apple",
  "in_stock": true,
  "description": "Professional laptop with the M3 Pro chip"
}

_source Filtering

Pulling only selected fields from large documents reduces network traffic and memory use:

// Include specific fields
GET /products/_doc/1?_source_includes=name,price

// Response:
{
  "_index": "products",
  "_id": "1",
  "found": true,
  "_source": {
    "name": "MacBook Pro 16",
    "price": 75000
  }
}

// Exclude specific fields
GET /products/_doc/1?_source_excludes=description,specs

// With a wildcard
GET /products/_doc/1?_source_includes=spec*

// Combine both
GET /products/_doc/1?_source_includes=name,price,spec*&_source_excludes=specs.internal

// Disable _source entirely
GET /products/_doc/1?_source=false

// Response:
{
  "_index": "products",
  "_id": "1",
  "found": true
  // no _source field
}
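
As a rough illustration of how these query parameters compose, the Java sketch below builds the request path for a GET with optional include and exclude lists. The class SourceFilterUrl and its build method are hypothetical helpers, not part of any Elasticsearch client, and the sketch assumes the index name, ID, and field names are already URL-safe.

```java
import java.util.ArrayList;
import java.util.List;

class SourceFilterUrl {
    // Builds "/{index}/_doc/{id}" plus optional _source_includes / _source_excludes.
    // Assumption: all inputs are URL-safe, so no percent-encoding is applied.
    static String build(String index, String id, List<String> includes, List<String> excludes) {
        List<String> params = new ArrayList<>();
        if (!includes.isEmpty()) {
            params.add("_source_includes=" + String.join(",", includes));
        }
        if (!excludes.isEmpty()) {
            params.add("_source_excludes=" + String.join(",", excludes));
        }
        String path = "/" + index + "/_doc/" + id;
        return params.isEmpty() ? path : path + "?" + String.join("&", params);
    }

    public static void main(String[] args) {
        // Reproduces the combined include/exclude request shown above.
        System.out.println(build("products", "1",
                List.of("name", "price", "spec*"), List.of("specs.internal")));
    }
}
```

Keeping the field lists in one place like this makes it easier to reuse the same projection for single reads and listing pages.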

Stored Fields

If the mapping defines separately stored fields (store: true) in addition to _source:

// Fetch fields that have store: true in the mapping
GET /products/_doc/1?stored_fields=name,price

// stored_fields are returned instead of _source:
{
  "_index": "products",
  "_id": "1",
  "found": true,
  "fields": {
    "name": ["MacBook Pro 16"],
    "price": [75000]
  }
}

_source filtering is usually enough. stored_fields can improve performance when _source is very large, but it is rarely needed.

HEAD: Existence Check

Check whether a document exists. No body is returned, only an HTTP status code:

# HEAD request with cURL (-I already sends HEAD, so -X HEAD is not needed)
curl -I "localhost:9200/products/_doc/1"

# Exists: HTTP/1.1 200 OK
# Missing: HTTP/1.1 404 Not Found

// In Kibana Dev Tools
HEAD /products/_doc/1
// 200 - {"statusCode":200} or 404 - {"statusCode":404}

HEAD is lighter than GET: it transfers no body, only whether the document exists. Prefer it wherever an existence check is all you need.

Index Existence Check

// Does the index exist?
HEAD /products
// 200 → exists, 404 → does not exist

// Useful for if-exists patterns:
// Code: if the index exists, search; otherwise create it

_mget: Reading Multiple Documents

Fetch several documents in a single request:

// Multiple documents from different indices
GET /_mget
{
  "docs": [
    {
      "_index": "products",
      "_id": "1"
    },
    {
      "_index": "products",
      "_id": "2"
    },
    {
      "_index": "orders",
      "_id": "order-100"
    }
  ]
}

// Response:
{
  "docs": [
    {
      "_index": "products",
      "_id": "1",
      "found": true,
      "_source": { "name": "MacBook Pro 16", "price": 75000 }
    },
    {
      "_index": "products",
      "_id": "2",
      "found": true,
      "_source": { "name": "ThinkPad X1 Carbon", "price": 45000 }
    },
    {
      "_index": "orders",
      "_id": "order-100",
      "found": false           // This document was not found
    }
  ]
}

_mget Within a Single Index (Shortcut)

// Put the index in the URL so you don't repeat it in docs
GET /products/_mget
{
  "ids": ["1", "2", "3", "5", "10"]
}

// Response: one entry per ID, in request order, each with found: true/false

_source Filtering with _mget

GET /_mget
{
  "docs": [
    {
      "_index": "products",
      "_id": "1",
      "_source": ["name", "price"]     // Only name and price
    },
    {
      "_index": "products",
      "_id": "2",
      "_source": {
        "includes": ["name", "category"],
        "excludes": ["description"]
      }
    },
    {
      "_index": "products",
      "_id": "3",
      "_source": false                  // Skip _source entirely
    }
  ]
}

_mget vs. Individual GETs: Performance

Scenario: fetch 100 documents (illustrative numbers)

Individual GETs:
- 100 HTTP requests
- 100 × ~2 ms network overhead = ~200 ms of network time alone
- TCP connection overhead
- Total: ~250 ms

_mget:
- 1 HTTP request
- 1 × ~2 ms network overhead
- Parallel shard access
- Total: ~15 ms

In this scenario _mget is roughly 10-20x faster than individual GETs.

Reading Documents with the Search API

Sometimes you don't know the ID and want to read by criteria instead:

List All Documents

// match_all: return everything (10 hits by default)
GET /products/_search
{
  "query": {
    "match_all": {}
  }
}

// Set the number of hits
GET /products/_search
{
  "size": 50,
  "query": {
    "match_all": {}
  }
}

Pagination

// Page 1 (first 10)
GET /products/_search
{
  "from": 0,
  "size": 10,
  "query": { "match_all": {} }
}

// Page 2
GET /products/_search
{
  "from": 10,
  "size": 10,
  "query": { "match_all": {} }
}

// Page 3
GET /products/_search
{
  "from": 20,
  "size": 10,
  "query": { "match_all": {} }
}

⚠️ Warning: by default, from + size cannot exceed 10,000. Use search_after for deep pagination (Section 6).
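
The page arithmetic above can be sketched in Java as follows. This is a minimal illustration: the Pagination class and its method names are hypothetical, and the constant assumes the default index.max_result_window of 10,000 has not been changed on the index.

```java
class Pagination {
    // Assumed default value of index.max_result_window.
    static final int MAX_RESULT_WINDOW = 10_000;

    // Converts a 1-based page number into the "from" offset for _search.
    static long fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (long) (page - 1) * size;
    }

    // True when from + size stays within the result window.
    static boolean withinWindow(int page, int size) {
        return fromOffset(page, size) + size <= MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        System.out.println(fromOffset(3, 10));      // 20, matching "Page 3" above
        System.out.println(withinWindow(1000, 10)); // true:  9990 + 10 = 10000
        System.out.println(withinWindow(1001, 10)); // false: 10000 + 10 > 10000
    }
}
```

A check like withinWindow lets an application switch to search_after before Elasticsearch rejects the request.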

_source Filtering (in Search)

// Return only specific fields in search results
GET /products/_search
{
  "_source": ["name", "price", "category"],
  "query": {
    "match": { "category": "Laptop" }
  }
}

// Include/Exclude
GET /products/_search
{
  "_source": {
    "includes": ["name", "price", "specs.*"],
    "excludes": ["specs.internal_*"]
  },
  "query": { "match_all": {} }
}

// Disable _source entirely (return only _id and _score)
GET /products/_search
{
  "_source": false,
  "query": { "match_all": {} }
}

Sorting

// Sort by price, ascending
GET /products/_search
{
  "sort": [
    { "price": "asc" }
  ],
  "query": { "match_all": {} }
}

// Multi-level sort: stock status first, then price
GET /products/_search
{
  "sort": [
    { "in_stock": "desc" },
    { "price": "asc" }
  ],
  "query": { "match_all": {} }
}

// Relevance + price
GET /products/_search
{
  "sort": [
    "_score",
    { "price": "asc" }
  ],
  "query": {
    "match": { "name": "laptop" }
  }
}

Document Count

// Total number of documents in the index
GET /products/_count

// Response:
{
  "count": 1542,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }
}

// Number of documents matching a query
GET /products/_count
{
  "query": {
    "term": { "category.keyword": "Laptop" }
  }
}
// count: 23

// Quick check with _cat
GET /_cat/count/products?v
// epoch      timestamp count
// 1736899200 00:00:00  1542

Real-World Scenario: E-Commerce Product Detail

Typical reads behind a product page in an e-commerce application:

// 1. Fetch the product by ID (product detail page)
GET /products/_doc/PROD-2025-001

// 2. Fetch only the fields you need (listing page)
GET /products/_mget
{
  "docs": [
    { "_id": "PROD-2025-001", "_source": ["name", "price", "thumbnail", "rating"] },
    { "_id": "PROD-2025-002", "_source": ["name", "price", "thumbnail", "rating"] },
    { "_id": "PROD-2025-003", "_source": ["name", "price", "thumbnail", "rating"] },
    { "_id": "PROD-2025-004", "_source": ["name", "price", "thumbnail", "rating"] },
    { "_id": "PROD-2025-005", "_source": ["name", "price", "thumbnail", "rating"] }
  ]
}

// 3. Check that the product exists (before adding to cart)
HEAD /products/_doc/PROD-2025-001
// 200 → product exists, can be added to the cart
// 404 → product not found, show an error

// 4. Bulk-fetch similar products by ID
GET /products/_mget
{
  "ids": ["PROD-2025-010", "PROD-2025-011", "PROD-2025-012"]
}

// 5. Product version check (cache invalidation)
GET /products/_doc/PROD-2025-001?_source=false
// Fetch metadata only, then check _version or _seq_no
// If it matches the cached version, serve from cache

API Gateway Pattern

User request: /product/PROD-2025-001

1. Check the cache → is the product cached?
   → If yes: GET /products/_doc/PROD-2025-001?_source=false → compare _seq_no (HEAD returns no body, so it cannot carry _seq_no)
     → Same: serve from cache
     → Changed: fetch the current version with GET

2. If not cached:
   → GET /products/_doc/PROD-2025-001 → write to cache + serve

3. Related products:
   → bulk-fetch with _mget → write to cache
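
Steps 1 and 2 of this flow can be sketched in Java with an in-memory stand-in for Elasticsearch. Everything here is illustrative: ProductCache, DocStore, and InMemoryStore are hypothetical names, fetchSeqNo plays the role of a metadata-only GET (?_source=false), and the sketch assumes the requested document exists. A production check would usually compare _primary_term together with _seq_no.

```java
import java.util.HashMap;
import java.util.Map;

class ProductCache {
    /** Minimal view of a document: its _seq_no plus its _source as a string. */
    static final class Doc {
        final long seqNo;
        final String source;
        Doc(long seqNo, String source) { this.seqNo = seqNo; this.source = source; }
    }

    /** Hypothetical stand-in for the two Elasticsearch calls used in the flow. */
    interface DocStore {
        long fetchSeqNo(String id); // plays the role of GET ...?_source=false
        Doc fetchFull(String id);   // plays the role of GET /products/_doc/{id}
    }

    /** In-memory fake so the sketch runs without a cluster. */
    static final class InMemoryStore implements DocStore {
        private final Map<String, Doc> docs = new HashMap<>();
        private long nextSeqNo = 0;
        int fullFetches = 0;

        void put(String id, String source) { docs.put(id, new Doc(nextSeqNo++, source)); }
        public long fetchSeqNo(String id) { return docs.get(id).seqNo; }
        public Doc fetchFull(String id) { fullFetches++; return docs.get(id); }
    }

    private final Map<String, Doc> cache = new HashMap<>();
    private final DocStore store;

    ProductCache(DocStore store) { this.store = store; }

    String get(String id) {
        Doc cached = cache.get(id);
        if (cached != null && cached.seqNo == store.fetchSeqNo(id)) {
            return cached.source;          // unchanged: serve from cache
        }
        Doc fresh = store.fetchFull(id);   // cache miss or stale entry
        cache.put(id, fresh);
        return fresh.source;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.put("PROD-2025-001", "{\"price\":75000}");
        ProductCache cache = new ProductCache(store);
        cache.get("PROD-2025-001");                      // miss: full fetch
        cache.get("PROD-2025-001");                      // hit: metadata check only
        store.put("PROD-2025-001", "{\"price\":70000}"); // document changes
        System.out.println(cache.get("PROD-2025-001"));  // stale entry is refreshed
        System.out.println(store.fullFetches);           // 2 full fetches in total
    }
}
```

The metadata check still costs one round trip per read, so this design pays off mainly when documents are large relative to their metadata.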

Reading Documents with Java

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch.core.*;
import co.elastic.clients.elasticsearch.core.mget.*;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

import java.util.List;
import java.util.Map;

class Main {
    public static void main(String[] args) throws Exception {
        RestClient restClient = RestClient.builder(
            new HttpHost("localhost", 9200)
        ).build();
        ElasticsearchClient client = new ElasticsearchClient(
            new RestClientTransport(restClient, new JacksonJsonpMapper())
        );

        // 1. Read a single document
        GetResponse<Map> getResponse = client.get(g -> g
            .index("products")
            .id("1"),
            Map.class
        );

        if (getResponse.found()) {
            System.out.println("Found: " + getResponse.source());
            System.out.println("Version: " + getResponse.version());
            System.out.println("SeqNo: " + getResponse.seqNo());
        } else {
            System.out.println("Document not found!");
        }

        // 2. Read with _source filtering
        GetResponse<Map> filteredResponse = client.get(g -> g
            .index("products")
            .id("1")
            .sourceIncludes("name", "price"),
            Map.class
        );
        System.out.println("Filtered: " + filteredResponse.source());

        // 3. Existence check (HEAD)
        boolean exists = client.exists(e -> e
            .index("products")
            .id("1")
        ).value();
        System.out.println("Exists: " + exists);

        // 4. Multi-document read (_mget)
        MgetResponse<Map> mgetResponse = client.mget(m -> m
            .index("products")
            .ids(List.of("1", "2", "3", "999")),
            Map.class
        );

        for (MultiGetResponseItem<Map> item : mgetResponse.docs()) {
            if (item.isResult()) {
                var doc = item.result();
                if (doc.found()) {
                    System.out.printf("ID: %s → %s%n", doc.id(), doc.source());
                } else {
                    System.out.printf("ID: %s → Not found%n", doc.id());
                }
            } else {
                System.out.println("Error: " + item.failure().error().reason());
            }
        }

        // 5. Count
        CountResponse countResponse = client.count(c -> c
            .index("products")
            .query(q -> q.matchAll(m -> m))
        );
        System.out.println("Total documents: " + countResponse.count());

        // 6. Count with query
        CountResponse laptopCount = client.count(c -> c
            .index("products")
            .query(q -> q
                .term(t -> t
                    .field("category.keyword")
                    .value("Laptop")
                )
            )
        );
        System.out.println("Laptop count: " + laptopCount.count());

        restClient.close();
    }
}

Performance Tips

1. Use _source Filtering

// ❌ Fetching the full _source: e.g. 10 KB per document
GET /products/_doc/1

// ✅ Fetching only what you need: e.g. 200 B per document
GET /products/_doc/1?_source_includes=name,price

The difference is most noticeable with large documents (10 KB+) and bulk fetches.

2. Use _mget (Bulk Reads)

10 documents:
  Individual GETs: 10 requests × 2 ms = 20 ms+
  _mget: 1 request × 5 ms = 5 ms

100 documents:
  Individual GETs: 100 requests × 2 ms = 200 ms+
  _mget: 1 request × 15 ms = 15 ms

3. Specify Routing (If You Use Custom Routing)

// If you index with custom routing, pass it when reading too, so the request reaches the right shard
GET /orders/_doc/order-1?routing=customer-42

If you fetch a document that was indexed with custom routing without passing that routing value, the request goes to the shard derived from the ID alone. That is usually the wrong shard, so the document is not found.

4. The preference Parameter

// Prefer shard copies on the local node
GET /products/_search?preference=_local
{
  "query": { "match_all": {} }
}

// Prefer specific nodes
GET /products/_search?preference=_prefer_nodes:node-1,node-2
{
  "query": { "match_all": {} }
}

// Session-based (the same user always reads from the same shard copies, which avoids "bouncing results")
GET /products/_search?preference=user-session-12345
{
  "query": { "match_all": {} }
}

The "bouncing results" problem: the same query returns a different order each time it runs, because documents with equal scores can come back in a different order from different copies of a shard (primary vs. replica). Using a consistent preference value prevents this.

Common Read Patterns

Pattern 1: Cache-Aside

1. Check the cache → if present, return
2. Read from Elasticsearch → write to cache → return
3. On update → invalidate the cache entry

Pattern 2: Batch Reader

1. Obtain a list of IDs (from a search or an external source)
2. Bulk-fetch with _mget
3. Log or handle the IDs that were not found
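
The batch reader steps can be sketched in Java without a cluster. MgetBatcher and its methods are hypothetical helpers: partition splits the ID list into bounded batches, mgetBody renders the request body for GET /{index}/_mget, and missing reports which requested IDs were absent from a set of found IDs. The sketch assumes IDs contain no characters that need JSON escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class MgetBatcher {
    // Splits ids into consecutive batches of at most batchSize elements.
    static List<List<String>> partition(List<String> ids, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }

    // Renders the {"ids":[...]} body for one batch (no JSON escaping applied).
    static String mgetBody(List<String> batch) {
        StringBuilder sb = new StringBuilder("{\"ids\":[");
        for (int i = 0; i < batch.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(batch.get(i)).append('"');
        }
        return sb.append("]}").toString();
    }

    // Returns the requested IDs that are not in the found set, in request order.
    static List<String> missing(List<String> requested, Set<String> found) {
        List<String> result = new ArrayList<>();
        for (String id : requested) {
            if (!found.contains(id)) result.add(id);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("1", "2", "3", "5", "10");
        for (List<String> batch : partition(ids, 2)) {
            System.out.println(mgetBody(batch));
        }
        System.out.println(missing(ids, Set.of("1", "2", "3")));
    }
}
```

In a real reader, each rendered body would be sent as one _mget request and the found set would be collected from the entries with found: true.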

Pattern 3: Conditional Read

// Read the full document only if it has changed (ETag-like)
// 1. Initial read
GET /products/_doc/1
// _seq_no: 5, _primary_term: 1

// 2. On later reads, check only the metadata with _source=false
GET /products/_doc/1?_source=false
// _seq_no changed → read the full document
// _seq_no unchanged → serve from cache

Best Practices

✅ If you know the ID, use `GET`, not `_search`: GET is much faster and goes straight to the shard

✅ Filter _source: don't transfer fields you don't need

✅ Use `_mget` for bulk reads: 1 request instead of N separate requests

✅ Use `HEAD` for existence checks: just a 200/404 check, with no body transferred

✅ If you use custom routing, pass it on reads too: otherwise the document may not be found

✅ Use the `preference` parameter to prevent bouncing results: a session-based value works well

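Common Mistakes

❌ "GET returned 404 but I just added the document"

GET is real-time (it can read from the translog), so a missing refresh is usually not the cause; near-real-time delay affects _search, not GET. Check that the index request actually completed before the read, that the index name and ID match, that you pass the same routing value used at index time, and that you are not reading with realtime=false. If the document must also be visible to _search right away, index it with ?refresh=true.

❌ "I indexed with custom routing but I'm reading without it"

// Index
PUT /orders/_doc/1?routing=customer-42
{ "order": "data" }

// ❌ Read (no routing, so the request likely hits the wrong shard)
GET /orders/_doc/1
// found: false when the ID alone maps to a different shard

// ✅ Read (with routing)
GET /orders/_doc/1?routing=customer-42
// found: true ✅

❌ "I'm fetching thousands of documents with _mget"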

Sending too many IDs in one _mget can cause memory and network problems. Fetch in batches of 100-500 IDs.

❌ "I'm fetching the full _source in search results"

You show 50 products on a listing page but fetch every field of every product (description, specs, reviews, ...). Fetch only name, price, and thumbnail; this can cut the response size by roughly 10x.

❌ "I'm trying to use from + size > 10000"

By default, Elasticsearch enforces index.max_result_window: 10000. Use search_after for deep pagination (Section 6).

Behind the Scenes of the GET API

Understanding how a GET operation works internally is very useful when debugging:

1. Client: GET /products/_doc/1

2. Coordinating node:
   → routing = hash("1") % number_of_shards (simplified)
   → shard_id = 0 (say)
   → Pick an active copy of shard 0 (primary or replica)

3. Selected shard (data node):
   → Check the translog (the most recent data lives here)
   → If not in the translog, check the segments
   → If found: return _source + metadata
   → If not found: return found: false

4. Coordinating node → returns the response to the client
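
The routing step in this walkthrough can be illustrated with a few lines of Java. This is a deliberately simplified sketch: Elasticsearch hashes the routing value with Murmur3 and applies extra index-level factors, whereas the hypothetical shardFor below uses String.hashCode purely to show the idea. By default the routing value is the document ID; with custom routing it is the value passed in ?routing=.

```java
class ShardRouting {
    // Simplified stand-in for Elasticsearch's routing formula.
    // Assumption: String.hashCode replaces the Murmur3 hash used by the real implementation.
    static int shardFor(String routingValue, int numberOfShards) {
        // floorMod keeps the result in [0, numberOfShards) even for negative hash codes.
        return Math.floorMod(routingValue.hashCode(), numberOfShards);
    }

    public static void main(String[] args) {
        int shards = 5;
        // Default routing: the document ID decides the shard.
        System.out.println("by id:      " + shardFor("1", shards));
        // Custom routing: the routing value decides the shard instead.
        System.out.println("by routing: " + shardFor("customer-42", shards));
    }
}
```

When the two calls return different shard numbers, a read without the routing value looks on a shard that never received the document, which is why the lookup comes back with found: false.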

GET does not wait for a refresh because it can read from the translog. The _search API, however, reads only from segments, so it needs a refresh. This difference is the answer to "I added a document but it doesn't show up in search":

// Index a document
PUT /products/_doc/1
{ "name": "Test" }

// Immediate GET: ✅ found (from the translog)
GET /products/_doc/1

// Immediate search: ❌ may not find it (if no refresh has happened yet)
GET /products/_search
{
  "query": { "term": { "_id": "1" } }
}

// Wait about 1 second (the default refresh interval) → search finds it too

Summary

GET /index/_doc/ID reads a single document: direct ID-based access, very fast
GET /index/_source/ID returns only the original JSON, without metadata
_source_includes/_source_excludes fetch only the parts of the document you need
HEAD checks whether a document or index exists, with no body transferred
_mget fetches multiple documents in one request, roughly 10-20x faster than individual GETs in the 100-document scenario
_count returns the number of documents, optionally filtered by a query
The preference parameter prevents the bouncing results problem
With custom routing, pass the routing parameter on reads too; otherwise the document may not be found

In the next lesson we cover document updates: Update, Partial Update, Scripted Update, and Upsert!