Patching Channel Length of Unreal IRC

Jul 18th, 2013

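I needed real-time message delivery for a few things, and for that I went with IRC, old-fashioned as it is. The IRC server is Unreal IRC, and I have been going through a fair amount of trial and error with it.

Along the way I decided to use UUIDs as channel IDs, and things were going well: with the ID as the key, I was putting assorted data into a KVS and pulling it back out.

Then I got the feeling that the keys going into the KVS were a few characters short, and it turned out that the channel names were being truncated. Wondering whether I simply should not use UUIDs, I looked up the RFC.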

http://tools.ietf.org/html/rfc2811#section-2.1

Channels names are strings (beginning with a ‘&’, ‘#’, ‘+’ or ‘!’ character) of length up to fifty (50) characters. Channel names are case insensitive.

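So the limit is 50 characters... yet on this IRC server channel names kept getting cut off at around 32 characters. When I connected to a Freenode server to check, the channel name I had planned to use showed up in full without any problem.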
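To make the mismatch concrete, here is a minimal Python sketch, not part of the original setup. It assumes the channel name is simply `#` followed by the canonical UUID string (a hypothetical naming scheme), takes 32 as the approximate cut-off observed on the server, and takes 50 from the RFC 2811 text quoted above. Whether the server counts the `#` prefix toward its limit is also an assumption.

```python
import uuid

RFC2811_MAX = 50   # limit quoted from RFC 2811, section 2.1
OBSERVED_MAX = 32  # approximate cut-off observed on the Unreal IRC server

def channel_name(channel_id: uuid.UUID) -> str:
    """Build a channel name from a UUID (hypothetical naming scheme)."""
    return "#" + str(channel_id)

name = channel_name(uuid.uuid4())
truncated = name[:OBSERVED_MAX]

print(len(name))                    # '#' plus the 36-character UUID string
print(len(name) <= RFC2811_MAX)     # fits within the RFC limit
print(len(name) - len(truncated))   # characters lost to truncation
```

Under these assumptions the name is 37 characters long, so it fits the RFC limit but loses its last five characters at a 32-character cut-off, which matches the "few characters short" symptom seen in the KVS keys.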

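My conclusion was that the Unreal IRC server does not seem to follow RFC 2811 here, so I would probably have to recompile and reinstall it.

Figuring there had to be something like that in a header file, I went looking and found it right away. I changed the CHANNELLEN constant in include/struct.h, recompiled, and confirmed that everything works correctly.

Anyway, IRC is something I first started poking at while learning Perl, and now I am tinkering with it in a direction I had never tried before... who knew IRC could look new to me...

I never thought I would be the one doing Yak Shaving....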
— JEEN (@JEEN_LEE) July 18, 2013

So the maximum length of an IRC channel name is 32 characters... T_T
— JEEN (@JEEN_LEE) July 18, 2013

But no... Channels names are strings (beginning with a '&', '#', '+' or '!' character) of length up to fifty (50) characters
— JEEN (@JEEN_LEE) July 18, 2013

Found it. So you are the culprit. #CHANNELLEN #unrealirc https://t.co/uGGSmW4moR
— JEEN (@JEEN_LEE) July 18, 2013

[Translation] Solr vs. ElasticSearch: Part 6 – User & Dev Communities Compared

Jul 17th, 2013

Solr vs. ElasticSearch: Part 6 – User & Dev Communities Compared

January 22, 2013 by Rafał Kuć

One of the questions after my talk during the recent ApacheCon EU was what I thought about the communities of the two search engines I was comparing. Not surprisingly, this is also a question we often address in our consulting engagements. As a part of our Apache Solr vs ElasticSearch post series we decided to step away from the technical aspects of SolrCloud vs. ElasticSearch and look at the communities gathered around these two projects. If you haven’t read the previous posts about Apache Solr vs. ElasticSearch here are pointers to all of them:

Users and Community

Let’s start by discussing the user activity around both ElasticSearch and Apache Solr.

User Activity

We started working on this post right before the Christmas break of 2012. During that time we decided to see how active the user base was for both ElasticSearch and Apache Solr. To do that we used our handy search-lucene.com service and we compared the number of email messages sent to both user lists. So let’s see how they stack up.

Apache Solr

As you can see, Solr user activity varies slightly from month to month, which is perfectly understandable. Each bar on the chart represents two weeks. We can see the number of messages ranges from about 390 mails to about 770 per two weeks, which gives us between 800 and 1600 mails per month if we do a bit of rounding up. Quite impressive I must say!

ElasticSearch

ElasticSearch User Mailing List

Now let’s discuss the ElasticSearch side. First a few words of explanation. If you look at the above chart you might think that the ElasticSearch mailing list was silent and then users started posting in October 2012. That’s clearly not true – it is just that we didn’t add ElasticSearch to search-lucene.com until recently. However, you may see that the number of messages during the same period of time is quite similar – both Solr and ElasticSearch saw about 670 – 730 messages during a two-week period. This gives us 2 emails per hour on average.
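The rates quoted for both mailing lists are simple conversions from the two-week bars. The sketch below reproduces them in Python; the helper names are illustrative, and it assumes that a month is treated as two two-week bars and that a two-week period has 14 × 24 hours.

```python
HOURS_PER_TWO_WEEKS = 14 * 24  # 336 hours in one chart bar

def per_month(per_two_weeks):
    """Rough monthly volume, treating a month as two two-week bars."""
    return per_two_weeks * 2

def per_hour(per_two_weeks):
    """Average messages per hour over one two-week bar."""
    return per_two_weeks / HOURS_PER_TWO_WEEKS

# Solr: about 390 to 770 mails per two weeks
print(per_month(390), per_month(770))

# Both lists: about 670 to 730 mails per two weeks
for volume in (670, 730):
    print(round(per_hour(volume), 2))
```

The monthly figures come out at 780 and 1540, which round up to the 800 and 1600 mentioned for Solr, and the hourly figures land close to the 2 emails per hour quoted for both lists.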

Number of Unique Users

Email volume is one thing, but I was always curious about how many different people write emails on the mailing lists. Having such a number would give us an additional understanding of the structure of the community around a particular search engine, new users, etc. However, we should not look only at this number, but also at things like the most active people on the mailing lists. In both cases we’ve looked at the same period from 1 to 30 December 2012. We’ve used the data we index for search-lucene.com to calculate these numbers.

Apache Solr

In case of Apache Solr there were 234 unique users sending mail to the users mailing list. Almost 8 unique users per day on average, nice :)


ElasticSearch

In case of ElasticSearch there were 271 unique users sending mail to the users mailing list. This gives us about 9 unique users per day on average which is even nicer.


Available Resources

As far as available resources go, both ElasticSearch and Solr have great documentation. On the Solr wiki site (http://wiki.apache.org/solr/) you can find information about most of the components and of course the tutorial for beginners. ElasticSearch is very similar, with a tutorial and a very good description of functionality available at http://www.elasticsearch.org/. In addition to that, there are three books published about Apache Solr (in English) and more (e.g. my Apache Solr 4 Cookbook) coming soon. As of now, there are no published books about ElasticSearch, but…. stay tuned :)

Search Trends

We also decided to use uncle Google to look at trends about Apache Solr and ElasticSearch. Let’s look at the following diagram:

As you can see, until early 2010 there was no interest in ElasticSearch at all, at least looking from the point of view of users searching about it. Note that we published the interview with Shay Banon over two and a half years ago – back in May 2010 – before ElasticSearch registered on Google’s search trends radar! SolrCloud didn’t exist back then, so people slowly started looking for information on SolrCloud later in 2010. The volume of searches mentioning SolrCloud is very small even today – perhaps because people tend to search for Solr and not SolrCloud. And while SolrCloud is still a new kid on the block, searches for Solr dwarf searches for ElasticSearch despite the buzz surrounding ElasticSearch.

Of course, the above doesn’t say anything about the number of users of both search engines, but it definitely shows some information about the interest in these technologies.


Developers and Code

If you are familiar with ElasticSearch and Solr you’ll probably know that ElasticSearch is much younger than Apache Solr. Apache Solr was created by Yonik Seeley in 2004 and donated to the Apache Software Foundation. On the other hand, the first version of ElasticSearch was released by Shay Banon in 2010. This is quite important to say before we can talk about differences in contributors and the code itself. But getting to the point – we thought that it may be interesting to see how both Apache Solr and ElasticSearch look from a bird’s-eye perspective. To do that we’ve used the statistics and charts from ohloh.net. So, let’s see what they look like.

Apache Solr

Code Statistics

If we look at the current statistics, at the beginning of January 2013 Solr had more than 212k lines of code, with almost 7000 commits and 38 contributors. However, keep in mind that contributors are people that committed the code, not necessarily the ones that actually implemented it and provided the patch, so the actual number of contributors is much higher. The chart looks like this:

[Chart: lines_of_code_solr]

Top Contributors

If we look at top contributors we see Mark Miller on top, followed by Yonik Seeley and Robert Muir in the third place :)

Active Contributors

One more interesting thing is the number of contributors that were actively involved during a given period of time. Looking at Apache Solr since 2006 we can see the following:

[Chart: active_commiters_solr]

I think that we can say that we had a stable growth of active contributors starting from 2006 until June 2012, with a slight dip shortly after that. However, I don’t think that the number of active contributors will keep dropping; the dip is more likely due to a bit of exhaustion after releasing Apache Lucene and Solr 4.0 :)

ElasticSearch

Code Statistics

Current code statistics for ElasticSearch show that the code base just hit 240k LOC with about 4.2k commits and 87 contributors.

[Chart: lines_of_code_es]

Top Contributors

As we’d expect, Shay Banon is the top contributor to ElasticSearch. In second place on the podium we have Martijn van Groningen, with Igor Motov in third place:

Active Contributors

And finally the active contributors. We don’t have the same time frame as for Apache Solr, which is understandable as ElasticSearch is younger, but we can still see what is happening.

[Chart: active_commiters_es]

As you can see, from the first quarter of 2011 the number of active contributors varied from 5 to about 10, with the peak at the same time as in Solr – 12 active contributors in June 2012.

Summary

As everything in this post indicates, both projects’ development and user communities are strong, active, and about equal. 2013 will be an interesting year for both projects.

We are nearing the end of our SolrCloud vs. ElasticSearch series. What else would you like us to cover? Please use the comments to let us know!


@kucrafal, @sematext

[Translation] Solr vs ElasticSearch: Part 5 – Management API Capabilities

Jul 17th, 2013

Solr vs ElasticSearch: Part 5 – Management API Capabilities

January 8, 2013 by Rafał Kuć

In previous posts, all listed below, we’ve discussed general architecture, full text search capabilities and facet aggregation possibilities. However, until now we have not discussed any of the administration and management options and things you can do on a live cluster without any restart. So let’s get into it and see what Apache Solr and ElasticSearch have to offer.

Input/Output Format

ElasticSearch

As you probably know, ElasticSearch offers a single way to talk to it – its HTTP REST API – with JSON structured queries and responses. In most cases, especially during query time, it is very handy, because it lets you perfectly control the structure of your queries and thus control the logic.

Apache Solr

On the other hand we have Apache Solr. If you are familiar with it you know that in order to send a query to Solr one needs to send it using URL request parameters. This makes communication much less structured compared to ElasticSearch JSON format. In response you can get multiple response formats that are supported out of the box, like the default XML, JSON, CSV, PHP serialized, or Ruby.

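The difference in request shape can be sketched in a few lines of Python. This only builds the requests and sends nothing. The host names, ports, the `collection1` core, the `docs` index and the `title` field are all assumptions made for illustration, and the `match` query is just one example of the JSON query DSL.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoints: host, port, core and index names are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"
ES_SEARCH = "http://localhost:9200/docs/_search"

def solr_query_url(text):
    """Solr: the whole query travels as flat URL request parameters."""
    params = {"q": f"title:{text}", "rows": 10, "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params)

def es_query_body(text):
    """ElasticSearch: the query is a nested JSON document sent as the request body."""
    return json.dumps({"query": {"match": {"title": text}}, "size": 10})

print(solr_query_url("ten"))
print(ES_SEARCH, es_query_body("ten"))
```

The `wt` parameter is what selects the Solr response format (XML by default, JSON here), while the ElasticSearch side keeps both query and response in JSON.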

Statistics API

Most of the time your search cluster will be fine and you won’t have any problems with it. However, there are times when you may need to see what is happening inside Apache Solr or ElasticSearch to diagnose problems, such as performance problems (hello SPM!), stability issues, or anything like that. In such cases, both search engines provide some amount of statistics.

Apache Solr

In Solr we can use JMX or HTTP requests to retrieve information about handler usage, cache statistics or information about most Solr components.


ElasticSearch

ElasticSearch was designed to be able to return various statistics about itself. With REST API calls we can get information ranging from the simplest, like cluster health or node statistics, to extended information such as detailed per-index statistics about merges and refreshes. The same stats are available via JMX, too.
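As a rough illustration of the HTTP side of both statistics APIs, the following Python sketch builds the URLs one might poll. It assumes a local Solr core named `collection1` and a local ElasticSearch node (both hypothetical), and uses the Solr MBeans handler and the ElasticSearch cluster health and index stats endpoints. Exact paths can differ between versions, so treat them as a starting point and check them against your release.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr/collection1"  # assumed host and core
ES_BASE = "http://localhost:9200"                     # assumed host

def solr_stats_url():
    """Solr: MBeans request handler with statistics enabled, JSON output."""
    return SOLR_BASE + "/admin/mbeans?" + urlencode({"stats": "true", "wt": "json"})

def es_stats_urls(index):
    """ElasticSearch: cluster-wide health plus statistics for one index."""
    return {
        "cluster_health": ES_BASE + "/_cluster/health",
        "index_stats": f"{ES_BASE}/{index}/_stats",
    }

print(solr_stats_url())
for name, url in es_stats_urls("docs").items():
    print(name, url)
```

Each URL can be fetched with a plain GET (for example via `urllib.request.urlopen`) and the JSON result fed to whatever monitoring you already use.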

Settings API

ElasticSearch

ElasticSearch allows us to modify most of the configuration values dynamically. For example, you can clear your caches (or just a specific type of cache), and you can move shards and replicas to specific nodes in your cluster. In addition to that you are also allowed to update mappings (to some extent), define warming queries (since version 0.20), etc. You can even shut down a single node or a whole cluster with the use of a single HTTP call. Of course, this is just an example and doesn’t cover all the possibilities exposed by ElasticSearch.
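A minimal sketch of two such dynamic calls, written in Python with the standard library only. It prepares the requests without sending them. The base URL and the `docs` index name are assumptions, and `refresh_interval` is used merely as one example of an index setting that can be changed on a live index.

```python
import json
from urllib.request import Request

ES_BASE = "http://localhost:9200"  # assumed host

def update_index_setting(index, key, value):
    """Prepare a PUT that changes one index setting on a live index."""
    body = json.dumps({"index": {key: value}}).encode("utf-8")
    return Request(f"{ES_BASE}/{index}/_settings", data=body, method="PUT",
                   headers={"Content-Type": "application/json"})

def clear_cache(index):
    """Prepare a POST that clears the caches of one index."""
    return Request(f"{ES_BASE}/{index}/_cache/clear", data=b"", method="POST")

req = update_index_setting("docs", "refresh_interval", "5s")
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
print(clear_cache("docs").get_method(), clear_cache("docs").full_url)
```

Passing either request to `urllib.request.urlopen` would apply it, with no node restart involved.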

Apache Solr

In case of Apache Solr we do not (yet) have the possibility of changing configuration values (like warming queries) with API calls.


Index/Collection Management

In addition to the capabilities mentioned above, both ElasticSearch and Apache Solr provide APIs that allow us to modify our deployment when it comes to collections and indices.

Apache Solr

Before 4.0 we were able to manipulate cores inside our Solr instances. We could create new cores, reload them, get their status, rename them, swap two of them, and finally remove a core from the instance. With Solr 4.0, a new API was introduced that is built on top of the core admin API – the collections API. It allows us to create collections on a running SolrCloud cluster, reload them and of course delete them. As the collections API is built on top of the core admin API, if you create a new collection all the needed cores on all instances will be created. Of course, the same goes for reloading and deleting – all the cores will be appropriately informed and processed.
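The three collection operations described here map onto a single handler with an `action` parameter. The Python sketch below only builds the URLs. The base URL, the collection name `docs` and the shard and replica counts are assumptions for illustration.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed host

def collections_url(action, name, **extra):
    """Build a Collections API URL for the given action and collection name."""
    params = {"action": action, "name": name, **extra}
    return SOLR_BASE + "/admin/collections?" + urlencode(params)

create = collections_url("CREATE", "docs", numShards=2, replicationFactor=2)
reload_ = collections_url("RELOAD", "docs")
delete = collections_url("DELETE", "docs")

for url in (create, reload_, delete):
    print(url)
```

Because the collections API sits on top of the core admin API, one CREATE call like this is enough for SolrCloud to create the underlying cores on the participating instances.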

ElasticSearch

In case of ElasticSearch we can create and delete indices by running a simple HTTP command (PUT or DELETE method) with the index name we are interested in. In addition to that, with a simple API call we can increase and decrease the number of replicas without the need of shutting down nodes or creating new nodes. With the newer ElasticSearch versions we can even manipulate shard placement with the cluster reroute API. With the use of that API we can move shards between nodes, we can cancel the shard allocation process and we can also force shard allocation – everything on a live cluster.
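The index-level operations can be sketched as prepared HTTP requests in Python. Nothing is sent here. The base URL, index name and node names are hypothetical, index creation is issued with PUT in this sketch, and the reroute body uses the `move` command of the cluster reroute API.

```python
import json
from urllib.request import Request

ES_BASE = "http://localhost:9200"  # assumed host

def _json_request(method, path, payload=None):
    """Prepare a request with an optional JSON body (not sent)."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return Request(ES_BASE + path, data=data, method=method,
                   headers={"Content-Type": "application/json"})

def create_index(index):
    return _json_request("PUT", f"/{index}")

def delete_index(index):
    return _json_request("DELETE", f"/{index}")

def set_replicas(index, count):
    """Change the replica count of a live index."""
    return _json_request("PUT", f"/{index}/_settings",
                         {"index": {"number_of_replicas": count}})

def move_shard(index, shard, from_node, to_node):
    """Ask the cluster to move one shard between two nodes."""
    command = {"move": {"index": index, "shard": shard,
                        "from_node": from_node, "to_node": to_node}}
    return _json_request("POST", "/_cluster/reroute", {"commands": [command]})

reroute = move_shard("docs", 0, "node1", "node2")
print(reroute.get_method(), reroute.full_url, reroute.data.decode("utf-8"))
```

Keeping every operation as a small request builder makes it easy to log or review a change before applying it to a live cluster.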

Query Analysis

Apache Solr

If you’ve used Apache Solr you have probably come across the debugQuery parameter and the explainOther parameter. Those two allow us to see the detailed score calculation for the given query and the documents found in the results (the debugQuery parameter) and for other specified documents (the explainOther parameter). In addition, we can also see how the analysis process is done with the use of the analysis handler or by using the analysis page of the Solr administration panel provided with Solr.
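A small Python sketch of how these two parameters are attached to a regular select request. The host, the `collection1` core and the `id:42` query used for explainOther are assumptions made for illustration.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/collection1/select"  # assumed host and core

def debug_query_url(query, explain_other=None):
    """Build a select URL that asks Solr for score-debugging output."""
    params = {"q": query, "debugQuery": "true"}
    if explain_other is not None:
        # explainOther takes a query identifying the extra documents to explain
        params["explainOther"] = explain_other
    return SOLR_SELECT + "?" + urlencode(params)

print(debug_query_url("ten"))
print(debug_query_url("ten", explain_other="id:42"))
```

The response to such a request carries an extra debug section next to the normal results.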

For example, this is what debug information returned by Solr can look like:

<?xml version="1.0" encoding="UTF-8"?>
<response>
.
.
.
<lst name="debug">
 <str name="rawquerystring">ten</str>
 <str name="querystring">ten</str>
 <str name="parsedquery">(+DisjunctionMaxQuery((prefixTok:ten)~0.01) ())/no_coord</str>
 <str name="parsedquery_toString">+(prefixTok:ten)~0.01 ()</str>
 <str name="QParser">DisMaxQParser</str>
 <null name="altquerystring"/>
 <null name="boostfuncs"/>
 <lst name="timing">
  <double name="time">2.0</double>
  <lst name="prepare">
   <double name="time">1.0</double>
   <lst name="org.apache.solr.handler.component.QueryComponent">
    <double name="time">1.0</double>
   </lst>
   <lst name="org.apache.solr.handler.component.FacetComponent">
    <double name="time">0.0</double>
   </lst>
   <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
    <double name="time">0.0</double>
   </lst>
   <lst name="org.apache.solr.handler.component.HighlightComponent">
    <double name="time">0.0</double>
   </lst>
   <lst name="org.apache.solr.handler.component.StatsComponent">
    <double name="time">0.0</double>
   </lst>
   <lst name="org.apache.solr.handler.component.DebugComponent">
    <double name="time">0.0</double>
   </lst>
  </lst>
  <lst name="process">
   <double name="time">1.0</double>
   <lst name="org.apache.solr.handler.component.QueryComponent">
    <double name="time">0.0</double>
   </lst>
   <lst name="org.apache.solr.handler.component.FacetComponent">
    <double name="time">0.0</double>
   </lst>
   <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
    <double name="time">0.0</double>
   </lst>
   <lst name="org.apache.solr.handler.component.HighlightComponent">
    <double name="time">0.0</double>
   </lst>
   <lst name="org.apache.solr.handler.component.StatsComponent">
    <double name="time">0.0</double>
   </lst>
   <lst name="org.apache.solr.handler.component.DebugComponent">
    <double name="time">1.0</double>
   </lst>
  </lst>
 </lst>
 <lst name="explain">
  <str name="Ten mices">
1.3527006 = (MATCH) sum of:
  1.3527006 = (MATCH) weight(prefixTok:ten in 35158) [DefaultSimilarity], result of:
    1.3527006 = fieldWeight in 35158, product of:
      1.4142135 = tf(freq=2.0), with freq of:
        2.0 = termFreq=2.0
      6.1216245 = idf(docFreq=6355, maxDocs=1065313)
      0.15625 = fieldNorm(doc=35158)
  </str>
 </lst>
</lst>
</response>

As you can see, we can get information about timings of each of the used components. In addition to that, we see the parsed query and of course the explain information showing us how the document score was calculated.

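The explain section above can be checked by hand. The Python sketch below recomputes the score using the classic Lucene TF-IDF formulas as I understand them for DefaultSimilarity (tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1))), with the fieldNorm taken directly from the output. Small differences in the last digits are expected because Lucene works with single-precision floats.

```python
import math

def tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)

def idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

# Values copied from the explain output above
term_freq = 2.0
doc_freq, max_docs = 6355, 1065313
field_norm = 0.15625

score = tf(term_freq) * idf(doc_freq, max_docs) * field_norm
print(round(tf(term_freq), 4), round(idf(doc_freq, max_docs), 4), round(score, 4))
```

The three factors multiply out to roughly 1.3527, matching the fieldWeight line, which shows that the final score here is simply the product of tf, idf and fieldNorm.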

ElasticSearch

ElasticSearch exposes three separate REST endpoints to analyze our queries and documents and to explain document scores. The Analyze API allows us to test our analyzer on a specified text in order to see how it is processed, and is similar to the analysis page functionality of Solr. The Explain API provides us with information about the score calculation for a given document. Finally, the Validate API can validate our query to see if it is proper and how expensive it can be.

For example, this is what Explain API response looks like:


{
 "ok" : true,
 "_index" : "docs",
 "_type" : "doc",
 "_id" : "1",
 "matched" : true,
 "explanation" : {
   "value" : 0.15342641,
   "description" : "fieldWeight(_all:document in 0), product of:",
   "details" : [ {
     "value" : 1.0,
     "description" : "tf(termFreq(_all:document)=1)"
   }, {
     "value" : 0.30685282,
     "description" : "idf(docFreq=1, maxDocs=1)"
   }, {
     "value" : 0.5,
     "description" : "fieldNorm(field=_all, doc=0)"
   } ]
 }
}

You can see the description about score calculation that is returned from the Explain API.

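Because the Explain API returns plain JSON, the calculation is easy to verify programmatically. The following Python sketch parses a trimmed copy of the response shown above, multiplies the detail values, and recomputes the idf with the classic Lucene formula 1 + ln(maxDocs / (docFreq + 1)). Naming that formula is my assumption; the response itself only reports the resulting value.

```python
import json
import math

# Trimmed copy of the Explain API response shown above
EXPLAIN_RESPONSE = """
{
  "matched": true,
  "explanation": {
    "value": 0.15342641,
    "description": "fieldWeight(_all:document in 0), product of:",
    "details": [
      {"value": 1.0, "description": "tf(termFreq(_all:document)=1)"},
      {"value": 0.30685282, "description": "idf(docFreq=1, maxDocs=1)"},
      {"value": 0.5, "description": "fieldNorm(field=_all, doc=0)"}
    ]
  }
}
"""

explanation = json.loads(EXPLAIN_RESPONSE)["explanation"]

# fieldWeight is described as a product, so multiply the detail values
product = math.prod(detail["value"] for detail in explanation["details"])

# idf for docFreq=1, maxDocs=1 under the assumed formula
idf = 1.0 + math.log(1 / (1 + 1))

print(product, explanation["value"])
print(round(idf, 6))
```

The product of tf, idf and fieldNorm reproduces the top-level value, the same structure as in the Solr debug output, which is expected since both engines delegate scoring to Lucene.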

Before We End

There are a few more words we wanted to write before summarizing this comparison. First of all, the APIs and possibilities mentioned above are not all that is available, especially when it comes to ElasticSearch. For example, with ElasticSearch you can clear caches on the index level, you can check index and type existence, you can retrieve and manage your warming queries, clear the transaction log by running the Flush API, or even close an index or open those that were closed. We wanted to point out some differences and similarities between Apache Solr and ElasticSearch, but we didn’t want to make a summary of the documentation. :) So, if you are interested in some functionality and you don’t know if it exists, just send a mail to the Apache Solr or ElasticSearch mailing list or leave a comment here, and we will be glad to help.

Summary

When we first started the Solr vs ElasticSearch series we planned to initially divide the series into five posts, which are now published. However after seeing the popularity of the series and the amount of feedback we’ve received, we decided to extend the series. You can soon expect the next part, which will be dedicated to non-technical, but deeply important and interesting aspects of both search servers. After that, we’ll get back to the technical details with the subsequent post dedicated to score influence capabilities, describing how we can change the default Lucene scoring and influence it from configuration, during indexing time and finally during querying.


If you liked this post, please tweet it!


@kucrafal, @sematext
